Context-aware document analysis


Document management is a vital task in any enterprise. In many domains, a massive number of documents written in natural language are available, but they are not well organised and the relations between them have not been determined. Organising this amount of information manually is not feasible. A method that can extract the similarities between documents is the first step toward an autonomous framework for managing documents. 
In NLP, distributed representations of text (word embeddings and their extensions) have been widely studied. In this approach, a vector represents a natural language element, such as a word, a phrase, a paragraph, or even a whole document. The vector captures the semantics of the element, so elements with similar meanings are mapped to nearby vectors. 
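As an illustration of how such vectors can expose similarities between documents, the sketch below builds a document vector by averaging word vectors and compares documents with cosine similarity. This is a common baseline, not necessarily the method this project will use; the tiny `WORD_VECTORS` table is made up for the example, and in practice the vectors would come from a trained embedding model.

```python
import math
from typing import Dict, List

# Hypothetical 3-dimensional word vectors, invented purely for illustration.
# A real system would load vectors from a trained embedding model.
WORD_VECTORS: Dict[str, List[float]] = {
    "neural":   [0.9, 0.1, 0.0],
    "network":  [0.8, 0.2, 0.1],
    "learning": [0.7, 0.3, 0.0],
    "tax":      [0.0, 0.1, 0.9],
    "policy":   [0.1, 0.0, 0.8],
}


def document_vector(text: str, vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the known words in `text`; unknown words are skipped."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        # No known words: fall back to a zero vector of the right dimension.
        dim = len(next(iter(vectors.values())))
        return [0.0] * dim
    # zip(*known) groups the i-th components of all word vectors together.
    return [sum(component) / len(known) for component in zip(*known)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For example, `document_vector("neural network learning", WORD_VECTORS)` ends up much closer (by cosine similarity) to `document_vector("deep neural network", WORD_VECTORS)` than to `document_vector("tax policy", WORD_VECTORS)`. Averaging discards word order, which is one reason richer paragraph- and document-level embeddings have been studied.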
On the other hand, each document also has metadata that relates it to other entities, including people (e.g., authors) and other documents (e.g., cited papers). Modelling documents by taking into account both their content and their context is therefore essential.
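One simple way to picture combining content and context is a weighted mix of two similarity scores: cosine similarity between content vectors, and Jaccard overlap between the sets of entities (authors, cited papers) linked through metadata. The sketch below assumes this weighted-sum design purely for illustration; the `Document` fields, the entity prefixes, and the weight `alpha` are hypothetical choices, not the model this project proposes.

```python
import math
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Document:
    content_vector: List[float]                          # e.g. an averaged word embedding
    authors: Set[str] = field(default_factory=set)       # context: people
    citations: Set[str] = field(default_factory=set)     # context: cited documents


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def context_entities(doc: Document) -> Set[str]:
    """Prefix entity types so an author and a paper with the same name never collide."""
    return {f"author:{a}" for a in doc.authors} | {f"cites:{c}" for c in doc.citations}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def context_similarity(d1: Document, d2: Document) -> float:
    return jaccard(context_entities(d1), context_entities(d2))


def combined_similarity(d1: Document, d2: Document, alpha: float = 0.5) -> float:
    """Weighted mix of content and context similarity.

    alpha = 1.0 uses content only, alpha = 0.0 uses context only.
    The default of 0.5 is an arbitrary illustrative value, not a tuned one.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    content = cosine_similarity(d1.content_vector, d2.content_vector)
    return alpha * content + (1.0 - alpha) * context_similarity(d1, d2)
```

With identical content vectors, two papers that share an author and a citation score higher than two papers with no metadata in common, which is the kind of signal a content-only model would miss. A learned model could replace the fixed weight, for instance by embedding documents, people, and citations jointly.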
In this project, we seek to address the issue of modelling documents based on their content and context.


Programming skill in Python

Solid background in machine learning or natural language processing


Experience in developing solutions for open research questions

Skills in conducting research in ML and NLP


Natural language processing

Updated:  1 June 2019/Responsible Officer:  Dean, CECS/Page Contact:  CECS Marketing