Enabling Context-Aware Natural Language Processing: From Dense Vector Representations to Contextual Features






Natural Language Processing (NLP) is an interdisciplinary field that arose from the intersection of computer science, artificial intelligence, psychology, and linguistics during the early 1940s. Nowadays, NLP approaches are powered by machine learning (ML), which needs a suitable set of features to be effective. Word embeddings, which are dense vector representations, are state-of-the-art features in ML-powered NLP applications. Despite their success, researchers have identified important drawbacks. For instance, dense representations limit the interpretability of the results produced by these systems and often neglect the context around constituents in a sentence. This dissertation advances towards context-aware features that remedy the lack of interpretability and the neglect of context inherent in word embeddings. It first applies state-of-the-art embedding and topic modelling techniques to a field where such techniques are unconventional: economics, specifically statements issued by the Federal Reserve, and in doing so highlights the problems that word embeddings have. Next, the study uses embedding and ensemble learning approaches in phishing and fake review detection to explore whether deceptive intent can be modeled using word embeddings, thereby taking a step towards considering context in the features. In addition, it evaluates the performance of content-derived linguistic cues in the task of fake review detection. Finally, this dissertation concludes by presenting ContextMiner, a novel NLP framework that automatically extracts meaningful context-aware phrases, i.e., Contextual Features, from cybersecurity texts. The dissertation shows the potential of ContextMiner for Named Entity Recognition (NER), information retrieval, and knowledge systematization using security texts.
The study also presents a detailed case study of document clustering using a novel document representation composed of contextual features. As the results demonstrate, contextual features can enhance the usability of ML-powered NLP systems while keeping them explainable.



natural language processing, machine learning, interpretability