Sentimental semantic classification using decomposed LSTM over big data
Abstract
Text classification is considered one of the primary tasks in many Natural Language Processing (NLP) applications. Traditional text classifiers often depend on human interaction for feature design, such as building dictionaries or knowledge databases. Text classification research ranges from designing the best features to choosing the best possible machine learning classifiers. Industries such as marketing and product management already leverage text analysis to extract information from textual data. For example, classifying website content with tags helps Google crawl the site more easily, and a faster emergency-response system can be built by classifying panicked conversations on social media. A central challenge in text classification is feature engineering over the input: extracting characteristics and finding their associations with the output classes. With the evolution of deep neural networks, feature extraction is largely handled by dense layers of the network, where each layer extracts a representation of the input data.
However, the success of deep learning is tied to our ability to train large neural networks on large datasets. Long training times for deep neural networks (DNNs) hinder research by slowing the development of new DNNs. Hence, faster training enables increasingly larger models to be trained on large datasets in feasible amounts of time. Another challenge in text classification for sentiment-analysis tasks is that traditional feature engineering, with data presented as embedding representations, is insufficient in specific domains where a sentence carries latent sentiment.
To solve these challenges, we propose the LSTM Decomposing Method (LDM) to reduce the training time of DNNs, and we present Sent2Vec, a new representation of input text that incorporates sentiment. LDM is based on disintegrating the internal unit of a Recurrent Neural Network (RNN) into sub-units. Unlike previous similar research, our approach does not require active re-training during the parameter-reduction phase. Sent2Vec magnifies the sentiment representation of a sentence and produces a vector of numbers whose positions replicate the original embedding characteristics yet carry the sentiment of the sentence. In this report, we apply our techniques to multiple datasets, including the Amazon dataset with 3,000,000 training records and 650,000 test records. We also describe two NLP use cases: a chatbot implementation and Facebook comment analysis.
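The abstract does not specify how LDM splits an LSTM's internal unit into sub-units. One common way to decompose a trained weight matrix into two smaller sub-units without re-training is a truncated-SVD low-rank factorization; the sketch below illustrates only that general idea. The matrix sizes, the chosen rank, and the `decompose_weights` helper are assumptions for illustration, not the thesis's actual method.

```python
import numpy as np

def decompose_weights(W, rank):
    """Factor a trained weight matrix W (m x n) into two smaller
    sub-units A (m x rank) and B (rank x n) via truncated SVD.

    Parameters drop from m*n to rank*(m + n); the factorization
    itself requires no re-training of the network.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # m x rank, singular values folded in
    B = Vt[:rank, :]             # rank x n
    return A, B

# Example: factor a hypothetical 256x512 gate matrix into rank-32 sub-units.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))
A, B = decompose_weights(W, rank=32)

original_params = W.size             # 256 * 512 = 131072
decomposed_params = A.size + B.size  # 256*32 + 32*512 = 24576
```

Applying `A @ B` in place of `W` approximates the original unit with far fewer parameters, which is the sense in which a decomposition can shorten training and inference time.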
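Sent2Vec's construction is likewise not detailed in the abstract. As a purely hypothetical illustration of an embedding whose positions keep their original meaning "yet with a taste of the sentiment", the sketch below averages toy word embeddings and scales the result by a lexicon-derived sentiment score. The `EMBED` and `LEXICON` tables and the scaling rule are invented for illustration and are not the thesis's actual representation.

```python
import numpy as np

# Toy stand-ins, NOT real resources: 2-d word embeddings and a
# tiny polarity lexicon (word -> sentiment score).
EMBED = {
    "the":   np.array([0.1, 0.2]),
    "movie": np.array([0.3, 0.1]),
    "was":   np.array([0.0, 0.1]),
    "great": np.array([0.8, 0.6]),
    "awful": np.array([-0.7, -0.5]),
}
LEXICON = {"great": 1.0, "awful": -1.0}

def sent2vec_sketch(tokens):
    """Average the word embeddings, then scale each position by a
    factor derived from the sentence's overall sentiment, so the
    vector retains the embedding layout while reflecting polarity."""
    vecs = [EMBED[t] for t in tokens if t in EMBED]
    base = np.mean(vecs, axis=0)
    sentiment = sum(LEXICON.get(t, 0.0) for t in tokens)
    return base * (1.0 + 0.5 * np.tanh(sentiment))

pos = sent2vec_sketch("the movie was great".split())
neg = sent2vec_sketch("the movie was awful".split())
```

Under this toy scheme, two sentences with identical structure but opposite polarity map to distinguishable vectors, which is the property a sentiment-aware representation needs for downstream classification.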
Embargo status: Restricted until January 2022.