Abstract

The focus of this research is Aspect Based Sentiment Analysis (ABSA) in two different domains of text-based reviews. ABSA is one of the main tasks in Natural Language Processing (NLP). An ABSA system receives a set of documents as input and groups them into sentiment polarity classes based on the opinion and emotion expressed in them. Not only does it quantify sentiment polarity, but it also identifies the aspect of a product or service that the sentiment refers to. Consumer feedback or reviews for products or services usually contain multiple expressions of opinion and sentiment related to different features/attributes of products and services, which we call "aspects". ABSA aims to estimate the sentiment polarity for each aspect discussed in a document. For instance, ABSA would recognise the positive or negative opinion expressed for each feature of a laptop. The sentence “the battery capacity is not too high, screen resolution is good enough, the CPU speed is good and the internal memory is great” expresses positive sentiment for the ‘CPU’, the ‘memory’ and the ‘screen’ aspects and negative sentiment about the ‘battery’.
The objectives of this research are divided into the following parts: (a) detecting aspects in a document and extracting them in two specific domains, (b) classifying opinions about these aspects into sentiment polarity classes and, finally, (c) proposing a dual-domain system able to classify reviews from two different domains concurrently.
As a first step, we focused on proposing a new method for addressing the shortcomings of aspect extraction systems on text-based reviews. Extracting all available aspects from text of a specific domain, without pre-defining the aspects manually and hand-crafting data, is a major challenge in this task. To address the shortcomings of current systems, such as the pre-processing required, and to improve performance, we proposed a neural network-based model that uses two word embeddings: one general and one domain-specific. The experiments show that combining a pre-trained general word embedding with a word embedding trained on a specific domain helps to detect and extract aspects in two different domains. We evaluated the system using two benchmark datasets from SemEval (Semantic Evaluation) challenges and compared our results with state-of-the-art systems addressing the same task on the same datasets. The experimental results indicate that our system achieves an F1 score of 73.81% on the restaurant dataset and 78.26% on the laptop dataset, which is comparable to or better than the state of the art.
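The idea of feeding a model with both a general and a domain-specific embedding can be illustrated with a minimal sketch. It assumes the two vectors are simply concatenated per token, which is one common way of combining embeddings and is not necessarily how the thesis model does it. The lookup tables, vector values, dimensions and function names below are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy lookup tables; the vector values are made up for illustration.
GENERAL_DIM = 2
DOMAIN_DIM = 3

GENERAL_EMB: Dict[str, List[float]] = {
    "battery": [0.1, 0.3],
    "is": [0.0, 0.1],
    "great": [0.7, 0.2],
}
DOMAIN_EMB: Dict[str, List[float]] = {
    "battery": [0.9, 0.4, 0.5],
    "great": [0.2, 0.8, 0.1],
}


def embed_token(token: str) -> List[float]:
    """Concatenate the general and domain vectors of one token.

    Tokens missing from a table fall back to a zero vector of that
    table's dimension (an assumption of this sketch).
    """
    general = GENERAL_EMB.get(token, [0.0] * GENERAL_DIM)
    domain = DOMAIN_EMB.get(token, [0.0] * DOMAIN_DIM)
    return general + domain  # list concatenation, length GENERAL_DIM + DOMAIN_DIM


def embed_sentence(tokens: List[str]) -> List[List[float]]:
    """Embed a tokenised sentence, lower-casing each token first."""
    return [embed_token(t.lower()) for t in tokens]


if __name__ == "__main__":
    for row in embed_sentence(["The", "battery", "is", "great"]):
        print(row)
```

Each token is then represented by a single vector of length `GENERAL_DIM + DOMAIN_DIM`, which a sequence-labelling network could consume to tag aspect terms.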
As a second step, we proposed a novel approach for aspect-based sentiment classification of previously extracted aspects based on deep learning, and in particular on a convolutional neural network (CNN). To improve classification performance, we trained a domain-aware word embedding and subsequently fine-tuned it to become task-aware. Instead of using a large manually annotated (gold) training set and engineering the features manually, we investigated the use of silver annotated data for fine-tuning. Silver annotated data are generated automatically by classifying reviews as positive or negative according to the positive and negative keywords that they contain.
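The keyword-based silver annotation described above can be sketched as follows. This is only an illustration under stated assumptions: the keyword lists, the tokenisation and the rule of leaving tied or keyword-free reviews unlabelled are hypothetical choices, not the ones used in the thesis.

```python
import re
from typing import Optional

# Hypothetical keyword lists; the actual lists used in the research are not given.
POSITIVE_KEYWORDS = {"good", "great", "excellent", "love"}
NEGATIVE_KEYWORDS = {"bad", "poor", "terrible", "slow"}


def silver_label(review: str) -> Optional[str]:
    """Assign a silver sentiment label by counting polarity keywords.

    Returns "positive" or "negative", or None when the counts tie
    (including the case where no keyword occurs at all).
    """
    tokens = re.findall(r"[a-z']+", review.lower())
    positives = sum(token in POSITIVE_KEYWORDS for token in tokens)
    negatives = sum(token in NEGATIVE_KEYWORDS for token in tokens)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return None


if __name__ == "__main__":
    for text in ("The screen is great and the keyboard is good",
                 "Terrible battery, slow CPU",
                 "It arrived on Monday"):
        print(text, "->", silver_label(text))
```

Reviews that receive `None` would simply be left out of the silver set in this sketch, so that only reviews with a clear keyword majority are used for fine-tuning.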
As a third step, we extended the proposed approach for aspect-based sentiment classification to a dual-domain model. This enhanced model uses the approach described in the second step and extends it to combine the embedding spaces of different domains, resulting in a comprehensive model. Training this dual-domain model is less resource- and time-intensive than training a separate model for each domain. The experimental results indicate that our system achieves an F1 score of 78.83% on the restaurant dataset and 74.96% on the laptop dataset, which is comparable to or better than the state of the art.
The aspect-based sentiment classification approaches designed and developed in the second and third steps above have been evaluated on gold annotated data, and the effect of different data sizes on the effectiveness of the methods has been investigated. For our experiments, we used existing sentiment classification benchmarks, such as the datasets of the ABSA SemEval tasks in 2014 and 2016, a dataset of Amazon reviews for laptops and a Yelp dataset for restaurants. The experimental results show that the proposed methods perform similarly to more complex models that have significantly larger requirements in memory and computation time.
|Date of Award||19 May 2022|
|Supervisor||YANNIS KORKONTZELOS (Director of Studies), HARI MOHAN PANDEY (Supervisor) & Nik Bessis (Supervisor)|
- Sentiment Analysis
- Aspect-based sentiment analysis
- Sentiment classification
- Neural Networks
- Aspect extraction
- Natural Language Processing