Abstract
Sentiment Analysis (SA) of text reviews is an emerging concern in Natural Language Processing (NLP). It is a widely used method for analyzing and extracting opinions from text using individual or ensemble learning techniques. This field has considerable potential in the digital world and on social media platforms. Therefore, we present a systematic survey that organizes and describes the current state of SA and provides a structured overview of proposed approaches, from traditional to advanced. This work also discusses SA-related challenges, feature engineering techniques, benchmark datasets, popular publication platforms, and the best algorithms for advancing automatic SA. Furthermore, a comparative study has been conducted to assess the performance of bagging and boosting-based ensemble techniques for social network SA. Bagging and boosting are two major approaches to ensemble learning, each comprising various ensemble algorithms for classifying sentiment polarity. Recent studies suggest that ensemble learning techniques are well suited to sentiment classification. This analytical study examines bagging and boosting-based ensemble techniques on four benchmark datasets to provide extensive knowledge regarding ensemble techniques for SA. The efficiency and accuracy of these techniques have been measured in terms of TPR, FPR, Weighted F-Score, Weighted Precision, Weighted Recall, Accuracy, ROC-AUC, and Run-Time. Moreover, the comparative results reveal that bagging-based ensemble techniques outperformed boosting-based techniques for text classification. This extensive review aims to present benchmark information regarding social network SA that will be helpful for future research in this field.
1 Introduction
With the steady growth of information technology and social platforms, user-generated information can easily be posted online, and this information contains people's sentiments and emotions toward a particular issue. Governments, companies, and individuals are interested in retrieving the sentiments behind these reviews. Unfortunately, given the massive amount of data, it is challenging to determine the polarity of these comments and reviews, and human experts are too expensive to label them manually. Accordingly, SA is gaining a lot of popularity as a research topic (Chen and Yang 2011). It is a widely used method for analyzing and extracting opinions from text using individual or ensemble learning techniques. This field has considerable potential in the digital world and on social media platforms. The vast content generated on the web is unstructured; SA can process it and convert it into meaningful information. SA is a subfield of NLP that combines computational linguistics, rule-based approaches, and machine learning to extract the public's opinion from content provided on social platforms, including text, images, and videos. Depending on the requirements of a particular application, the problem of sentiment classification is primarily handled at the aspect, sentence, and document levels. Aspect-based SA, also known as feature-level SA, extracts multiple features from the text reviews. It provides a deep study of reviews and extracts the reviewers' context for a particular domain (Thet et al. 2010; García-Pablos et al. 2018). The aspect-level approach mainly depends on the syntactic features of the text reviews (Che et al. 2015). The sentence-based SA approach finds the polarity of a particular sentence. Here, words are linked together to form a sentence, and to extract the polarity of that sentence the N-gram technique is used, which groups the words into sequences of one, two, or three. 
Sometimes the N-gram technique fails to capture the relationship between these words. Therefore, dependency trees and typed dependencies have been introduced to address the word separation problem in text classification (Meena and Prabhakar 2007). In sentence-level classification, each sentence is considered a separate unit and is assumed to express only one opinion: positive, negative, or neutral (Jagtap and Pawar 2013). In the document-based approach, each document is considered a single unit, and a single opinion is assigned to the whole document. The bag-of-words approach is very popular and provides good accuracy in handling complexity in document-level SA (Bhatia et al. 2015). Most sentence-level applications try to achieve good accuracy in the whole document (Zhang et al. 2009). SA and opinion mining are two popular fields that help to extract opinionated information from online social platforms. The two terms are commonly used interchangeably. However, some researchers use them for slightly different problems: SA detects the sentiment of reviews as neutral, negative, or positive, whereas opinion mining analyzes a text's subjectivity (Tsytsarau and Palpanas 2012). Previous research frequently employed machine learning and heuristic-based methods. Heuristic-based methods mainly depend on semantic features and linguistic characteristics, whereas machine learning-based algorithms are classified into unsupervised, supervised, and ensemble learning.
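The N-gram grouping and the bag-of-words representation mentioned above can be illustrated with a minimal sketch. It assumes whitespace tokenization and lower-casing, which are simplifications of our own; the function names `ngrams` and `bag_of_ngrams` are hypothetical and do not come from any of the surveyed works.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous groups of n tokens, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_ngrams(sentence, max_n=3):
    """Count the unigrams, bigrams, and trigrams of a sentence.

    Assumption: tokens are obtained by lower-casing and splitting on
    whitespace; a real pipeline would use a proper tokenizer.
    """
    tokens = sentence.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts

features = bag_of_ngrams("the movie was not good")
# The bigram ("not", "good") keeps the negation next to the word it
# modifies, which the unigram ("good",) alone cannot express.
print(features[("not", "good")], features[("good",)])
```

The bigram features show why word groups matter for polarity, while longer-range relations between words still require the dependency-based representations discussed above.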
Several articles on SA using different techniques have been published, which creates a need for a deep study that summarizes the trends and aspects of SA. A comparative study and a detailed survey were presented by Xia et al. (2011) and Giachanou and Crestani (2016), respectively. Xia et al. (2011) provided a comparative study of ensemble-based techniques for SA but did not cover the advanced ensemble approaches of this field. Giachanou and Crestani (2016) presented an in-depth survey of Twitter SA and summarized the previously proposed approaches. However, this survey did not implement any recent techniques for comparative discussion and did not explore the latest updates in this field. Here, we provide a detailed SA survey and present the recent facts and trends related to this field. This study investigated research work from 1996 until 2022 using online repositories and tried to cover all the essential aspects of SA, which will provide deeper information to upcoming researchers in a single manuscript. Extensive experiments have also been conducted on different domains to identify the best ensemble approach for the sentiment classification task. This analytical study was mainly conducted for sentence-level SA using ensemble machine-learning techniques. Furthermore, the ensembles used in the experiments fall into two major categories: bagging and boosting. Accordingly, eight ensemble learners were implemented, of which five belong to the boosting approach and three to the bagging approach. Figure 1 presents the summarized taxonomy of our social network SA survey.
In an ensemble approach, multiple learners learn together to obtain more accurate and efficient results than individual learners. Ensemble methods have been used in NLP applications and have proven better than a single method (Zhang et al. 2009). For example, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models combined by averaging generate better results than the individual models (Minaee et al. 2019). Although governments, businesses, and individuals are always interested in calculating the polarity and sentiment of reviews, no consistent conclusion is available on which methodology is best for this process. Therefore, to reach conclusive results, this study compares eight ensemble techniques on four popular datasets to investigate the performance of ensemble models for SA. The main objective of this study is to explore the latest research on sentiment classification with a comparative analysis of ensemble-based techniques. To that end, we pose five research questions.
-
RQ1 What are the different approaches, publishing platforms, and benchmark datasets used by researchers for SA?
To discover the most popular approach and dataset used in the field of SA. This would be helpful for researchers to understand the current scenario related to this area.
-
RQ2 What are the major challenges that researchers face during sentiment calculation from text reviews?
To discuss the challenges in the field of NLP along with their proposed solutions.
-
RQ3 What are the distinct feature engineering techniques for selecting the essential features from text reviews?
To explain the various feature engineering techniques for dimensionality reduction of text datasets. Thus, many critical research papers have been collected from different publishing sites to map popular feature engineering techniques for text datasets.
-
RQ4 Which emotion theories do researchers use to detect emotions from social content, including text, images, and videos?
To identify the common emotions that are present in the emotion sets of prominent theorists. It would provide the best emotion set to future researchers for opinion extraction from social content, including text, images, and videos.
-
RQ5 Which is the best ensemble technique for sentiment classification, and what are the future opportunities of SA?
To discover the ensemble technique that provides the best results in terms of all standard measures. Hence, various experiments were conducted on different domains to select the best technique for text classification. It would be helpful for SA-related applications. Future opportunities related to SA are also discussed.
The remaining sections of this study are organized as follows: Sect. 2 presents an extensive literature survey related to SA. Section 3 elaborates on the important aspects of SA. Section 4 describes the methodology used for the comparative study. Section 5 presents the comparative results and analysis. Section 6 discusses the future opportunities of SA. Finally, Sect. 7 presents the study's conclusion and addresses some issues that need attention in future research.
2 Literature survey
SA is extensively used to extract people's opinions, emotions, and sentiments toward a particular brand, business, place, or product. As the demand for SA increases, various techniques and approaches have been introduced to classify sentiments. After analyzing the vast literature on sentiment classification, we conclude that SA relies on five major approaches. Figure 2 presents the classification of all the major approaches used by researchers for sentiment classification.
First, the lexicon-based approach uses a manually or automatically generated list of positive, negative, or neutral polarity terms for sentiment classification. The lexicon approach computes the semantic orientation of phrases and words in sentences and documents to reveal the sentiments. Usually, the lexicon-based approach uses adjectives as indicators of semantic orientation (Taboada et al. 2011). Second, the machine learning approach is a widely adopted technique for SA. Most researchers prefer a machine learning-based approach for sentiment classification because of its fast execution and reliable results. Machine learning provides various single learners, namely Naïve Bayes (NB), K-Neighbors (KN), Linear Regression (LR), Support Vector Machine (SVM), and so forth. Third, the graph-based approach selects the nodes and vertices based on the features (reviews and tweets) available in the input material. Various graph-based models, such as Enterprise Graphs, Hyper-graphs, Hashtag Graphs, N-Gram Graphs, and Co-Occurrence Graphs, are available for an effective SA process (Krishnakumari and Akshaya 2019). Fourth, the ensemble approach combines multiple weak learners to form a powerful learner. Various ensemble learners, namely Random-Forest, Extra-Tree, Meta-Estimator, Ada-Boost, Gradient-Boosting, Light-GBM, Cat-Boost, and Extreme Gradient-Boost, are available to make the sentiment process more effective than the lexicon approach and the single machine learning approach. Fifth, the hybrid approach enhances the capability of a sentiment classification model by integrating machine learning with a lexicon-based approach or by combining multiple machine learning algorithms. With a hybrid approach, researchers aim to build a richer and more robust model for solving a particular problem. 
Researchers perform various experiments with different techniques on specific data and try to create a model that is more effective than single and ensemble models. For example, a linguistic dictionary and SVM were combined to build a hybrid model for sentiment classification of political tweets that achieved 93% accuracy, which is significant enough to help politicians plan strategies for future elections (Nandi and Agrawal 2016). Here, we categorize all the previous research into two parts: Sentiment Analysis (SA), which studies the subjective information in the text, and Sentiment Classification (SC), which identifies the opinions in the text and assigns a particular label to them.
2.1 Lexicon-based approach
Lexicon-based approaches work with phrases and opinion words without prior knowledge of labels. Here, collective phrases are treated as an opinion lexicon along with negative and positive words. Opinion lexicons determine the orientation of the terms in the text dataset. The lexicon-based approach is categorized into two parts: the dictionary-based approach, which judges the sentiment based on phrases available in lexicons, and the corpus-based approach, which extracts the context present in the text. Table 1 reports the list of lexicon-based research from 2011 to 2022.
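As a rough illustration of the dictionary-based idea, the sketch below sums word polarities from a small opinion lexicon. The lexicon entries, their scores, and the one-word negation rule are hypothetical choices made for this example only; they are not taken from any lexicon or method listed in Table 1.

```python
# Hypothetical toy lexicon: positive words score > 0, negative words < 0.
LEXICON = {"good": 1, "great": 2, "excellent": 2,
           "bad": -1, "poor": -1, "terrible": -2}
# Assumption: a negator flips the polarity of the word that follows it.
NEGATORS = {"not", "never", "no"}

def lexicon_polarity(text):
    """Classify text as positive, negative, or neutral from lexicon scores."""
    tokens = text.lower().split()
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)      # unknown words contribute 0
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value                 # simple negation handling
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_polarity("the plot was great"))
print(lexicon_polarity("the acting was not good"))
```

No labeled training data is needed here, which matches the description above; the quality of the result depends entirely on the coverage of the lexicon.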
2.2 Machine learning-based approach
Machine learning is the most promising approach for SA. Usually, machine learning-based SA provides a higher accuracy score than the lexicon-based approach. It offers various feature engineering techniques that extract the critical features from the dataset and improve the efficiency of SA. Different supervised and unsupervised algorithms are available for sentiment classification. The supervised approach works on labeled datasets and learns a function that maps inputs to output labels. In contrast, unsupervised learning learns patterns from unlabeled data using clusters. Table 2 presents a few popular pieces of research related to machine learning-based SA from 2011 to 2022.
2.3 Graph-based approach
A graph-based approach connects interrelated words in text reviews to calculate people's sentiments and opinions, where vertices and nodes correspond to the features available in the reviews. Various graph-based methods and algorithms have been applied in the last decades to solve the problem of SA. Table 3 lists the research that has been done in the area of SA using graph-based methods.
2.4 Ensemble approach
Ensemble learning is the process of strategically combining several learners to form an intelligent model. It improves classification by reducing the risk of a poor or unlucky selection of a single learner. It draws on the capability and knowledge of various learners, which increases classification accuracy and decreases prediction errors. Decisions based on diverse learners make ensemble learning more accurate and trustworthy than single learning. Table 4 presents the research work done in SA using an ensemble approach from 2011 to 2022.
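One simple way of combining learners is majority voting over their predicted labels. The sketch below is only an illustration of that combination step: the three prediction lists are made-up outputs of hypothetical learners, not results from this study, and an odd number of learners with two labels is assumed so that no ties occur.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-learner label lists into one label per review.

    predictions: list of lists; predictions[k][j] is the label that
    learner k assigns to review j.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]

def accuracy(predicted, truth):
    """Fraction of reviews whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Made-up labels for four reviews and three hypothetical learners.
truth = ["pos", "pos", "pos", "neg"]
learner_a = ["pos", "neg", "pos", "neg"]
learner_b = ["pos", "pos", "neg", "neg"]
learner_c = ["neg", "pos", "pos", "neg"]

ensemble = majority_vote([learner_a, learner_b, learner_c])
print(accuracy(learner_a, truth), accuracy(ensemble, truth))
```

In this toy case each learner errs on a different review, so the vote corrects every individual mistake. The benefit shrinks when the learners make the same errors, which is why diversity among learners matters.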
2.5 Hybrid approach
The hybrid approach utilizes the capability of various approaches such as rule-based, lexicon-based, machine learning, or deep learning-based. It enhances the efficiency of the SA model with optimum results. Researchers devise hybrid designs to develop the best approach for a particular task. Hybrid learning is categorized into semi-supervised learning, multi-instance learning, and self-supervised learning. Semi-supervised learning trains with very few predefined labels and classifies a large amount of unlabeled text. Multi-instance learning does not use individual labels; instead, it receives labeled bags, each containing various instances, which explicitly addresses problems with incomplete knowledge of training examples. Self-supervised learning generates labels by itself and utilizes supervised algorithms to solve unsupervised problems. Hybrid models show significantly greater improvement in classification than other methods. Table 5 presents work related to hybrid SA.
2.6 Extensive literature analysis
This section presents a deep analysis of the literature in the field of SA. Various graphs and tables are used to discuss the algorithms, datasets, approaches, and most popular platforms related to SA. SA has been employed in numerous real-life applications, and researchers are therefore paying increasing attention to it. Hence, this section focuses on various essential points that are required for further research in this area.
2.6.1 Growth in publications of SA
This section shows the growth in the number of SA publications. As shown in Fig. 3, very few publications related to SA appeared in the initial years (2010, 2011, and 2013). As the demand for social platforms has increased, publications on SA have grown since 2014. In 2016, 2017, and 2019, numerous researchers proposed SA research using machine learning, ensemble learning, and hybrid techniques.
2.6.2 Publication platform for SA
Our collection contains a total of 92 documents from 2010 to 2022. We found 48 different journals for SA publications. The venues that occur more than once in our collection are reported in Table 6. Various conferences have also been held for SA publications. Elsevier and Springer are the two most common venues for SA publications, and Elsevier, Springer, and ACM are the three publishers most often chosen by researchers for SA-related research. Additionally, several platforms are open to SA-related research.
2.6.3 Popular datasets for SA
A benchmark dataset plays a vital role in sound research. Figure 4 presents the benchmark datasets used by researchers for SA. Researchers have used several resources, namely movie reviews, product reviews, Facebook posts, and tweets. The graph shows that researchers most frequently use product reviews for their experiments, followed by Twitter data. A few researchers also generate their own datasets for sentiment classification, and others consider movie reviews, medical reviews, and hotel reviews for SA-related experiments.
2.6.4 More favorable techniques for SA
This section describes the methods most frequently implemented for SA-related problems. Additionally, we explore the most popular techniques in the lexicon-based, machine learning-based, ensemble-based, graph-based, and hybrid-based approaches for SA applications. Table 7 presents the machine learning algorithms most frequently used by researchers for the SA process, where SVM holds the top position. NB was also consistently used by researchers, but SVM produced the most noticeable results for sentiment classification.
Table 8 shows the frequency of ensemble-based techniques used for SA. It is observed that bagging and boosting are the most common techniques researchers use for ensemble sentiment learning. The concept of majority voting has also frequently been implemented by researchers in different combinations of single learners.
Table 9 presents the graph-based techniques for SA. There are many variations in choosing graph-based methods for SA, but word-graph and co-occurrence graph were used by two researchers in N set = 10.
Hybrid SA is very much in demand. We therefore surveyed various papers related to hybrid SA and grouped the hybrid work into five major categories, presented in Table 10. We found that most hybrid work combines the lexicon approach with the machine learning approach, which was applied seven times in N set = 19. Machine learning was used on its own five times, and a combination of machine learning and rule-based approaches was used four times in previous work (N set = 19). The combination of machine learning with genetic and deep learning was found to be very rare.
3 Important aspects of SA
SA has been an exciting field of study since the 1990s, and it contains various sub-fields for further research. Merriam-Webster defines sentiment as a thought, judgment, or attitude that arises from feeling. It is an idea or opinion developed by emotions. This section presents the various essential aspects of SA.
3.1 SA challenges
SA is an emerging field, but it faces various challenges that make the process critical and decrease the efficiency of related models. Although researchers are working to solve these issues using different techniques, accuracy is still lacking. These challenges create obstacles to extracting the correct meaning of sentiments and classifying the correct polarity. Common challenges of SA are mostly related to the language used in online social networks. Additionally, the words regularly spoken around us influence the words used on online platforms. It is also noticeable that the language used on social media is more malleable than formal language, as it mixes formal, informal, and personal communication styles. Overcoming this mixture of languages requires powerful NLP and linguistic skills. Table 11 presents the foremost challenges related to SA and their proposed solutions.
3.2 SA feature engineering
A large number of features, N, increases the dimensionality of the dataset. Feature engineering is a very important step in SA applications and opinion mining. In optimal feature engineering, feature selection and feature extraction should interact with the final processing (Kohavi and John 1997). This section provides information regarding various types of feature engineering techniques that have previously been applied for text preprocessing. Figure 5 depicts the process of feature engineering, which is completed in four steps: (1) Original Feature Set: This step holds the raw elements of the dataset that need processing. (2) Adding Weights: All the calculations are performed, and weights are assigned to the selected features by normalization and scaling methods. (3) Feature Ranking: The features are arranged in a specific order by the value of some scoring function. (4) Final Feature Subset: This represents the finally selected N features, ready for the final calculation (Liu et al. 2020).
Dimensionality reduction lowers the high dimensionality of the dataset while keeping the more discriminative and constructive features from the collection. Feature engineering is categorized into two major parts: (1) feature extraction and (2) feature selection. Feature extraction is a process of selecting required or essential features from the original set. Principal Component Analysis (PCA) and Latent Semantic Analysis (LSA) are two popular feature extraction techniques (Zareapoor and Seeja 2015). Feature selection, in turn, is a process that reduces the number of variables for predictive models. Effective and efficient feature selection improves the performance of SA. The feature selection process includes missing values removal, low variance removal, highly correlated feature removal, univariate selection, and recursive elimination. Feature selection methods are categorized into two groups: filtered methods and wrapper methods (Uysala and Gunal 2014).
Filtered methods do not depend on learning models or classification algorithms and can be applied quickly and easily. Chi-Squared (CHI), Mutual Information (MI), Document Frequency (DF), Gini Index (GI), Information Gain (IG), and Distinguishing Feature Selection (DFS) are filtered feature selection methods. In contrast, wrapper methods depend on learning models and follow their rules accordingly. Tabu Search, Genetic Algorithms, and Particle Swarm Optimization (PSO) are wrapper feature selection methods. Figure 6 presents the taxonomy of feature engineering/dimensionality reduction.
3.2.1 Feature extraction
In the SA task, reviews and documents hold millions or billions of tokens, which makes the text classification process more complex. Feature extraction is a dimensionality reduction method that reduces the number of dimensions N of the dataset and presents the data in a more predictive and compact way (Gomez et al. 2012). The reduced set is easier to handle due to its size and contains only the features essential for the process.
3.2.1.1 PCA
PCA is a popular technique that reduces the dimensionality of a dataset by converting the original attributes into a smaller set. The purpose of PCA is to derive new variables from combinations of the original variables. PCA identifies the patterns in a dataset based on the correlation among various features.
First, the mean of each feature is calculated [Eq. 1] (Kumar et al. 2017). The mean vector \(\mu\) is an \(N \times 1\) column vector. To put the different attributes on the same scale, each coordinate is rescaled to unit variance [Eq. 2], replacing \(X\left( i \right)\) with \(X\left( i \right) / \sigma_{j}\). After this preprocessing, the covariance matrix, denoted by \(\Sigma\) (the Greek capital letter sigma), is calculated for the eigenvector computation [Eq. 3].
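The preprocessing steps described above (mean removal, rescaling to unit variance, covariance matrix) can be sketched in plain Python as follows. This is a minimal illustration under our own assumptions: the data are a tiny made-up matrix, no column is constant, the population variance (division by the number of rows) is used, and the leading eigenvector is approximated by power iteration, which is one of several possible eigenvector routines and not necessarily the one used in the cited work.

```python
import math

def standardize(X):
    """Subtract each column's mean and divide by its standard deviation."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n)
            for j in range(d)]          # assumes no column is constant
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]

def covariance(Z):
    """Covariance matrix of already-centered data Z (rows are samples)."""
    n, d = len(Z), len(Z[0])
    return [[sum(row[a] * row[b] for row in Z) / n for b in range(d)]
            for a in range(d)]

def leading_eigenvector(S, iterations=100):
    """Approximate the top eigenvector of S by power iteration."""
    d = len(S)
    v = [float(k + 1) for k in range(d)]   # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(S[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Made-up data: four samples, two strongly correlated features.
X = [[2.0, 1.9], [1.0, 1.1], [3.0, 3.2], [4.0, 3.8]]
Z = standardize(X)
S = covariance(Z)
v = leading_eigenvector(S)
# Projecting each sample on v gives the first principal component.
scores = [sum(z * w for z, w in zip(row, v)) for row in Z]
print(v, scores)
```

Because the two features move together, the first component weights them almost equally, and the two-dimensional data can be summarized by the single column `scores`.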
3.2.1.2 LSA
LSA is a dimensionality reduction and feature extraction technique for text classification. LSA analyzes concepts, terms, and the relationships between them in unstructured texts. It can correlate semantically related terms that are latent in the text. LSA is used for text clustering and page retrieval systems. LSA addresses the problems of words with more than one meaning and of different words with similar meanings (Zareapoor and Seeja 2015).
3.2.2 Feature selection
Feature selection identifies and removes irrelevant and duplicate attributes from the dataset that do not contribute to the predictive model's performance and accuracy. Feature selection improves the model's performance, yields cost-effective predictors, and provides simpler and more reliable models. Feature selection is a powerful tool to simplify or speed up the calculations of the learning model (Dasgupta et al. 2007). Feature selection is further categorized into the filter method and the wrapper method.
3.2.2.1 Filtered method
This method uses diverse scoring techniques to assess the relevance of features independently of learning classifiers or models. These techniques scale well to high-dimensional datasets and provide fast and simple computations (Guyon and Elisseeff 2003). Various filter methods are available for text classification and SA.
-
CHI It is a popular statistical feature selection method that evaluates each feature independently by calculating its chi-square statistic with respect to the class. It analyses the dependency between the term and the class. A score of 0 indicates that the term and class are independent, and a higher score indicates a stronger dependence (Zareapoor and Seeja 2015). CHI also provides information on the significance of differences between categories (McHugh 2013).
$$CHI(t,c_{i}) = \frac{N (AD - BE)^{2}}{(A + E)(B + D)(A + B)(E + D)}$$ (4)
$$CHI_{max}(t) = \max_{i} \left( CHI(t,c_{i}) \right)$$ (5)
The CHI [Eqs. 4 and 5] calculates the association between a word feature and the associated class (Sharmac and Dey 2012). Here, \(A\) is the number of times term \(t\) and class \(c_{i}\) co-occur, \(B\) is the number of times \(t\) appears without \(c_{i}\), \(E\) is the number of times \(c_{i}\) appears without \(t\), \(D\) is the number of times neither \(c_{i}\) nor \(t\) appears, and \(N\) is the total number of documents in the corpus. The score of CHI will be 0 when \(t\) and \(c_{i}\) are not dependent on each other.
-
MI MI presents the association or dependence between the two random variables. MI finds the dependence between term t and class c. It describes the amount of information contained by a term for the associated class [Eq. 6] (Yang and Pedersen 1997). It is calculated as:
$$MI(t,c) = \log \frac{P(t|c)}{P(t)}$$ (6)
Here, \(P(t)\) represents the probability of term \(t\), and \(P(t|c)\) represents the probability of term \(t\) given the assigned class \(c\). MI measures how much information is communicated on average from one random variable to another. \(P(t)\) and \(P(c)\) are the marginal distributions of \(t\) and \(c\), obtained through the marginalization process.
-
DF DF thresholding is the most straightforward technique for reducing the vocabulary in text classification. It scales easily to massive corpora, with computational complexity linear in the number of training documents. It is generally regarded as an ad hoc approach rather than a principled criterion for feature selection. DF is the number of documents in which a term appears. DF follows the assumption that infrequent terms are non-descriptive for the prediction of categories (Yang and Pedersen 1997). This method removes those features whose document frequency is greater or less than a predefined threshold.
-
GI It is an improved version of the attribute selection algorithm used for feature selection (Alper Kursat Uysalab 2016). It works as a split measure for selecting the most appropriate splitting attribute in the decision tree. The simple formula is utilized to calculate the GI [Eq. 7].
$$GI(t) = \sum\limits_{i = 1}^{M} P(t|C_{i})^{2} P(C_{i}|t)^{2}$$ (7)
Where \(P\left( t|C_{i} \right)\) is the probability of term \(t\) given class \(C_{i}\), \(P\left( C_{i}|t \right)\) is the probability of class \(C_{i}\) given the presence of term \(t\), \(M\) is the number of class labels, and \(P\) denotes the proportion associated with the \(i^{th}\) class label. GI is a measure of impurity; hence, the feature with minimum impurity is selected for the best feature split.
-
IG It is a feature selection technique that reduces the size of features by computing and ranking the value of attributes. It measures the presence and absence of information in terms of contributing accurate classification. IG provides a higher score to those terms that hold relevant information for text classification.
$$IG(t) = - \sum\limits_{i = 1}^{M} P(C_{i}) \log P(C_{i}) + P(t)\sum\limits_{i = 1}^{M} P(C_{i}|t) \log P(C_{i}|t) + P(\overline{t})\sum\limits_{i = 1}^{M} P(C_{i}|\overline{t}) \log P(C_{i}|\overline{t})$$ (8)
It is a global feature selection metric that calculates only one score for a particular term [Eq. 8] (Alper Kursat Uysalab 2016). Here, \(M\) is the number of classes, \(P\left( C_{i} \right)\) is the probability of class \(C_{i}\), \(P(t)\) and \(P\left( \overline{t} \right)\) are the probabilities of the presence and absence of term \(t\), and \(P\left( C_{i}|t \right)\) and \(P\left( C_{i}|\overline{t} \right)\) are the conditional probabilities of class \(C_{i}\) given the presence and absence of \(t\).
-
DFS It is a relatively recent feature selection method and global metric for text classification. DFS selects distinctive features from the collection and eliminates ambiguous ones based on predefined criteria [Eq. 9] (Uysalc and Gunal 2012).
$${\text{DFS(t) = }}\sum\limits_{{\text{i = 1}}}^{{\text{M}}} {\frac{{{\text{P(C}}_{{\text{i}}} {\text{|t)}}}}{{{\text{P(}}\overline{{\text{t}}} {\text{|C}}_{{\text{i}}} {\text{) + P(t|}}\mathop {\text{C}}\limits^{{\text{\_}}}_{{\text{i}}} {) + 1}}}}$$(9)Where, \(M\) represents total classes, \(P\left( {C_{i} |t} \right)\) shows the conditional probability of class \(C_{i}\) in the presence of term \(t\), \(P\left( {\overline{t} |Ci} \right)\) presents the conditional probability of the absence of \(t\) in \(C_{i}\), \(P\left( {t|\overline{C}_{i} } \right)\) represents the conditional probability of \(t\) for all classes except \(C_{i}\).
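The three filter scores above can be illustrated with a minimal Python sketch that estimates the probabilities in Eqs. 7, 8, and 9 from document counts. The function name `term_scores`, its arguments, and the use of base-2 logarithms are assumptions made for illustration; every class is assumed to contain at least one document.

```python
import math


def _plogp(p):
    """Return p * log2(p), using the convention 0 * log(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0


def term_scores(docs_per_class, docs_with_term):
    """Score one term with GI (Eq. 7), IG (Eq. 8), and DFS (Eq. 9).

    docs_per_class[c]: number of documents in class c (assumed > 0).
    docs_with_term[c]: number of documents in class c containing the term.
    """
    n_docs = sum(docs_per_class.values())
    n_term = sum(docs_with_term.values())
    p_t = n_term / n_docs
    gi = ig = dfs = 0.0
    for c, n_c in docs_per_class.items():
        k = docs_with_term.get(c, 0)
        p_c = n_c / n_docs                      # P(C_i)
        p_t_c = k / n_c                         # P(t | C_i)
        p_c_t = k / n_term if n_term else 0.0   # P(C_i | t)
        # P(C_i | term absent) and P(t | all classes except C_i)
        p_c_not_t = (n_c - k) / (n_docs - n_term) if n_docs > n_term else 0.0
        p_t_not_c = (n_term - k) / (n_docs - n_c) if n_docs > n_c else 0.0
        gi += p_t_c ** 2 * p_c_t ** 2
        ig += -_plogp(p_c) + p_t * _plogp(p_c_t) + (1 - p_t) * _plogp(p_c_not_t)
        dfs += p_c_t / ((1 - p_t_c) + p_t_not_c + 1)
    return {"GI": gi, "IG": ig, "DFS": dfs}
```

In this sketch, a term that occurs in every document of one class and in no document of the other receives the maximum score under all three metrics, whereas a term spread evenly over both classes receives an IG of zero and a DFS of 0.5.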
3.2.2.2 Wrapper method
The wrapper method uses a specific learning algorithm to evaluate candidate feature subsets. Its computational cost is high, and processing is slow. For this reason, wrapper methods are not usually preferred in SA and text classification (Baccianella et al. 2013). Wrapper methods are based on optimization concepts and intuitive search. They are used to find better features and reduce duplicate elements using cross-validation (Inza et al. 2004).
-
Tabu Search It integrates learning techniques to evaluate only promising feature subsets. Tabu search generates better accuracy than a genetic algorithm, heuristic search algorithm, PSO, and an evolutionary search for text classification (Alper Kursat Uysald 2018).
$$Accuracy = \frac{number \, of \, well \, classified \, observations}{{total \, number \, of \, observations}}$$(10)
$$features = 1 - \frac{\# S + Features}{{\# Features}}$$(11)
As in most classification tasks, accuracy is the quantity calculated in Tabu search [Eq. 10] (Mousin et al. 2016). In addition, to obtain a more interpretable learning model, the number of selected features should be minimized [Eq. 11].
-
Genetic Algorithms (GA) GA is a random search-based feature selection method that works on the principles of biological evolution. It follows the procedure of genetic evolution in biology: it starts from an initial feasible population and then applies crossover and mutation (Lei 2012). GA is a promising way to handle constrained optimization problems and is used extensively for feature selection.
-
PSO It is used to select the optimal features from the collection set, i.e., those that provide the most remarkable difference between metallic particle classes in terms of their dimensions. PSO offers various advantages for powerful exploration: it has memory, inexpensive computation, a population of potential solutions, the ability to address binary and discrete data, good performance, and robustness to the dimensionality problem, which make it an optimized and promising feature selection algorithm (Sharkawy et al. 2011).
3.3 SA emotion theories
Emotion extraction and classification are essential parts of SA. Here, we introduce the basic emotions considered by researchers in SA and classification, in the form of a standard emotion set that is common to various studies. Automatic human facial expression extraction is an emerging application of Human–Computer Interaction (HCI) and affective computing. Therefore, emotion extraction and classification have become prime aspects of SA research. Several researchers have been working on distinctive sets of emotions and expressions.
Gunesa et al. (2005) present automatic emotion recognition from the face and body using early fusion and late fusion approaches. Their study was performed on eight prototypical expressions: disgust, fear, anger, sad, happy, surprise, happy surprise, and uncertainty. Gunesb et al. (2008) used twelve emotions: disgust, fear, sadness, happiness, anger, uncertainty, anxiety, positive surprise, negative surprise, neutral surprise, boredom, and puzzlement for facial expression and body gesture extraction. Hablani et al. (2013) evaluated binary patterns for facial recognition of a person and classified their expressions according to seven basic emotions: disgust, fear, anger, sadness, happiness, surprise, and neutrality. Chen et al. (2013) used appearance and temporal motion features for facial and body gesture recognition. They classified the emotions into ten categories: disgust, fear, anger, sadness, happiness, surprise, anxiety, boredom, puzzlement, and uncertainty. Hayat et al. (2014) presented an automatic facial recognition framework with six basic emotions: disgust, fear, happiness, anger, surprise, and sadness. Table 12 displays a few recent sets of emotions that researchers have frequently used, together with their findings regarding visual, motion, and sound effects. These sets of emotions will help beginners get started in emotion mining. Figure 7 illustrates the emotion sets used by researchers in their facial recognition works. According to Table 11 and Fig. 7, "disgust, fear, happy, sad, anger, and surprise" are the emotions common to the different studies.
4 Methodology used for comparative analysis
This section presents the methodology used for ensemble classification of the text reviews for sentence-level sentiment classification. The ensemble approach of machine learning has been used in various applications and has produced outstanding results. Ensemble learning is also applicable to the SA task. Therefore, we present a comparative analysis of diverse ensemble methods that are divided into two main categories: bagging and boosting. This study compares eight popular ensemble learners (Random-Forest, Extra-Tree, Meta-Estimator (Linear SVC), Ada-Boost, Gradient-Boosting, XGB, Cat-Boost, and Light-GBM) to choose the best model for SA. The experiments have been conducted on reviews from four different domains: Uber reviews, Restaurant reviews, Amazon reviews, and Food reviews. Figure 8 presents the comprehensive structure of the methodology used for comparative analysis. The following sub-sections provide detailed information regarding the comparative methodology.
4.1 Dataset collection
Dataset collection is the initial step of every research study, and it plays a crucial role in authentic experiments. Four leading review resources (Uber, Restaurant, Amazon, and Food) have been chosen to verify the authenticity of the experiments. The Uber reviews dataset contains 1344 customer ride reviews, the Food reviews dataset holds 25,000 records, and the Amazon product and Restaurant reviews datasets each hold 1000 records. Both large and small datasets are collected so that the ensemble models can be investigated under different conditions, which provides a better comparative analysis. The experimental datasets contain positive and negative reviews, where positive sentiments are denoted by 1 and negative sentiments by 0. Table 13 displays the number of positive and negative reviews contained in each dataset.
4.2 Data preprocessing
Preprocessing is required to convert raw data into a machine-understandable form. First, we organized the datasets by rectifying the spelling errors, antonyms, and missing fields. After that, basic steps such as punctuation removal, whitespace removal, URL removal, number removal, and hash-tag removal were applied to clean the reviews. These preprocessing steps are needed to get an accurate score for SA because machine learning cannot work effectively on raw and noisy datasets.
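The basic cleaning steps listed above can be sketched with Python's standard library. This is a minimal illustration, not the exact pipeline used in the experiments: the function name `clean_review`, the regular expressions, and the order of the steps are assumptions.

```python
import re
import string


def clean_review(text):
    """Apply the basic cleaning steps to one review (illustrative order)."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URL removal
    text = re.sub(r"#\w+", " ", text)                   # hash-tag removal
    text = re.sub(r"\d+", " ", text)                    # number removal
    # Punctuation removal: delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Whitespace removal: collapse runs of whitespace and trim both ends.
    return " ".join(text.split())
```

URLs and hash-tags are removed before punctuation in this sketch, because stripping punctuation first would destroy the `://` and `#` markers that the patterns rely on.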
4.3 Tokenization
Tokenization is a fundamental splitting phase in SA that partitions a sentence, phrase, or paragraph into single units called tokens. Tokens can be either characters or words, and each is counted individually. Tokenization is a building block of NLP and is implemented here with the n-gram approach. An n-gram is a series of n items available in the text or speech. N-grams can be categorized into unigrams, bigrams, or trigrams [Eq. 12].
where \(X\) denotes the total number of words in the sentence \(S\), and the value of \(N\) will be 1 for unigram, 2 for bigram, and 3 for trigram. In unigram, sentences or phrases are split into the tokens of one word. In bigram, two words together are treated as a single token, and in trigram, three words together are treated as single tokens.
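A short Python sketch of word-level n-gram tokenization is given below. The function name `ngrams` and whitespace splitting are assumptions made for illustration. For a sentence of \(X\) words, a sliding window of size \(N\) yields \(X - (N - 1)\) n-grams, which is the usual n-gram count and is assumed here to be the relation the text refers to.

```python
def ngrams(sentence, n):
    """Split a sentence on whitespace and return its word-level n-grams."""
    words = sentence.split()
    # A window of n words can start at positions 0 .. len(words) - n.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


review = "the food was great"
unigrams = ngrams(review, 1)
bigrams = ngrams(review, 2)
trigrams = ngrams(review, 3)
```

A sentence shorter than \(N\) words produces an empty list in this sketch.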
4.4 TF-IDF vectorization
Vectorization is the process of converting text into meaningful, informative numbers. It measures the frequency of a word in a document and generates a number accordingly. TF is calculated as the number of times an individual word occurs in a document divided by the total number of words in that document.
IDF is used to assign higher weights to words that are rare across the documents. TF-IDF is calculated in [Eq. 13], where \(N\) represents the total number of documents, \(tf_{ij}\) is the number of occurrences of term \(i\) in document \(j\), and \(df_{i}\) is the number of documents containing term \(i\) (Term Frequency xxxx).
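As a minimal sketch of this weighting, the Python function below computes \(w_{ij} = tf_{ij} \times \log (N/df_{i})\) with TF normalized by document length, as described above. The unsmoothed logarithm and the name `tf_idf` are assumptions; library implementations typically use smoothed variants, so their values will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dictionary per whitespace-tokenized document."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # df[t]: number of documents that contain term t at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(["good food", "bad food"])
```

In this toy corpus, "food" occurs in every document and therefore receives a weight of zero, whereas "good" and "bad" receive positive weights.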
4.5 Ensemble techniques
Machine learning supports two types of ensemble techniques: bagging and boosting. Bagging draws random samples from the training set and trains multiple learners in parallel. In contrast, boosting builds each training sample from the output of the previous learner and trains the learners sequentially. This section describes all the ensemble techniques implemented for comparison. From the bagging approach, we selected Random-Forest (RF), Extra-Tree (ET), and Meta-Estimator (Linear SVC) (M-SVC) for the implementation of SA, and from the boosting approach, Ada-Boost (AB), Cat-Boost (CB), Gradient-Boosting (GB), XG-Boost (XGB), and Light-GBM (LGBM) were implemented.
4.5.1 Bagging ensemble approach
Bagging combines homogeneous classifiers and trains them in parallel on random samples. First, multiple bootstrap samples are created, each of which is handled independently. After that, base learners are fitted on them, and finally, their outputs are aggregated. Bagging is a popular ensemble approach that helps to reduce the variance of classifiers. Table 14 illustrates the procedure of the bagging ensemble approach (Polikar 2006).
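The three bagging steps (bootstrap sampling, fitting base learners, and aggregating by majority vote) can be sketched in plain Python as follows. The base learner `fit_centroid`, a nearest-class-mean rule on one-dimensional inputs, is a hypothetical stand-in chosen only to keep the example self-contained; it is not one of the learners used in the experiments.

```python
import random
from collections import Counter


def fit_centroid(xs, ys):
    """Hypothetical base learner: predict the class with the nearest mean."""
    sums, counts = Counter(), Counter()
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    means = {label: sums[label] / counts[label] for label in counts}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def bagging_fit(xs, ys, fit_base, n_estimators=25, seed=0):
    """Fit one base learner per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_base([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the base learners' outputs by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


models = bagging_fit([0.0, 0.1, 0.2, 0.9, 1.0, 1.1], [0, 0, 0, 1, 1, 1], fit_centroid)
```

Because each base learner sees a different bootstrap sample, individual learners may disagree; the vote averages out this variation, which is the variance reduction mentioned above.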
4.5.1.1 Random-forest
It is a powerful technique for handling large datasets quickly. Various applications have used it for accurate and effective results. Random-Forest constructs multiple decision trees (DTs) that classify a new instance by majority voting. Each DT is built from a randomly selected sample of the whole original sample set; in other words, every tree uses a different bootstrap sample, as in the bagging concept. It follows a few steps:
Equation 14 calculates the node importance of a tree. Where \(ni_{j}\) represents the importance of node \(j\), \(w_{j}\) shows a weighted number of samples, \(C_{j}\) shows the impurity value of node \(j\), \(left(j)\) is the left node, and \(right(j)\) is the right node. Equation 15 calculates the importance of each feature on a decision tree. Where, \(fi_{i}\) represents the importance of feature \(i\). Equation 16 presents the normalization of these nodes. Finally, Eq. 17 shows the averaging method of all the trees. Where \(RFfi_{i}\) shows the importance of feature \(i\) calculated from all trees, \(normfi_{i}\) represents normalized importance of feature for \(i\) in tree \(j\) and \(T\) is the total number of trees (Random-Forest. xxxx).
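For reference, a commonly used formulation of these four quantities is sketched below. It follows the symbol definitions given above and is assumed, not confirmed, to correspond to Eqs. 14 to 17.

$$ni_{j} = w_{j} C_{j} - w_{left(j)} C_{left(j)} - w_{right(j)} C_{right(j)}$$
$$fi_{i} = \frac{{\sum\nolimits_{j:\,node\,j\,splits\,on\,feature\,i} {ni_{j} } }}{{\sum\nolimits_{k \in all\,nodes} {ni_{k} } }}$$
$$normfi_{i} = \frac{{fi_{i} }}{{\sum\nolimits_{j \in all\,features} {fi_{j} } }}$$
$$RFfi_{i} = \frac{{\sum\nolimits_{j \in all\,trees} {normfi_{ij} } }}{T}$$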
4.5.1.2 Extra-tree
The Extremely Randomized Trees (Extra-Tree) classifier is an ensemble method that aggregates the output of multiple decision trees. It is highly similar to the random forest and differs only in how the DTs of the forest are constructed. Table 15 presents the splitting process of the extremely randomized tree (Geurts et al. 2006).
Every DT of the Extra-Tree forest is built from the attributes of the original sample set. Each node of a tree then draws k random features of the sample, and each DT selects the best split among them, which creates multiple de-correlated decision trees. Every DT calculates the entropy [Eq. 18] and information gain [Eq. 19].
where \(c\) represents the number of labels (classes) and \(p_{i}\) is the proportion of rows with the \(i^{th}\) label. The Extra-Tree classifier has simple properties, explicit meanings, and easy conversion to “if–then” rules (Sharaff and Gupta 2019).
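The two split criteria can be written compactly in Python. The sketch below uses the standard definitions, entropy \(- \sum\nolimits_{i = 1}^{c} {p_{i} \log_{2} p_{i} }\) and information gain as the parent entropy minus the size-weighted entropy of the child nodes; these are assumed to match Eqs. 18 and 19, and the function names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


labels = [1, 1, 0, 0]
pure_split = information_gain(labels, [[1, 1], [0, 0]])
useless_split = information_gain(labels, [[1, 0], [1, 0]])
```

A split that separates the two classes completely recovers the full parent entropy of one bit, whereas a split that leaves both children as mixed as the parent has zero gain.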
4.5.1.3 Meta-estimator (linear SVC)
The bagging ensemble meta-estimator lets the user choose the base learner for the bagging process in order to reduce the variance of that base estimator, e.g., a decision tree. Here, we have chosen Linear-SVC instead of DT as the base classifier for the bagging process. Linear SVC finds the separating hyper-plane between two classes. It provides faster execution on large datasets and minimizes the squared hinge loss. First, we build several instances of Linear SVC on random subsets of the original training set. After that, the individual classification results of all instances are aggregated to form the final classification.
4.5.2 Boosting ensemble approach
Boosting is an ensemble learning approach that boosts the performance of weak learners by running them sequentially on multiple subsets of the dataset. Boosting constructs a sequence of models, and each model is trained by taking into account the errors of the previous model (Freund and Schapire 1996). Most ensemble techniques train on statistically identical data sets, whereas boosting uses training sets that are altered by the previously trained models (Drucker et al. 1994). Table 16 presents the flow of the boosting ensemble approach (Torelli and Menardi 2008).
4.5.2.1 Ada-boost
It is the first boosting algorithm, introduced by Freund and Schapire (1996), and is widely used in various applications. It boosts the performance of weak learners by converting them into stronger ones. Table 17 depicts the process of Ada-Boost learning (Bahad and Saxena 2020).
Ada-Boost can be trained with any machine learning algorithm but is mostly applied with decision trees, as these can be kept very short and generate only one decision for classification. The trained models are added sequentially using weighted training data. Ada-Boost follows the concept of adaptive boosting, where a weight is assigned to every instance and higher weights are assigned to misclassified instances. The output is calculated as [Eq. 20].
where \(f_{m}\) represents the \(m^{th}\) weak classifier and \(\theta_{m}\) is its assigned weight. The output is the weighted combination of \(M\) weak classifiers.
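The adaptive re-weighting and the weighted combination can be sketched as follows, assuming one-dimensional inputs, labels coded as +1 and -1, and threshold stumps as weak classifiers. All function names are illustrative, and \(\theta_{m} = \frac{1}{2}\ln \frac{{1 - \varepsilon_{m} }}{{\varepsilon_{m} }}\) is the usual Ada-Boost weight, assumed here to be the one used in the text.

```python
import math


def stump_predict(x, thr, sign):
    """Threshold stump: predict `sign` at or above thr, `-sign` below it."""
    return sign if x >= thr else -sign


def fit_stump(xs, ys, w):
    """Return (weighted error, threshold, sign) of the best stump."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if stump_predict(x, thr, sign) != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best


def adaboost(xs, ys, rounds=3):
    """Train `rounds` stumps; misclassified instances gain weight each round."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, thr, sign = fit_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the logarithm finite
        theta = 0.5 * math.log((1 - err) / err)  # weight of this weak classifier
        ensemble.append((theta, thr, sign))
        w = [wi * math.exp(-theta * y * stump_predict(x, thr, sign))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted combination of the weak classifiers."""
    score = sum(theta * stump_predict(x, thr, sign) for theta, thr, sign in ensemble)
    return 1 if score >= 0 else -1


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
```

No single stump can separate this toy dataset, but the weighted vote of three stumps classifies every training point correctly.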
4.5.2.2 Gradient-boosting
It is a powerful approach to building predictive models that generates additive models by sequentially fitting parameterized functions to the current pseudo-residuals at each iteration. The pseudo-residual is the gradient of the loss function evaluated at the current step. At every iteration, a random subsample (without replacement) is drawn from the training dataset for base learning, which substantially improves the execution speed and approximation accuracy of gradient boosting (Friedman 2002).
First, a constant model \(F_{0}\) that fits the y-values is initialized [Eq. 21]; the constant \(\gamma\) is chosen as \(\gamma \_optimal\) by solving an optimization problem. Second, pseudo-residuals are calculated for each \(i^{th}\) observation at the current iteration [Eq. 22], where \(F_{m - 1} \left( x \right)\) represents the model derived by adding \(m - 1\) weighted learners to the initial constant function, and \(r_{im}\) represents the residual for the current base learner. Third, a base learner is fitted on \(D\_modified\), the derived subset of the training dataset [Eq. 23]. Fourth, the multiplier \(\gamma_{m}\) is calculated by solving an optimization problem [Eq. 24]. Fifth, the model is updated to obtain \(F_{m} \left( x \right)\) [Eq. 25].
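The five steps can be illustrated with a minimal Python sketch for the squared-error loss, for which the optimal constant is the mean of the targets and the pseudo-residuals are ordinary residuals. Several simplifications are assumptions of this sketch: inputs are one-dimensional with at least two distinct values, the base learner is a regression stump, the line-search multiplier is replaced by a fixed `learning_rate`, and no random subsampling is performed.

```python
def fit_regression_stump(xs, residuals):
    """Least-squares stump: one threshold and one mean value per side."""
    best = None
    for thr in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < thr]
        right = [r for x, r in zip(xs, residuals) if x >= thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x < thr else rm


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Additive model: constant start, then stumps fitted to the residuals."""
    f0 = sum(ys) / len(ys)                    # step 1: constant model
    preds = [f0] * len(xs)
    learners = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # step 2
        h = fit_regression_stump(xs, residuals)         # step 3
        learners.append(h)
        # Steps 4 and 5: fixed multiplier, then model update.
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + learning_rate * sum(h(x) for h in learners)


model = gradient_boost([1, 2, 3, 4], [0, 0, 1, 1])
```

On this toy data, each round halves the remaining residual, so the fitted values approach the targets geometrically.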
4.5.2.3 Cat-boost
It is a recent ensemble technique that can be combined with deep learning techniques and works with diverse data types to solve a wide range of problems. The name Cat-Boost combines two words, "Category" and "Boosting," where category means that it can work with varieties of data such as text, image, audio, or video, and boost means that it is a variant of the gradient boosting ensemble. Cat-Boost addresses the exponential growth of feature combinations by using a greedy method at each split. Cat-Boost first divides the dataset into random subsets, then converts the labels into numerals, and finally transforms the categorical features into numbers [Eq. 26].
Here, countInClass represents the number of ones in the target for the given categorical feature value, totalCount represents the number of previous objects with that feature value, and prior is a starting parameter (Meng et al. 2016). Table 18 presents the Cat-Boost learning process (Nguyen et al. 2018).
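The transformation of a categorical feature into numbers can be sketched as an ordered target statistic, assuming the form \((countInClass + prior)/(totalCount + 1)\), where both counts are taken over the objects that precede the current one. The function name, the value `prior=0.5`, and the use of the given row order instead of a random permutation are assumptions of this sketch.

```python
def ordered_target_statistics(categories, targets, prior=0.5):
    """Encode each categorical value using only the rows that precede it."""
    count_in_class = {}   # ones in the target seen so far, per category
    total_count = {}      # rows seen so far, per category
    encoded = []
    for category, target in zip(categories, targets):
        seen_ones = count_in_class.get(category, 0)
        seen_rows = total_count.get(category, 0)
        encoded.append((seen_ones + prior) / (seen_rows + 1))
        # Update the running counts only after encoding the current row.
        count_in_class[category] = seen_ones + target
        total_count[category] = seen_rows + 1
    return encoded


encoded = ordered_target_statistics(["a", "b", "a", "a"], [1, 0, 1, 0])
```

Because a row's own label never enters its encoding, this ordering avoids leaking the target into the feature.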
4.5.2.4 Extreme-gradient boost (XGB)
Tianqi Chen introduced XG-Boost to improve the performance of Gradient-Boosting. It belongs to a wide range of tools developed under the Distributed Machine Learning Community (DMLC) and works efficiently with various interfaces. XG-Boost constructs different ensemble trees sequentially for ensemble learning and assigns weights to each value of the database, which decide the probability of being selected for the next decision tree. The initial weight of each data value is the same, and it is updated according to the further analysis of the decision trees. The result obtained by the first DT helps to construct a new classifying model [Eq. 27], and this process is repeated until the final model is constructed.
Here, \(D\) is an ensemble model of a tree which applies \(K\) additive functions [Eq. 28] to predict the output.
Here, \(F\) in [Eq. 29] is the defined space of regression trees, and \(q\) represents the tree's structure. \(T\) represents the number of leaves of a tree, and \(f_{k}\) corresponds to the tree's structure.
The objective in [Eq. 30] is minimized to learn the set of functions used in the model. The difference is measured between the target \(y_{i}\) and the prediction \(\hat{y}_{i}\).
[Eq. 31] presents the additive training process of the model. \(f_{t}\) improves the model's accuracy by optimizing the objective, and \(g_{i}\) in [Eq. 32] is a gradient statistic of the loss function (the first-order gradient in the usual XG-Boost notation, with \(h_{i}\) as the second-order one).
The constant term can also be removed to obtain the simplified form presented in [Eq. 33]. This method is complicated in terms of depth. Hence, boosting trees generate high-variance and low-bias results. In contrast, random trees generate high bias and low variance in results because the model has a better ability to fit on the dataset (Bhati et al. 2020).
4.5.2.5 Light-GBM (LGBM)
It supports the Gradient-Boosting framework and increases the efficiency of the model with light-weight decision trees. It includes the Exclusive Feature Bundling (EFB) and Gradient-based One-Side Sampling (GOSS) techniques to overcome the limitation of the histogram that is primarily used by all Gradient-Boosting-based algorithms. Light-GBM is a variant of Gradient-Boosting that inherits its predictive power and resolves its scalability problem and long computational time using a leaf-wise growth scheme (Zhang et al. 2019). Light-GBM finds an approximation function that minimizes the value of the loss function [Eq. 34].
It then integrates \(T\) regression trees to approximate the final model [Eq. 35].
After that, Light-GBM is trained in an additive manner at step \(t\) [Eq. 36].
In Light-GBM, the objective function is approximated with Newton's method. The formulation is transformed into [Eq. 37] after removing the constant term in [Eq. 36].
where \(g_{i}\) and \(h_{i}\) represent the first- and second-order gradient statistics of the loss function. Let \(I_{j}\) represent the sample set of leaf \(j\); [Eq. 37] is then transformed into [Eq. 38].
For a tree structure \(q\left( x \right)\), \(w_{j}^{*}\) represents the optimal weight score of each leaf node, and the extreme value of the objective can be formulated as [Eq. 39].
Here, [Eq. 40] is the scoring function that measures the quality of the tree structure \(q\). Finally, after adding a split, the objective function is as follows:
where \(I_{L}\) and \(I_{R}\) represent the sample sets of the left and right nodes, respectively. Light-GBM trees grow vertically (leaf-wise), unlike other Gradient-Boosting techniques, which makes Light-GBM more effective for processing many features and large datasets.
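The leaf weight and the gain of a split can be illustrated numerically. The sketch below uses the standard second-order boosting expressions, \(w_{j}^{*} = - G_{j} /(H_{j} + \lambda )\) and a gain of \(\frac{1}{2}\left[ {\frac{{G_{L}^{2} }}{{H_{L} + \lambda }} + \frac{{G_{R}^{2} }}{{H_{R} + \lambda }} - \frac{{(G_{L} + G_{R} )^{2} }}{{H_{L} + H_{R} + \lambda }}} \right]\), where \(G\) and \(H\) are the sums of first- and second-order gradients in a node. These forms, the regularization parameter `lam`, and the function names are assumptions, since the corresponding equations are not reproduced here.

```python
def leaf_weight(grads, hessians, lam=1.0):
    """Optimal leaf weight for a node holding the given gradient statistics."""
    return -sum(grads) / (sum(hessians) + lam)


def node_score(grads, hessians, lam=1.0):
    """Quality term G^2 / (H + lambda) of a single node."""
    return sum(grads) ** 2 / (sum(hessians) + lam)


def split_gain(left, right, lam=1.0):
    """Gain of splitting a node into `left` and `right`.

    Each side is a (grads, hessians) pair of equally long lists.
    """
    (gl, hl), (gr, hr) = left, right
    parent = node_score(gl + gr, hl + hr, lam)
    return 0.5 * (node_score(gl, hl, lam) + node_score(gr, hr, lam) - parent)


# Squared-error loss with current prediction 0: g_i = -y_i and h_i = 1.
left_node = ([-1.0, -1.0], [1.0, 1.0])   # targets +1, +1
right_node = ([1.0, 1.0], [1.0, 1.0])    # targets -1, -1
gain = split_gain(left_node, right_node)
left_w = leaf_weight(*left_node)
```

A leaf-wise learner would evaluate this gain for every candidate split of every current leaf and expand the leaf with the largest gain.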
5 Comparative results
This section presents the comparative results of eight ensemble techniques (Ada-Boost, Gradient-Boosting, XGB, Light-GBM, Cat-Boost, Random-Forest, Meta-Estimator (Linear SVC), and Extra-Tree) on four popular review datasets (Uber-Reviews, Restaurant-Reviews, Amazon-Reviews, and Food-Reviews). The experiments were conducted on a PC with an Intel(R) Core(TM) i5-8265U processor, 4 GB RAM, and a 64-bit Windows 10 operating system, using Jupyter Notebook. All the datasets are partitioned into two parts: 80% for training and 20% for testing. The standard measures, namely TPR, FPR, accuracy, weighted precision, weighted recall, weighted f1-score, AUC-score, and run-time, were adopted to check the performance of each ensemble model. All the employed measures are defined from the confusion matrix presented in Table 19.
-
Accuracy It is simply the ratio of correct predictions to the total number of predicted observations [Eq. 42].
$$Accuracy = \frac{TP + TN}{{TP + FP + FN + TN}}$$(42)
-
Weighted Precision Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. The weighted variant averages the per-class precision, weighting each class \(i\) by its number of true instances \(|y_{i}|\) [Eq. 43].
$$Precision_{Weighted} = \frac{{\sum\limits_{i = 1}^{m} {|y_{i} |\frac{{TP_{i} }}{{TP_{i} + FP_{i} }}} }}{{\sum\limits_{i = 1}^{m} {|y_{i} |} }}$$(43)
-
Weighted Recall Recall is the ratio of correctly predicted positive observations to the total actual positive observations, averaged over the classes in the same weighted manner [Eq. 44].
$$Recall_{Weighted} = \frac{{\sum\limits_{i = 1}^{m} {|y_{i} |\frac{{TP_{i} }}{{TP_{i} + FN_{i} }}} }}{{\sum\limits_{i = 1}^{m} {|y_{i} |} }}$$(44)
-
Weighted F1-Score It is the weighted average over the classes of the per-class F1-score, which combines precision and recall [Eq. 45].
$$F1\text{-}Score_{Weighted} = \frac{{\sum\limits_{i = 1}^{m} {|y_{i} |\frac{{2TP_{i} }}{{2TP_{i} + FP_{i} + FN_{i} }}} }}{{\sum\limits_{i = 1}^{m} {|y_{i} |} }}$$(45)
-
ROC-AUC It stands for the area under the Receiver Operating Characteristic curve, which measures the capability of a classification technique to differentiate between the classes. A higher AUC score indicates better classification, and a lower score indicates inaccurate classification. The ROC curve is plotted with the False Positive Rate (FPR) = FP/(FP + TN) on the x-axis and the True Positive Rate (TPR) = TP/(TP + FN) on the y-axis (Bichitrananda Behera and Kumaravelan 2019).
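The weighted measures in Eqs. 43 to 45 can be checked on a small example with the following Python sketch. The function name `weighted_scores` and the convention of scoring a class as zero when its denominator is zero are assumptions made for illustration.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall, and F1-score over all classes."""
    support = Counter(y_true)   # |y_i|: number of true instances per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision += n * (tp / (tp + fp) if tp + fp else 0.0)
        recall += n * tp / (tp + fn)   # tp + fn equals n, which is positive
        f1 += n * (2 * tp / (2 * tp + fp + fn))
    return precision / total, recall / total, f1 / total


y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]
precision, recall, f1 = weighted_scores(y_true, y_pred)
```

In this example the weighted recall equals the accuracy (3 of 4 reviews are classified correctly), which always holds when every class is weighted by its support.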
Table 20 reports the TPR, FPR, and run-time values of the eight ensemble models. GB obtains the highest TPR value, 117.6, for Uber reviews. ET obtains the highest TPR value, 60.19, for Restaurant reviews. M-SVC obtains the highest TPR value, 68.26, for Amazon reviews, and CB obtains the highest TPR value, 88.70, for Food reviews. This shows that GB, ET, M-SVC, and CB are more capable than the other ensembles of correctly identifying the actual positives. M-SVC scores the minimum FPR of 0.0 and 12.75 for Uber reviews and Food reviews, whereas GB obtains the minimum FPR of 02.06 and 03.12 for Restaurant and Amazon reviews. In addition, M-SVC provides fast execution for small datasets, as it took the minimum time (97 ms and 67 ms) to run on the Restaurant reviews and Amazon reviews datasets. For the large Food reviews dataset, however, ET took the minimum time of 2350 ms. In conclusion, the M-SVC approach provides the highest TPR, lower FPR, and fast performance for text classification.
Figure 9a, b, c, and d depict the combined ROC-AUC scores of the experimented ensemble models for the experimented datasets. Ada-Boost obtains the highest AUC score of 73 and 72 for the Uber and Restaurant reviews datasets, whereas Cat-Boost and Random-Forest obtain the highest AUC score, 77, for the Amazon reviews dataset. In the case of Food reviews, Meta-Estimator (Linear SVC) achieves a higher AUC score of 87 for text classification. Since Ada-Boost obtains a higher AUC score for two (Uber and Restaurant) review datasets, it can be regarded as the best model for classifying text reviews in terms of AUC. It has also been found that Meta-Estimator (Linear SVC) is more capable of classifying the reviews of a large dataset, as it outperforms the other models on the Food reviews dataset, which stores the maximum number of reviews.
Figure 10 and Table 21 depict the weighted precision, weighted recall, and weighted f1-score of all the experimented models for the four datasets. The bagging-based Meta-Estimator (Linear SVC) obtains a higher weighted precision value (93% and 87%) for the Uber and Food reviews datasets. The Cat-Boost and Random-Forest ensembles achieve a higher weighted precision score of 79% for Amazon reviews, while XGB obtains a higher weighted precision of 80% for Restaurant reviews. This means that the Meta-Estimator (Linear SVC), Cat-Boost, and Random-Forest ensembles generate low false-positive rates when classifying text. Meta-Estimator (Linear SVC) obtains a higher weighted precision, weighted recall, and weighted f1-score of 87% for the large Food reviews dataset, which indicates that it identifies the actual classes correctly and is less affected by false predictions. Extra-Tree gives a higher weighted recall of 72% and a weighted f1-score of 71% for Restaurant reviews, and Random-Forest provides a higher weighted precision of 79%, weighted recall of 78%, and weighted f1-score of 77% for the Amazon reviews dataset. In conclusion, of the eight experimented ensemble techniques, Meta-Estimator (Linear SVC) and Random-Forest generate low false-positive and low false-negative rates for SA. Furthermore, Meta-Estimator (Linear SVC) is an efficient ensemble model for large and small datasets.
Figure 11 depicts the training accuracy of the experimented ensemble models for the different datasets, and Fig. 12 presents their testing accuracy. According to training accuracy, Extra-Tree and Random-Forest obtain higher and equal scores of 100% for Uber reviews, 93.37% for Restaurant reviews, 93.62% for Amazon reviews, and 100% for Food reviews.
In testing accuracy, Random-Forest achieves a higher score of 91.82% for Uber reviews, Extra-Tree achieves 71.50% for Restaurant reviews, Random-Forest and Extra-Tree achieve a higher and equal score of 77.50% for Amazon reviews, and Meta-Estimator (Linear SVC) obtains an 86.94% score for Food reviews. Among the boosting techniques, XGB obtains a higher training accuracy score of 87.62%, 89.50%, and 95.14% for the Restaurant, Amazon, and Food reviews datasets, and the Cat-Boost ensemble obtains the highest testing accuracy score of 91.07%, 71.00%, 76.00%, and 86.40% for the Uber, Restaurant, Amazon, and Food reviews datasets. For Uber reviews, Light-GBM obtains a training accuracy of 100%, equal to that of Random-Forest and Extra-Tree. In conclusion, Cat-Boost achieves better training and testing accuracy than the other boosting techniques but cannot beat the performance of the bagging approach, as Random-Forest and Extra-Tree outperform the boosting ensemble techniques.
After analyzing the results of all the experimented ensemble techniques according to the different measures, we discovered some important facts regarding the high and low performance of bagging and boosting-based ensemble models for SA using multiple datasets. We conclude that the type and size of a dataset distinctly influence the performance of SA.
-
Gradient-Boosting generates the minimum difference (1.69%, 4.12%, 6.75%, and 0.41%) between training and testing accuracy scores for the Uber, Restaurant, Amazon, and Food datasets, i.e., for both large and small datasets, which means that it overcomes the problems of overfitting and underfitting and reduces the bias and variance when training the model.
-
Cat-Boost obtains state-of-the-art results for SA on diverse datasets. It achieves higher testing accuracy and AUC score for all the experimented datasets. Cat-Boost is very easy to implement and generates competitive results with the help of one-hot encoding.
-
Light-GBM is known to be a robust algorithm capable of handling large datasets, but according to our experiments, it provides lower accuracy and AUC scores for text classification than the other experimented ensemble techniques. Although Light-GBM produces higher results for the large Food reviews dataset (91.88% training accuracy, 85.58% testing accuracy, and an AUC score of 85), it is still unable to beat the performance of Cat-Boost and XGB.
-
Ada-Boost and Gradient-Boosting are the two main approaches with distinct frameworks; XGB, Light-GBM, and Cat-Boost follow the framework of Gradient-Boosting. Experiments show that Ada-Boost performs better than Gradient-Boosting in training, testing, and AUC scores for all the datasets but is unable to solve the overfitting and underfitting problem, as it generates a higher difference between training and testing accuracy than Gradient-Boosting.
-
Random-Forest and Extra-Tree are bagging-based approaches, where Random-Forest chooses the optimum split and Extra-Tree chooses a random split when selecting the nodes. According to our experiments, both algorithms obtain equal training accuracy (100%, 93.37%, 93.62%, and 100%) on all the experimented datasets. For testing accuracy and AUC score, they also generate approximately similar values. Therefore, it can be concluded that the Random-Forest and Extra-Tree algorithms are equally capable of sentiment classification.
-
Meta-Estimator with Linear SVC is a bagging-based approach that uses Linear SVC for the bagging procedure instead of decision trees. The experiments show that Meta-Estimator (Linear-SVC) obtains better results in terms of TPR, FPR, and run-time than the other experimented ensemble techniques, which means that it can generate lower false-positive and false-negative rates and faster execution.
As discussed above, our primary motive was to compare the bagging-based ensemble with the boosting-based ensemble to perform SA. After analyzing the results presented in Table 20 and Figs. 10, 11, 12, and 13, we conclude that bagging-based ensemble techniques (Random-Forest, Extra-Tree, and Meta-Estimator (Linear SVC)) performed better than boosting-based techniques. Random-Forest and Extra-Tree perform almost equally. Meta-Estimator (Linear SVC) gives lower training and testing accuracy than Extra-Tree and Random-Forest but is comparatively faster. XGB and Cat-Boost obtain better accuracy and TPR than the other boosting ensembles but cannot beat the performance of bagging-based ensembles. Hence, bagging ensemble-based techniques provide state-of-the-art results for SA. In the introduction part, we have raised some questions regarding the essential aspects and trends of SA.
6 Research opportunities in SA
SA has gained popularity in various fields, including medicine, politics, industries, and finance. Therefore, researchers are developing various intelligent models for SA. Figure 13 presents the major application areas for SA, where researchers can develop generalized frameworks for real-life applications. Further subsections describe these future opportunities of SA in detail.
6.1 SA in medical
SA is gaining popularity in healthcare industries and improving the quality of healthcare services. The opinions and reviews of patients help healthcare providers to diagnose a particular disease (Abualigah et al. 2020). The COVID-19 outbreak increased the demand for SA in healthcare-related services. SA has been applied to extract the opinion of people towards the nationwide lockdown due to the COVID-19 pandemic (Barkur and Vibha 2020). A novel fusion model has been developed to study the tweets of various coronavirus-affected countries (Basiri et al. 2021). Medical documents reflect the information of the patients in terms of diagnosis, examinations, observations, and interventions. Judging the medical conditions of the patients in the form of positive and negative responses is required, and several methods have been developed to handle these kinds of tasks (Denecke and Deng 2015). Therefore, the healthcare sector needs extensive research in the field of SA.
6.2 SA in politics
In the current digital world, politics has moved to a new level, and governments use social platforms to gauge people's opinions on established laws and policies. SA has been widely applied to capture the voice of the people. A two-stage model has been developed to predict election results (Ramteke et al. 2016). In the past two years, the farmers' protests against three bills passed by the Indian government have drawn worldwide attention. Here, demand grew for artificial intelligence-based SA to provide direction in this democratic dispute (Neogi et al. 2021). A Twitter dataset on the 2016 US presidential election was collected and analyzed with SA to determine people's preference between Hillary Clinton and Donald Trump (Somula et al. 2016). Hence, efficient SA models are required to address political issues.
6.3 SA in industries
SA strongly supports business growth. Industries use various applications of SA, such as brand monitoring, feedback gathering, voice of the customer (VoC), product analysis, market research, and competitive research. These SA-based applications help industries with decision-making. A novel LSTM-CNN-based model has been developed using a grid search optimization method to determine customers' opinions of a restaurant (Priyadarshini and Cotton 2021). An automatic brand monitoring framework was proposed using Romanian Twitter data. This model effectively generated a reputation report for a single brand, a comparative report for two different companies, and reports for a desired time frame (Istrati and Ciobotaru 2021).
6.4 SA in finance
SA is used to evaluate financial news and helps investors choose beneficial schemes to invest in. SA in finance has grown rapidly with the increasing popularity of cryptocurrency. Several cryptocurrencies, such as Bitcoin, Ethereum, Binance Coin, Quant, Solana, and ZCash, are available on digital finance platforms. No legal framework backs these cryptocurrencies, so users have little basis for trusting them as investments. SA can provide the opinions of different people on a particular cryptocurrency and thus support decision-making. Machine learning techniques have been used to predict the price movement of the Bitcoin, Ethereum, Ripple, and Litecoin cryptocurrencies (Valencia et al. 2019). SA has wide future scope in cryptocurrency price movement prediction, and researchers are taking a keen interest in this field.
6.5 Technical discussion
SA is widely adopted for many kinds of tasks, from extracting customer opinion on specific issues (Kumar et al. 2019) to monitoring patients' mental health based on their posts on social platforms. Additionally, the emergence of new technologies such as Cloud Computing, Big Data (Birjali et al. 2021), Data Science, and Blockchain has widened the field of NLP, including SA. SA has provided many benefits in the business intelligence domain; companies have exploited it for customer feedback, product improvement, and marketing strategies (Bernabé-Moreno et al. 2020). SA has become a handy tool in cryptocurrency price prediction, Forex prediction, and stock market prediction. A recommender system is a model trained to suggest relevant items (music, movies, or products) to users. Here, the sentiment analyzer plays a major role (Birjali et al. 2021): SA gathers users' opinions and feeds this information into the recommender system for the final recommendation. Researchers have proposed a novel adaptive learning model based on social platform analysis and shown how SA and Big Data could transform e-learning platforms. Furthermore, for government policies and similar issues, SA is very helpful in monitoring possible public reactions. In the past few years, Twitter has been used to analyze people's opinions on the global COVID-19 pandemic. SA has been adopted to observe government strategies (Alaoui et al. 2018), people's reactions, and World Health Organization (WHO) policies adopted as preventive measures against COVID-19. The healthcare domain has recently taken great interest in SA, which allows medical actors to extract information about drug reactions, disease diagnosis, epidemics, and patient moods (Ramírez-Tinoco et al. 2019; Tiwari et al. 2021).
Machine learning is the most promising approach for SA. Usually, machine learning-based SA provides a higher accuracy score than the lexicon-based approach. It offers various feature engineering techniques that extract the critical features from the dataset and improve the efficiency of SA. A graph-based approach connects interrelated words in text reviews to calculate people's sentiment and opinion, where the vertices (nodes) correspond to features available in the reviews. Various graph-based methods and algorithms have been applied in the last decades to solve the problem of SA (Tiwari and Kumar 2020; Bhati and Rai 2021). Ensemble learning improves classification by reducing the risk of a poor selection. It combines the capability and knowledge of various learners, which increases classification accuracy and decreases prediction errors. The hybrid approach utilizes the capability of various approaches, such as rule-based, lexicon-based, machine learning-based, or deep learning-based ones. It enhances the efficiency of the SA model and reflects a researcher's effort to design the best approach for a particular task.
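To make the graph-based idea above concrete, the following minimal sketch builds a word co-occurrence graph from a few reviews and estimates the polarity of a word from its seed-word neighbours. It does not reproduce the method of any cited study: the window size, the seed lexicon, the toy reviews, and the function names are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(reviews, window=2):
    """Build an undirected weighted graph of words.

    Words are vertices; an edge weight counts how often two words
    occur within `window` tokens of each other in the same review.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for review in reviews:
        tokens = review.lower().split()
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    graph[word][other] += 1
                    graph[other][word] += 1
    return graph


def neighbour_polarity(graph, word, seeds):
    """Edge-weighted average polarity of the seed words adjacent to `word`.

    Returns 0.0 when the word has no seed neighbours.
    """
    total = 0.0
    weight_sum = 0
    for neighbour, weight in graph.get(word, {}).items():
        if neighbour in seeds:
            total += weight * seeds[neighbour]
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical toy data: four short reviews and a two-word seed lexicon.
reviews = [
    "battery life is great",
    "great battery",
    "screen is terrible",
    "terrible screen quality",
]
seeds = {"great": 1, "terrible": -1}

graph = build_cooccurrence_graph(reviews)
battery_score = neighbour_polarity(graph, "battery", seeds)
screen_score = neighbour_polarity(graph, "screen", seeds)
neutral_score = neighbour_polarity(graph, "is", seeds)
print(battery_score, screen_score, neutral_score)
```

In this toy setting, aspect words inherit the polarity of the opinion words they co-occur with, while a function word linked equally to both seeds stays neutral. Published graph-based methods typically use richer edges and ranking or propagation schemes.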
Elsevier and Springer are the two most common venues for SA publications, and Elsevier, Springer, and ACM are the three publishers most often chosen by researchers for SA-related research. A benchmark dataset plays a vital role in sound research. The graph shows that researchers most frequently use product reviews for their experiments, followed by Twitter data. A few researchers also generate their own datasets for sentiment classification, while others consider movie reviews, medical reviews, and hotel reviews. SA is an emerging field, but it faces various challenges that complicate processing and decrease the efficiency of related models. Although researchers are working to solve these issues using different techniques, accuracy is still lacking. These challenges hinder extracting the correct meaning of sentiments and classifying the correct polarity. A large number of features increases the dimensionality of the datasets. Feature engineering is a very important step in SA applications and opinion mining. In optimal feature engineering, feature selection and feature extraction should be integrated with the final processing. Table 22 summarizes the responses of the studies addressing each research question.
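The point about feature count and dimensionality can be illustrated with N-gram features. The following minimal sketch counts the distinct N-grams in a small set of reviews as the maximum N grows. The toy reviews, the whitespace tokenization, and the function names are assumptions made for this illustration only.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vocabulary_size(reviews, max_n):
    """Count distinct n-grams for n = 1..max_n across all reviews.

    Each distinct n-gram becomes one feature dimension in a
    bag-of-n-grams representation.
    """
    vocabulary = set()
    for review in reviews:
        tokens = review.lower().split()
        for n in range(1, max_n + 1):
            vocabulary.update(ngrams(tokens, n))
    return len(vocabulary)


# Hypothetical toy reviews, not taken from the benchmark datasets.
reviews = [
    "the food was good",
    "the service was not good",
    "the food was not bad",
]
sizes = {n: vocabulary_size(reviews, n) for n in (1, 2, 3)}
print(sizes)
```

Even on three short reviews, the feature space roughly triples when bigrams and trigrams are added to unigrams, which is why feature selection matters on full-size review corpora.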
7 Conclusions and future work
This article presents an extensive literature survey of 92 reputed articles covering lexicon-based, graph-based, machine learning-based, ensemble-based, and hybrid techniques for SA. It is observed that ensemble-based and hybrid techniques have gained more popularity for text classification. In addition, essential aspects such as frequently used SA datasets, publishing platforms, proposed techniques, SA challenges, SA feature-engineering techniques, and various emotion theories are also discussed in this study. With the rapidly growing demand for SA, several challenges have arisen in processing text reviews. We therefore discussed several SA-related challenges, namely stance detection, sarcasm detection, negation handling, domain dependence, huge lexicons, word sense disambiguation, and anaphora resolution, along with their proposed solutions. Feature engineering is a prime factor in effective text classification, and we presented an extensive taxonomy of feature-engineering techniques used for text processing. The emotion theories of five prominent researchers have also been discussed. In essence, their ideas identify disgust, fear, anger, happiness, and sadness as the basic emotions mainly used in emotion classification from text.
Our primary objective is to help companies select a better sentiment model for their brand monitoring and product reviews. This article also implemented numerous ensemble-based techniques on review datasets from different domains, providing a systematic comparative analysis of bagging-based and boosting-based ensembles for SA. We have also illustrated the core of ensemble-based techniques for SA. Five boosting-based and three bagging-based ensemble techniques have been implemented on four text review datasets to conduct extensive experiments. The previously discussed ensemble-based research, together with the experimental results, provides a broad perspective on applying ensemble-based techniques to SA. Finally, the experimental results demonstrate that bagging-based ensemble techniques outperform boosting-based techniques in terms of TPR, FPR, accuracy, weighted precision, weighted recall, weighted F1-score, AUC score, and run-time for SA. XGB and Cat-Boost, from the boosting approach, produced effective results but were unable to match the performance of the bagging-based ensembles. This survey, together with the analytical study, will help in determining the best technique for building SA-related applications. In future work, we will explore hybrid approaches, in which different techniques and models are combined to develop a better model for SA with reduced computational cost. The goal is to develop a hybrid model for SA applications that combines different approaches, and we will assess the effectiveness and reliability of such hybrid methods under different types of parameters.
Data availability
Uber-Reviews: https://www.kaggle.com/code/hershyandrew/uber-reviews-text-analysis/data. Food-Reviews: https://www.kaggle.com/datasets/snap/amazon-fine-food-reviews. Amazon Reviews: https://www.kaggle.com/code/saurav9786/recommender-system-using-amazon-reviews/data. Restaurant Reviews: https://www.kaggle.com/datasets/d4rklucif3r/restaurant-reviews
References
Abdulla NA, Ahmed NA, Shehab MA, Al-Ayyoub M, Al-Kabi MN, Al-rifai S (2014) Towards improving the lexicon-based approach for arabic sentiment analysis. Int J Inform Technol Web Eng (IJITWE) 9(3):55–71
Abdul-Mageed M, Diab MT (2012) AWATIF: a multi-genre corpus for modern standard arabic subjectivity and sentiment analysis. LREC 515:3907–3914
Abualigah L, Alfar HE, Shehab M, Hussein AM (2020) Sentiment analysis in healthcare: a brief review. Recent advances in NLP: The Case of Arabic Language. pp. 129–41
Aisopos F, Papadakis G, Varvarigou T (2011) Sentiment analysis of social media content using n-gram graphs. In Proceedings of the 3rd ACM SIGMM international workshop on Social media, pp. 9–14
Akhtar MS, Gupta D, Ekbal A, Bhattacharyya P (2017) Feature selection and ensemble construction: A two-step method for aspect based sentiment analysis. Knowl-Based Syst 125(2017):116–135
Akter S, Tareq Aziz M (2016) Sentiment analysis on facebook group using lexicon based approach. In 2016 3rd International Conference on Electrical Engineering and Information Communication Technology (ICEEICT). IEEE pp. 1–4
AlBadani B, Shi R, Dong J (2022) A novel machine learning approach for sentiment analysis on Twitter incorporating the universal language model fine-tuning and SVM. Appl Syst Innov 5(1):13
Alshutayri AOO, Atwell E (2017) Exploring Twitter as a source of an Arabic dialect corpus. Int J Comput Linguist (IJCL) 8(2):37–44
Al-Twairesh N, Al-Khalifa H, Al-Salman A (2014) Subjectivity and sentiment analysis of Arabic: trends and challenges. In 2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA). IEEE pp. 148–155
Amrani YA, Lazaar M, Kadiri KEE (2018) Random forest and support vector machine based hybrid approach to sentiment analysis. Procedia Comput Sci 127(2018):511–520
Anjaria M, Reddy Guddeti RM (2014) A novel sentiment analysis of social networks using supervised learning. Soc Netw Anal Min 4(1):181
Asghar MZ, Ahmad S, Qasim M, Zahra SR, Kundi FM (2016) SentiHealth: creating health-related sentiment lexicon using hybrid approach. SpringerPlus 5(1):1–23
Augenstein I, Rocktäschel T, Vlachos A, Bontcheva K (2016) Stance detection with bidirectional conditional encoding. arXiv preprint arXiv:1606.05464
Aung KZ, Myo NN (2017) Sentiment analysis of students' comment using lexicon based approach. In 2017 IEEE/ACIS 16th international conference on computer and information science (ICIS). IEEE pp. 149–154
Avverahalli Ramesha P (2017) Sentiment Analysis of Medicine Reviews using Ensemble models. PhD diss., National College of Ireland, Dublin
Baccianella S, Esuli A, Sebastiani F (2013) Using micro-documents for feature selection: The case of ordinal text classification. Expert Syst Appl 40(11):4687–4696
Bahad P, Saxena P (2020) Study of adaboost and gradient boosting algorithms for predictive analytics. In International Conference on Intelligent Computing and Smart Communication 2019. Springer, Singapore pp. 235–244
Bari A, Saatcioglu G (2018) Emotion artificial intelligence derived from ensemble learning. In 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). IEEE, pp. 1763–1770
Barkur G, Vibha GB (2020) Sentiment analysis of nationwide lockdown due to COVID 19 outbreak: evidence from India. Asian J Psychiatr 51:102089
Basiri ME, Nemati S, Abdar M, Asadi S, Acharya UR (2021) A novel fusion-based deep learning model for sentiment analysis of COVID-19 tweets. Knowl-Based Syst 228:107242
Benlahbib A, Nfaoui EH (2020) A hybrid approach for generating reputation based on opinions fusion and sentiment analysis. J Org Comput Electron Comm 30(1):9–27
Bernabé-Moreno J, Tejeda-Lorente A, Herce-Zelaya J, Porcel C, Herrera-Viedma E (2020) A context-aware embeddings supported method to extract a fuzzy sentiment polarity dictionary. Knowl-Based Syst 190:1–13
Bhaskar J, Sruthi K, Nedungadi P (2015) Hybrid approach for emotion classification of audio conversation based on text and speech mining. Procedia Comput Sci 46(2015):635–643
Bhati BS, Rai CS (2021) Intrusion detection technique using coarse gaussian SVM. Int J Grid Util Comput 12(1):27–32
Bhati BS, Chugh G, Al-Turjman F, Bhati NS (2020) An improved ensemble based intrusion detection technique using XGBoost. Transact Emerg Telecommun Technol 2020:e4076
Bhatia P, Ji Y, Eisenstein J (2015) Better document-level sentiment analysis from RST discourse parsing. arXiv preprint arXiv:1509.01599
Bhoir P, Kolte S (2015) Sentiment analysis of movie reviews using lexicon approach. In 2015 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC). IEEE pp. 1–6
Bibi M, Abbasi WA, Aziz W, Khalil S, Uddin M, Iwendi C, Gadekallu TR (2022) A novel unsupervised ensemble framework using concept-based linguistic methods and machine learning for twitter sentiment analysis. Pattern Recogn Lett 158:80–86
Behera B, Kumaravelan G, Kumar P (2019) Performance evaluation of deep learning algorithms in biomedical document classification. In 2019 11th International Conference on Advanced Computing (ICoAC). IEEE pp. 220–224
Birjali M, Kasri M, Beni-Hssane A (2021) A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowl-Based Syst 226:107134
Bordoloi M, Biswas SK (2020) Graph based sentiment analysis using keyword rank based polarity assignment. Multimed Tools Appl 79(47):36033–36062
Bouazizi M, Ohtsuki TO (2016) A pattern-based approach for sarcasm detection on twitter. IEEE Access 4(2016):5477–5488
Cabral L, Hortacsu A (2010) The dynamics of seller reputation: evidence from eBay. J Ind Econ 58(1):54–78
Castillo E, Cervantes O, Vilarino D, Báez D, Sánchez A (2015) UDLAP: sentiment analysis using a graph-based representation. In Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015), pp. 556–560
Chaithra VD (2019) Hybrid approach: Naive bayes and sentiment VADER for analyzing sentiment of mobile unboxing video comments. Int J Electr Comput Eng 9(5):4452
Chalothom T, Ellman J (2015) Simple approaches of sentiment analysis via ensemble learning. In information science and applications. Springer, Berlin, Heidelberg. pp. 631–639
Che W, Zhao Y, Guo H, Su Z, Liu T (2015) Sentence compression for aspect-based sentiment analysis. IEEE/ACM Transact Audio, Speech, Lang Process 23(12):2111–2124
Chen H, Yang CC (2011) Special issue on social media analytics: Understanding the pulse of the society. IEEE Transact Syst Man Cyber-Part A 41(5):826–827
Chen S, Tian Y, Liu Q, Metaxas DN (2013) Recognizing expressions from face and body gesture by temporal normalized motion and appearance features. Image Vision Comput 31(2):175–185
Chen Z, Lu F, Yuan X, Zhong F (2017) TCMHG: Topic-based cross-modal hypergraph learning for online service recommendations. IEEE Access 6(2017):24856–24865
Chen F, Xia J, Gao H, Xu H, Wei W (2021) TRG-DAtt: The target relational graph and double attention network based sentiment analysis and prediction for supporting decision making. ACM Transact Manag Inform Syst (TMIS) 13(1):1–25
Du C, Sun H, Wang J, Qi Q, Liao J (2020) Adversarial and domain-aware bert for cross-domain sentiment analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4019–4028
Dasgupta A, Drineas P, Harb B, Josifovski V, Mahoney MW (2007) Feature selection methods for text classification. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 230–239
Denecke K, Deng Y (2015) Sentiment analysis in medical settings: New opportunities and challenges. Artif Intell Med 64(1):17–27
Dey A, Jenamani M, Thakkar JJ (2018) Senti-N-Gram: An n-gram lexicon for sentiment analysis. Expert Syst Appl 103(2018):92–105
Dey L, Chakraborty S, Biswas A, Bose B, Tiwari S (2016) Sentiment analysis of review datasets using naive bayes and k-nn classifier. arXiv preprint arXiv:1610.09982
Drucker H, Cortes C, Jackel LD, LeCun Y, Vapnik V (1994) Boosting and other ensemble methods. Neural Comput. https://doi.org/10.1162/neco.1994.6.6.1289
El Alaoui I, Gahi Y, Messoussi R, Chaabi Y, Todoskoff A, Kobi A (2018) A novel adaptable approach for sentiment analysis on big social data. J Big Data 5(1):1–18
Elmurngi E, Gherbi A (2017) Detecting fake reviews through sentiment analysis using machine learning techniques. IARIA/Data Analytics 2017:65–72
Elshakankery K, Ahmed MF (2019) HILATSA: a hybrid incremental learning approach for Arabic tweets sentiment analysis. Egypt Inform J 20(3):163–171
Farooq U, Mansoor H, Nongaillard A, Ouzrout Y, Qadir MA (2017) Negation handling in sentiment analysis at sentence level. JCP 12(5):470–478
Fersini E, Messina E, Pozzi FA (2014) Sentiment analysis: Bayesian ensemble learning. Decis Support Syst 68(2014):26–38
Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. ICML 96:148–156
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378
García-Pablos A, Cuadros M, Rigau G (2018) W2VLDA: almost unsupervised system for aspect based sentiment analysis. Expert Syst Appl 91:127–137
Gautam D, Maharjan N, Banjade R, Tamang LJ, Rus V (2018) Long short term memory based models for negation handling in tutorial dialogues. In FLAIRS Conference, pp. 14–19
Ge N, Hale J, Charniak E (1998) A statistical approach to anaphora resolution. In Sixth Workshop on Very Large Corpora
Geurts P, Ernst D, Wehenkel L (2006) Extremely randomized trees. Mach Learn 63(1):3–42
Giachanou A, Crestani F (2016) Like it or not: A survey of twitter sentiment analysis methods. ACM Comput Surveys (CSUR) 49(2):1–41
Gomez JC, Boiy E, Moens M-F (2012) Highly discriminative statistical features for email classification. Knowl Inform Syst 31(1):23–53
Govindarajana M (2013) Sentiment analysis of movie reviews using hybrid method of naive bayes and genetic algorithm. Int J Adv Comput Res 3(4):139
Govindarajanb M (2014) Sentiment analysis of restaurant reviews using hybrid classification method. Int J Soft Comput Artif Intell 2(1):17–23
Gunesa H, Piccardi M (2005) Affect recognition from face and body: early fusion vs. late fusion. In 2005 IEEE international conference on systems, man and cybernetics. IEEE vol. 4, pp. 3437–3443
Gunesb H, Piccardi M (2008) Automatic temporal segment detection and affect recognition from face and body display. IEEE Transact Syst Man Cyber Part B Cyber 39(1):64–84
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3(Mar):1157–1182
Habernal I, Ptáček T, Steinberger J (2013) Sentiment analysis in czech social media using supervised machine learning. In Proceedings of the 4th workshop on computational approaches to subjectivity, sentiment and social media analysis, pp. 65–74
Hablani R, Chaudhari N, Tanwani S (2013) Recognition of facial expressions using local binary patterns of important facial parts. Int J Image Process (IJIP) 7(2):163–170
Hasan A, Moin S, Karim A, Shamshirband S (2018) Machine learning-based sentiment analysis for twitter accounts. Math Comput Appl 23(1):11
Hayat M, Bennamoun M (2014) An automatic framework for textured 3D video-based facial expression recognition. IEEE Transact Affect Comput 5(3):301–313
Heredia B, Khoshgoftaar TM, Prusa JD, Crawford M (2017) Improving detection of untrustworthy online reviews using ensemble learners combined with feature selection. Soc Netw Anal Min 7(1):1–18
Inza I, Larranaga P, Blanco R, Cerrolaza AJ (2004) Filter versus wrapper gene selection approaches in DNA microarray domains. Artif Intell Med 31(2):91–103
Istrati L, Ciobotaru A (2021) Automatic monitoring and analysis of brands using data extracted from twitter in Romanian. In Proceedings of SAI Intelligent Systems Conference. Springer, Cham pp. 55–75
Jagdale RS, Shirsat VS, Deshmukh SN (2019) Sentiment analysis on product reviews using machine learning techniques. In Cognitive Informatics and Soft Computing. Springer, Singapore pp. 639–647
Jagtap VS, Pawar K (2013) Analysis of different approaches to sentence-level sentiment classification. Int J Sci Eng Technol 2(3):164–170
Jing N, Wu Z, Wang H (2021) A hybrid model integrating deep learning with investor sentiment analysis for stock price prediction. Expert Syst Appl 178:115019
Joshi N, Gupta I (2019) Enhanced twitter sentiment analysis using hybrid approach and by accounting local contextual semantic. J Intell Syst 29(1):1611–1625
Joshi A, Sharma V, Bhattacharyya P (2015) Harnessing context incongruity for sarcasm detection. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 757–762
Kaushik C, Mishra A (2014) A scalable, lexicon based technique for sentiment analysis. arXiv preprint arXiv:1410.2265
Keith B, Fuentes E, Meneses CM (2017) A hybrid approach for sentiment analysis applied to paper. In Proceedings of ACM SIGKDD Conference, Halifax, Nova Scotia, Canada, p. 10
Khalid M, Ashraf I, Mehmood A, Ullah S, Ahmad M, Choi GS (2020) GBSVM: sentiment classification from unstructured reviews using ensemble classifier. Appl Sci 10(8):2788
Khan J, Alam A, Hussain J, Lee Y-K (2019) EnSWF: effective features extraction and selection in conjunction with ensemble learning methods for document sentiment classification. Appl Intell 49(8):3123–3145
Khoo CSG, Johnkhan SB (2018) Lexicon-based sentiment analysis: comparative evaluation of six sentiment lexicons. J Inform Sci 44(4):491–511
Kim H-M, Park K (2019) Sentiment analysis of online food product review using ensemble technique. J Dig Converg 17(4):115–122
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
Krishnakumari K, Akshaya P (2019) A survey on graph based approaches in sentiment analysis
Kumar S, Yadava M, Roy PP (2019) Fusion of EEG response and sentiment analysis of products review to predict customer satisfaction. Inform Fusion 52:41–52
Kumar S, Gahalawat M, Roy PP, Dogra DP, Kim B-G (2020) Exploring impact of age and gender on sentiment analysis using machine learning. Electronics 9(2):374
Kumar V, Kalitin D, Tiwari P (2017) Unsupervised learning dimensionality reduction algorithm PCA for face recognition. In 2017 international conference on computing, communication and automation (ICCCA). IEEE pp. 32–37
Kundi FM, Khan A, Ahmad S, Asghar MZ (2014) Lexicon-based sentiment analysis in the social web. J Basic Appl Sci Res 4(6):238–48
Lappin S, Leass HJ (1994) An algorithm for pronominal anaphora resolution. Comput Linguist 20(4):535–561
Le B, Nguyen H (2015) Twitter sentiment analysis using machine learning techniques. In Advanced computational methods for knowledge engineering. Springer, Cham pp. 279–289
Shang L (2012) A feature selection method based on information gain and genetic algorithm. In 2012 International Conference on Computer Science and Electronics Engineering. IEEE vol. 2, pp. 355–358
Liang B, Su H, Gui L, Cambria E, Xu R (2022) Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks. Knowl-Based Syst 235:107643
Liu Y, Ju S, Wang J, Su C (2020) A new feature selection method for text classification based on independent feature space search. Math Probl Eng. https://doi.org/10.1155/2020/6076272
Liu J, Lu Z, Du W (2019) Combining enterprise knowledge graph and news sentiment analysis for stock price prediction. In Proceedings of the 52nd Hawaii International Conference on System Sciences
Lochter JV, Zanetti RF, Reller D, Almeida TA (2016) Short text opinion detection using ensemble of classifiers and semantic indexing. Expert Syst Appl 62(2016):243–249
McHugh ML (2013) The chi-square test of independence. Biochem Med 23(2):143–149
Meena A, Prabhakar TV (2007) Sentence level sentiment analysis in the presence of conjuncts using linguistic analysis. European conference on information retrieval. Springer Berlin, Heidelberg, pp 573–580
Meng Q, Ke G, Wang T, Chen W, Ye Q, Ma Z-M, Liu T-Y (2016) A communication-efficient parallel algorithm for decision tree. arXiv preprint arXiv:1611.01276
Minaee S, Azimi E, Abdolrashidi A (2019) Deep-sentiment: Sentiment analysis using ensemble of cnn and bi-lstm models. arXiv preprint arXiv:1904.04206
Hu M, Liu B (2004) Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 168–177
Mohammad SM (2016) Sentiment analysis: Detecting valence, emotions, and other affectual states from text. In Emotion measurement. Woodhead Publishing pp. 201–237
Mondal A, Satapathy R, Das D, Bandyopadhyay S (2016) A hybrid approach based sentiment extraction from medical context. In SAAIP@ IJCAI
Montejo-Ráez A, Eugenio Martínez-Cámara M, Martín-Valdivia T, Alfonso Ureña-López L (2014) Ranked wordnet graph for sentiment polarity classification in twitter. Comput Speech Lang 28(1):93–107
Moreno-Ortiz A, Fernández-Cruz J (2015) Identifying polarity in financial texts for sentiment analysis: a corpus-based approach. Procedia Soc Behav Sci 198(2015):330–338
Mousin L, Jourdan L, Kessaci Marmion M-E, Dhaenens C (2016) Feature selection using tabu search with learning memory: learning Tabu Search. In International Conference on Learning and Intelligent Optimization. Springer, Cham pp. 141–156
Mowlaei ME, Abadeh MS, Keshavarz H (2020) Aspect-based sentiment analysis using adaptive aspect-based lexicons. Expert Syst Appl 148(2020):113234
Nandi V, Agrawal S (2016) Political sentiment analysis using hybrid approach. Int Res J Eng Technol 3(5):1621–1627
Nazeer I, Rashid M, SK Gupta, Kumar A (2021) Use of novel ensemble machine learning approach for social media sentiment analysis. Analyzing global social media consumption. IGI Global pp. 16–28
Neogi AS, Garg KA, Mishra RK, Dwivedi YK (2021) Sentiment analysis and classification of Indian farmers’ protest using twitter data. Int J Inform Manag Data Insights 1(2):100019
Nguyen VK, Zhang WE, Sheng QZ (2018) Identifying price index classes for electricity consumers via dynamic gradient boosting. In International Conference on Web Information Systems Engineering. Springer, Cham pp. 472–486
Onana A, Korukoğlu S, Bulut H (2016) A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification. Expert Syst Appl 62(2016):1–16
Onanb A, Korukoğlu S, Bulut H (2016) Ensemble of keyword extraction methods and classifiers in text classification. Expert Syst Appl 57(2016):232–247
Palanisamy P, Yadav V, Elchuri H (2013) Serendio: simple and practical lexicon based approach to sentiment analysis. In Second Joint Conference on Lexical and Computational Semantics (* SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pp. 543–548
Pan SJ, Ni X, Sun J-T, Yang Q, Chen Z (2010) Cross-domain sentiment classification via spectral feature alignment. In Proceedings of the 19th international conference on World wide web. pp. 751–760
Patil G, Galande V, Kekan V, Dange K (2014) Sentiment analysis using support vector machine. Int J Innov Res Comput Commun Eng 2(1):2607–2612
Peng M, Zhang Q, Jiang Y, Huang X-J (2018) Cross-domain sentiment classification with target domain specific information. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2505–2513
Perikos I, Hatzilygeroudis I (2016) Recognizing emotions in text using ensemble of classifiers. Eng Appl Artif Intell 51(2016):191–201
Polikar R (2006) Ensemble based systems in decision making. IEEE Circuits Syst Mag 6(3):21–45
Ponomareva N, Thelwall M (2012) Do neighbours help? an exploration of graph-based algorithms for cross-domain sentiment classification. In Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, pp. 655–665
Priyadarshini I, Cotton C (2021) A novel LSTM–CNN–grid search-based deep neural network for sentiment analysis. J Supercomput 5:1–22
Rahman M, Islam MN (2022) Exploring the performance of ensemble machine learning classifiers for sentiment analysis of covid-19 tweets. In Sentimental analysis and deep learning. Springer, Singapore pp. 383–396
Rajagopal D, Cambria E, Olsher D, Kwok K (2013) A graph-based approach to commonsense concept extraction and semantic similarity detection. In Proceedings of the 22nd International Conference on World Wide Web, pp. 565–570
Rajput Q, Haider S, Ghani S (2016) Lexicon-based sentiment analysis of teachers’ evaluation. Appl Comput Intell Soft Comput 2016:2385429. https://doi.org/10.1155/2016/2385429
Ramírez-Tinoco FJ, Alor-Hernández G, Sánchez-Cervantes JL, Salas-Zárate MP, Valencia-García R (2019) Use of sentiment analysis techniques in healthcare domain. Stud Comput Intell. https://doi.org/10.1007/978-3-030-06149-4_8
Ramteke J, Shah S, Godhia D, Shaikh A (2016) Election result prediction using Twitter sentiment analysis. In 2016 international conference on inventive computation technologies (ICICT). IEEE vol. 1, pp. 1–5
Random-Forest. https://towardsdatascience.com/the-mathematics-of-decision-trees-random-forest-and-feature-importance-in-scikit-learn-and-spark-f2861df67e3. Accessed 5 Feb 2021
Ray P, Chakrabarti A (2017) Twitter sentiment analysis for product review using lexicon method. In 2017 International Conference on Data Management, Analytics and Innovation (ICDMAI). IEEE pp. 211–216
Revathy K, Sathiyabhama B (2013) A hybrid approach for supervised twitter sentiment classification. International Journal of Computer Science and Business Informatics 7(1)
Rushdi Saleh M, Martín-Valdivia MT, Montejo-Ráez A, Ureña-López LA (2011) Experiments with SVM to classify opinions in different domains. Expert Syst Appl 38(12):14799–14804
Saleena N (2018) An ensemble classification system for twitter sentiment analysis. Procedia Comput Sci 132:937–946
Sallam RM, Hussein M, Mousa HM (2022) Improving collaborative filtering using lexicon-based sentiment analysis. Int J Electr Comput Eng 12(2):1744
Sharaff A, Gupta H (2019) Extra-tree classifier with metaheuristics approach for email classification. In Advances in Computer Communication and Computational Sciences. Springer, Singapore pp. 189–197
Sharkaway RM, Ibrahim K, Salama MMA, Bartnikas R (2011) Particle swarm optimization feature selection for the classification of conducting particles in transformer oil. IEEE Transact Dielectr Electr Insul 18(6):1897–1907
Sharma A, Dey S (2012) A document-level sentiment analysis approach using artificial neural network and sentiment lexicons. ACM SIGAPP Appl Comput Rev 12(4):67–75
Sharma A, Dey S (2013) A boosted SVM based ensemble classifier for sentiment analysis of online reviews. ACM SIGAPP Appl Comput Rev 13(4):43–52
Sharma A, Dey S (2012) Performance investigation of feature selection methods and sentiment lexicons for sentiment analysis. IJCA Spl Issue Adv Comput Commun Technol HPC Appl 3:15–20
Shi H-X, Li X-J (2011) A sentiment analysis model for hotel reviews based on supervised learning. In 2011 International Conference on Machine Learning and Cybernetics. IEEE vol. 3, pp. 950–954
Singh J, Singh G, Singh R (2017) Optimization of sentiment analysis using machine learning classifiers. Human-centric Comput Inform Sci 7(1):1–12
Sohn S, Torii M, Li D, Wagholikar K, Stephen W, Liu H (2012) A hybrid approach to sentiment sentence classification in suicide notes. Biomed Inform Insights 5:S8961
Somula R, Kumar KD, Aravindharamanan S, Govinda K (2016) Twitter sentiment analysis based on US presidential election. Smart Intelligent Computing and Applications 2020. Springer, Singapore, pp 363–373
Srivastava A, Singh V, Drall GS (2019) Sentiment analysis of twitter data: a hybrid approach. Int J Healthcare Inform Syst Inform (IJHISI) 14(2):1–16
Su Y, Zhang Y, Ji D, Wang Y, Wu H (2012) Ensemble learning for sentiment classification. In Workshop on Chinese Lexical Semantics. Springer, Berlin, Heidelberg pp. 84–93
Sun Q, Wang Z, Zhu Q, Zhou G (2018) Stance detection with hierarchical attention network. In Proceedings of the 27th international conference on computational linguistics, pp. 2399–2409
Taboada M, Brooke J, Tofiloski M, Voll K, Stede M (2011) Lexicon-based methods for sentiment analysis. Comput Linguist 37(2):267–307
Tan C, Lee L, Tang J, Jiang L, Zhou M, Li P (2011) User-level sentiment analysis incorporating social networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1397–1405
Term Frequency-Inverse Document Frequency. Retrieved from https://www.searchenginejournal.com/tf-idf-can-it-really-help-your-seo/331075/#close. Accessed 4 Feb 2021
Thet TT, Na J-C, Khoo CSG (2010) Aspect-based sentiment analysis of movie reviews on discussion boards. J Inf Sci 36(6):823–848
Tiwari D, Kumar M (2020) Social media data mining techniques: A survey. In: Tuba M, Akashe S, Joshi A (eds) Information and communication technology for sustainable development. Springer, Singapore, pp 183–194
Tiwari D, Nagpal B (2022) KEAHT: A knowledge-enriched attention-based hybrid transformer model for social sentiment analysis. New Gener Comput. https://doi.org/10.1007/s00354-022-00182-2
Tiwari D, Bhati BS, Nagpal B, Sankhwar S, Al-Turjman F (2021) An enhanced intelligent model: to protect marine IoT sensor environment using ensemble machine learning approach. Ocean Eng 242:110180
Tiwari D, Singh N (2019) Sentiment Analysis of Digital India using Lexicon Approach. In 2019 6th International Conference on Computing for Sustainable Global Development (INDIACom). IEEE pp. 1189–1193
Torelli N, Menardi G (2008) Evaluating enterprise risk of default using boosting procedures. In First Joint Meeting of the SFC and the Cladag. Edizioni Scientifiche Italiane, 2008 pp. 129–132
Tripathy A, Anand A, Rath SK (2017) Document-level sentiment classification using hybrid machine learning approach. Knowl Inform Syst 53(3):805–831
Tsytsarau M, Palpanas T (2012) Survey on mining subjective data on the web. Data Min Knowl Disc 24(3):478–514
Usman N, Usman S, Khan F, Jan MA, Sajid A, Alazab M, Watters P (2021) Intelligent dynamic malware detection using machine learning in IP reputation for forensics data analytics. Futur Gener Comput Syst 118:124–141
Uysal AK, Gunal S (2014) Text classification using genetic algorithm oriented latent semantic features. Expert Syst Appl 41(13):5938–5947
Uysal AK (2016) An improved global feature selection scheme for text classification. Expert Syst Appl 43:82–92
Uysal AK, Gunal S (2012) A novel probabilistic feature selection method for text classification. Knowl-Based Syst 36:226–235
Uysal AK (2018) On two-stage feature selection methods for text classification. IEEE Access 6:43233–43251
Valencia F, Gómez-Espinosa A, Valdés-Aguirre B (2019) Price movement prediction of cryptocurrencies using sentiment analysis and machine learning. Entropy 21(6):589
Van Atteveldt W, Van der Velden MA, Boukes M (2021) The validity of sentiment analysis: Comparing manual annotation, crowd-coding, dictionary approaches, and machine learning algorithms. Commun Methods Meas 15(2):121–140
Villena-Román J, Collada-Pérez S, Lana-Serrano S, González-Cristóbal JC (2011) Hybrid approach combining machine learning and a rule-based expert system for text categorization. In Twenty-Fourth International FLAIRS Conference
Violos J, Tserpes K, Psomakelis E, Psychas K, Varvarigou T (2016) Sentiment analysis using word-graphs. In Proceedings of the 6th International Conference on Web Intelligence, mining and semantics, pp. 1–9
Wang G, Sun J, Ma J, Xu K, Gu J (2014) Sentiment classification: The contribution of ensemble learning. Decis Support Syst 57:77–93
Wang SI, Manning CD (2012) Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 90–94
Wawre SV, Deshmukh SN (2016) Sentiment classification using machine learning techniques. Int J Sci Res (IJSR) 5(4):819–821
Westgate A, Valova I (2018) A graph based approach to sentiment lexicon expansion. In International Conference on industrial, engineering and other applications of applied intelligent systems. Springer, Cham pp. 530–541
Xia R, Zong C, Li S (2011) Ensemble of feature sets and classification algorithms for sentiment classification. Inf Sci 181(6):1138–1152
Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. In ICML 97:412–420
Yerpude A, Phirke A, Agrawal A, Deshmukh A (2019) Sentiment analysis on product features based on lexicon approach using natural language processing. Int J Nat Lang Comput (IJNLC) 8(3):1–15
Yu H, Hatzivassiloglou V (2003) Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the 2003 conference on Empirical methods in natural language processing, pp. 129–136
Zainuddin N, Selamat A, Ibrahim R (2018) Hybrid sentiment classification on twitter aspect-based sentiment analysis. Appl Intell 48(5):1218–1232
Zareapoor M, Seeja KR (2015) Feature extraction or feature selection for text classification: A case study on phishing email detection. Int J Inform Eng Electron Business 7(2):60
Zhang C, Zeng D, Li J, Wang F-Y, Zuo W (2009) Sentiment analysis of Chinese documents: From sentence to document level. J Am Soc Inform Sci Technol 60(12):2474–2487
Zhang J, Mucs D, Norinder U, Svensson F (2019) LightGBM: An effective and scalable algorithm for prediction of chemical toxicity–application to the Tox21 and mutagenicity data sets. J Chem Inform Model 59(10):4150–4158
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Tiwari, D., Nagpal, B., Bhati, B.S. et al. A systematic review of social network sentiment analysis with comparative study of ensemble-based techniques. Artif Intell Rev 56, 13407–13461 (2023). https://doi.org/10.1007/s10462-023-10472-w
DOI: https://doi.org/10.1007/s10462-023-10472-w