A Deep Learning Sentiment Analyser for Social Media Comments in Low-Resource Languages
Figure 1. High-level architecture diagram of the proposed AlbAna analyser.
Figure 2. Dataset 1.0—Unlabeled comments.
Figure 3. Final dataset.
Figure 4. Length of comments among sentiment classes.
Figure 5. Number of comments across months.
Figure 6. Deep neural networks used for sentiment classification.
Figure 7. Baseline model summary.
Figure 8. Performance of BiLSTM with respect to various sizes of context.
Figure 9. Sentiment classifier using mBERT.
Figure 10. Confusion matrix of sentiment classifier using mBERT.
Abstract
1. Introduction
- The collection of a large-scale dataset of 10,742 manually annotated Facebook comments related to the COVID-19 pandemic. To the best of our knowledge, this is the first study to perform sentiment analysis of Facebook comments in a low-resource language such as Albanian.
- A deep-learning-based sentiment analyser, AlbAna, is proposed and validated on the collected and curated COVID-19 dataset.
- An attention mechanism is used to characterize word-level interactions within local and global contexts in order to capture the semantic meaning of words.
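The contributions above mention an attention mechanism without specifying its formulation at this point. As an illustration only, the sketch below assumes a common additive attention pooling: each token's hidden state h_t is projected as tanh(W h_t + b), scored against a context vector u, normalized with a softmax, and used to weight the hidden states into one sentence vector. The parameters `w`, `b` and `u` are hypothetical stand-ins for learned weights, and this is not claimed to be AlbAna's exact layer.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention_pool(
    hidden: Sequence[Sequence[float]],  # one hidden vector h_t per token (length d)
    w: Sequence[Sequence[float]],       # hypothetical learned projection, d x a
    b: Sequence[float],                 # hypothetical learned bias, length a
    u: Sequence[float],                 # hypothetical learned context vector, length a
) -> Tuple[Vector, Vector]:
    """Return (attention weights alpha_t, attention-weighted sentence vector)."""
    d, a = len(hidden[0]), len(b)
    scores = []
    for h in hidden:
        # projected_j = tanh(sum_i h_i * W[i][j] + b_j)
        projected = [
            math.tanh(sum(h[i] * w[i][j] for i in range(d)) + b[j]) for j in range(a)
        ]
        # score_t = u . projected
        scores.append(sum(u_j * p_j for u_j, p_j in zip(u, projected)))
    alpha = softmax(scores)
    # Sentence vector: sum over tokens of alpha_t * h_t
    context = [sum(alpha[t] * hidden[t][k] for t in range(len(hidden))) for k in range(d)]
    return alpha, context


# Toy example: three tokens with 2-dimensional hidden states (made-up numbers).
H = [[0.1, 0.3], [0.9, -0.2], [0.4, 0.4]]
W = [[0.5, -0.1], [0.2, 0.7]]
alpha, c = additive_attention_pool(H, W, b=[0.0, 0.0], u=[1.0, 1.0])
print([round(x, 3) for x in alpha], [round(x, 3) for x in c])
```

Because the weights come from a softmax, they are positive and sum to one, so the sentence vector is a convex combination of the token states in which higher-scoring tokens dominate.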
2. Related Work
3. Methodology
4. Experiments
4.1. Dataset
4.1.1. Dataset Collection
“Comment No 894: Juve stafit mjeksor respekt ndersa ktyre qe jane raste kontakti qe nuk kane nejt ne shtepi po kane shku musafir e jone infektu turp! (Respect to the medical staff while shame on contact cases who have not stayed at home but have visited relatives and got infected!)”
This comment opens with a positive sentence expressing positive sentiment towards the medical staff, whereas its second part expresses negative sentiment towards contact cases who did not stay at home and got infected. Comments like this, which contain both positive and negative sentiment, were typically annotated differently by the human annotators.
“Comment No 5570: Bile diqka me pas pozitive në këtë Kosovë. (At least to have something positive in Kosova.)”
This is a sarcastic expression that, without any contextual clues, might have been understood, and therefore annotated, differently by the annotators.
- ID—unique identification number for each comment
- Comment—the content of the comment where the sentiment analysis is performed
- Like—the number of Facebook reactions to the relevant comment
- CMNT’s TS—comment’s timestamp, showing the day, date and time the comment was posted
- Post’s TS—post’s timestamp, showing the day, date and time of the post to which the comment belongs
- Post’s URL—link to the post to which the comment belongs
- #Deaths—number of persons who died due to the pandemic on that day
- #Infected—number of persons infected with COVID-19
- #Healed—number of people who have recovered from COVID-19
- Annot 1—sentiment label assigned by annotator 1
- Annot 2—sentiment label assigned by annotator 2
- Annot 3—sentiment label assigned by annotator 3
- Final annotation—the final sentiment of the comment, derived from the three annotations through majority voting.
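The Final annotation field is described as a majority vote over the three annotators' labels. A minimal sketch of that rule follows. With three annotators and three classes, all three can disagree; the source does not say how such cases were resolved, so the sketch assumes they are returned as `None` for manual adjudication. The function name `majority_label` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, or None.

    ASSUMPTION: when no label has a strict majority (e.g. three different
    labels from three annotators), None is returned to flag the comment.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


# Hypothetical (Annot 1, Annot 2, Annot 3) rows, not taken from the dataset.
rows = [
    ("Positive", "Positive", "Neutral"),
    ("Negative", "Neutral", "Negative"),
    ("Positive", "Negative", "Neutral"),
]
final = [majority_label(r) for r in rows]
print(final)  # ['Positive', 'Negative', None]
```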
4.1.2. Dataset Statistics
4.2. Deep Neural Networks
4.3. Conventional Machine Learning Models
5. Results
5.1. Parameter Settings
5.2. Our Baseline Model
5.3. Deep Neural Networks
5.3.1. Attention Mechanism
5.3.2. 1D-CNN with and without Attention Mechanism
5.3.3. BiLSTM with and without Attention Mechanism
5.3.4. Hybrid Model with and without Attention Mechanism
5.3.5. Static Word Embeddings
5.3.6. Contextualized Word Embeddings
5.4. Conventional Machine Learning Models
6. Discussion
7. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
Annotator | Annotator 1 | Annotator 2 | Annotator 3 |
---|---|---|---|
Annotator 1 | 1.00 | 0.57 | 0.62 |
Annotator 2 | 0.57 | 1.00 | 0.46 |
Annotator 3 | 0.62 | 0.46 | 1.00 |
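The agreement table does not name its measure. Assuming the values are pairwise Cohen's kappa scores (a common choice for two raters assigning categorical labels), each cell would be computed from two annotators' label columns as κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label frequencies. The toy labels below are hypothetical, not the study's data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa between two annotators' label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: product of each label's marginal probabilities, summed.
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case (both used one identical label): kappa is undefined,
        # treated here as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


a = ["Positive", "Negative", "Neutral", "Neutral", "Positive", "Neutral"]
b = ["Positive", "Neutral", "Neutral", "Neutral", "Positive", "Negative"]
k = cohens_kappa(a, b)
print(round(k, 4))  # 0.4545
```

Here the annotators agree on 4 of 6 comments (p_o ≈ 0.667), but because both label Neutral often, chance agreement is already p_e ≈ 0.389, which pulls kappa down to about 0.45.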
Comment (English Translation) | Sentiment |
---|---|
Do te thot Peja edhe sonte spaska asnje rast (It means that even tonight Peja does not have any case) | Neutral |
Bravo ekipet e IKShP per punen e shkelqyeshme dhe perkushtimin! (Well done the NIPHK teams for the great job and dedication!) | Positive |
Keni kalu tash ne monotoni, te pa arsyshem jeni tash. (You have now passed into monotony, you are now unreasonable.) | Negative |
Class | # of Comments | Percent (%) |
---|---|---|
Positive | 1677 | 15.6 |
Negative | 3010 | 28.0 |
Neutral | 6055 | 56.4 |
Total | 10,742 | 100.0 |
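The percentages in the class-distribution table follow directly from the counts. Recomputing them also makes the imbalance explicit: Neutral comments outnumber Positive ones by about 3.6 to 1, so a classifier that always predicted Neutral would already be right on roughly 56% of comments (assuming an evaluation split with the same proportions).

```python
counts = {"Positive": 1677, "Negative": 3010, "Neutral": 6055}
total = sum(counts.values())  # 10,742 comments
percent = {label: round(100 * n / total, 1) for label, n in counts.items()}
imbalance = counts["Neutral"] / counts["Positive"]
print(total, percent, round(imbalance, 2))
# prints: 10742 {'Positive': 15.6, 'Negative': 28.0, 'Neutral': 56.4} 3.61
```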
Class | P | R | F1 |
---|---|---|---|
Neutral | 78.92 | 70.58 | 74.52 |
Positive | 51.77 | 69.27 | 59.26 |
Negative | 63.92 | 65.24 | 64.57 |
Weighted avg | 70.43 | 68.88 | 69.32 |
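Each per-class F1 value in the baseline results is the harmonic mean of that class's precision and recall, and the "Weighted avg" row weights each class by its number of test examples (support). The per-class test supports are not reported here, so the sketch below substitutes the full-dataset class counts as weights. This is an assumption that is only exact if the test split preserves the overall class proportions.

```python
def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def weighted_avg(values: dict, support: dict) -> float:
    """Support-weighted average of a per-class metric."""
    total = sum(support.values())
    return sum(values[c] * support[c] for c in values) / total


# Per-class precision and recall (percent) from the baseline results table.
pr = {"Neutral": (78.92, 70.58), "Positive": (51.77, 69.27), "Negative": (63.92, 65.24)}
f1_scores = {c: f1(p, r) for c, (p, r) in pr.items()}

# ASSUMPTION: full-dataset class counts stand in for the unreported test supports.
support = {"Neutral": 6055, "Positive": 1677, "Negative": 3010}

print({c: round(v, 2) for c, v in f1_scores.items()})
print(round(weighted_avg(f1_scores, support), 2))
```

The recomputed per-class F1 values match the table to within 0.01 (Positive comes out as 59.25 rather than 59.26 because the published P and R are themselves rounded), and the substituted weights give a weighted F1 of about 69.35, close to the reported 69.32.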
Class | 1D-CNN P | 1D-CNN R | 1D-CNN F1 | 1D-CNN + Att P | 1D-CNN + Att R | 1D-CNN + Att F1 |
---|---|---|---|---|---|---|
Neutral | 74.89 | 80.44 | 77.57 | 75.92 | 79.57 | 77.71 |
Positive | 72.60 | 56.98 | 63.85 | 72.12 | 62.85 | 67.16 |
Negative | 61.83 | 60.16 | 60.98 | 62.44 | 60.95 | 61.69 |
Weighted avg | 70.88 | 71.05 | 70.76 | 71.55 | 71.72 | 71.56 |
Class | BiLSTM P | BiLSTM R | BiLSTM F1 | BiLSTM + Att P | BiLSTM + Att R | BiLSTM + Att F1 |
---|---|---|---|---|---|---|
Neutral | 74.74 | 80.76 | 77.63 | 76.01 | 80.21 | 78.05 |
Positive | 76.92 | 55.87 | 64.72 | 76.51 | 60.06 | 67.29 |
Negative | 60.38 | 60.00 | 60.19 | 62.48 | 63.17 | 62.83 |
Weighted avg | 71.08 | 71.01 | 70.71 | 72.31 | 72.25 | 72.09 |
Class | Hybrid P | Hybrid R | Hybrid F1 | Hybrid + Att P | Hybrid + Att R | Hybrid + Att F1 |
---|---|---|---|---|---|---|
Neutral | 78.32 | 72.95 | 75.54 | 75.43 | 79.89 | 77.59 |
Positive | 65.63 | 59.22 | 62.26 | 71.15 | 60.61 | 65.46 |
Negative | 58.11 | 69.37 | 63.24 | 63.16 | 60.95 | 62.04 |
Weighted avg | 70.67 | 69.77 | 70.00 | 71.32 | 71.54 | 71.32 |
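The three with/without-attention tables share the same layout, so the effect of adding attention can be read off their weighted-average F1 rows. The snippet below only tabulates those published values and takes differences; it does not re-run any model. Attention improves weighted F1 for all three architectures, by between 0.80 and 1.38 points, and BiLSTM + Att scores highest.

```python
# Weighted-average F1 (%) as (without attention, with attention), from the tables.
weighted_f1 = {
    "1D-CNN": (70.76, 71.56),
    "BiLSTM": (70.71, 72.09),
    "Hybrid": (70.00, 71.32),
}
gain = {model: round(att - base, 2) for model, (base, att) in weighted_f1.items()}
best = max(weighted_f1, key=lambda model: weighted_f1[model][1])
print(gain, best)
```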
Class | DNN P | DNN R | DNN F1 | 1D-CNN P | 1D-CNN R | 1D-CNN F1 | BiLSTM P | BiLSTM R | BiLSTM F1 | Hybrid P | Hybrid R | Hybrid F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Neutral | 73.32 | 66.96 | 69.99 | 72.52 | 85.33 | 78.41 | 71.97 | 82.02 | 76.67 | 72.23 | 80.21 | 76.01 |
Positive | 58.19 | 48.60 | 52.97 | 69.49 | 57.26 | 62.79 | 75.32 | 48.60 | 59.08 | 73.17 | 50.28 | 59.60 |
Negative | 50.06 | 63.49 | 55.98 | 68.87 | 51.27 | 58.78 | 61.55 | 56.67 | 59.01 | 58.64 | 56.03 | 57.31 |
Weighted avg | 64.42 | 63.08 | 63.38 | 71.02 | 71.37 | 70.45 | 69.59 | 69.64 | 68.95 | 68.58 | 68.71 | 68.18 |
Class | P | R | F1 |
---|---|---|---|
Neutral | 81.13 | 70.38 | 75.37 |
Positive | 64.35 | 79.87 | 71.27 |
Negative | 57.77 | 69.16 | 62.95 |
Weighted avg | 68.43 | 75.58 | 71.35 |
Class | SVM P | SVM R | SVM F1 | NB P | NB R | NB F1 | DT P | DT R | DT F1 | RF P | RF R | RF F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Neutral | 72.09 | 82.89 | 77.11 | 76.40 | 76.65 | 76.52 | 71.40 | 72.56 | 71.98 | 72.59 | 86.34 | 78.87 |
Positive | 71.22 | 55.31 | 62.26 | 75.99 | 60.55 | 67.40 | 54.47 | 55.23 | 54.85 | 77.19 | 60.75 | 67.99 |
Negative | 64.42 | 53.37 | 58.26 | 58.09 | 64.51 | 61.13 | 50.95 | 48.81 | 49.85 | 64.63 | 46.99 | 54.41 |
Weighted avg | 69.81 | 70.21 | 69.49 | 71.34 | 70.80 | 70.89 | 63.16 | 63.36 | 63.25 | 71.14 | 71.58 | 70.49 |
Class | SVM P | SVM R | SVM F1 | NB P | NB R | NB F1 | DT P | DT R | DT F1 | RF P | RF R | RF F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Neutral | 72.15 | 87.85 | 79.23 | 70.02 | 92.71 | 79.78 | 70.82 | 75.56 | 73.11 | 71.81 | 90.15 | 79.94 |
Positive | 83.26 | 52.79 | 64.62 | 90.51 | 45.17 | 60.26 | 61.63 | 53.85 | 57.47 | 84.52 | 56.02 | 67.38 |
Negative | 67.84 | 52.22 | 59.01 | 72.86 | 44.60 | 55.33 | 52.68 | 49.15 | 50.85 | 70.40 | 46.53 | 56.03 |
Weighted avg | 72.71 | 72.34 | 71.27 | 74.02 | 72.11 | 70.04 | 64.43 | 64.94 | 64.58 | 73.43 | 72.88 | 71.44 |
Model | Embeddings | F1-Score |
---|---|---|
DNN | Domain | 69.32 |
DNN | FastText | 63.38 |
1D-CNN | Domain | 70.76 |
1D-CNN | FastText | 70.45 |
1D-CNN + Att | Domain | 71.56 |
BiLSTM | Domain | 70.71 |
BiLSTM | FastText | 68.95 |
BiLSTM + Att | Domain | 72.09 |
Hybrid | Domain | 70.00 |
Hybrid | FastText | 68.18 |
Hybrid + Att | Domain | 71.32 |
BERT | mBERT | 71.35 |
SVM | | 69.49 |
SVM | | 71.27 |
NB | | 70.89 |
NB | | 70.04 |
DT | | 63.25 |
DT | | 64.58 |
RF | | 70.49 |
RF | | 71.44 |
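Reading the summary table by embedding type, domain-specific embeddings outperform FastText for every architecture evaluated with both. The arithmetic below reproduces those differences from the published F1 scores and picks the top deep-learning configuration. The conventional models are left out because their feature setting is not labelled in the table; their best score (71.44) is below the top result in any case.

```python
# Weighted-average F1 (%) from the summary table: (domain embeddings, FastText).
f1_by_embedding = {
    "DNN": (69.32, 63.38),
    "1D-CNN": (70.76, 70.45),
    "BiLSTM": (70.71, 68.95),
    "Hybrid": (70.00, 68.18),
}
domain_advantage = {
    model: round(domain - fasttext, 2)
    for model, (domain, fasttext) in f1_by_embedding.items()
}

# Every deep-learning configuration listed in the summary table.
scores = {
    "DNN (Domain)": 69.32,
    "DNN (FastText)": 63.38,
    "1D-CNN (Domain)": 70.76,
    "1D-CNN (FastText)": 70.45,
    "1D-CNN + Att (Domain)": 71.56,
    "BiLSTM (Domain)": 70.71,
    "BiLSTM (FastText)": 68.95,
    "BiLSTM + Att (Domain)": 72.09,
    "Hybrid (Domain)": 70.00,
    "Hybrid (FastText)": 68.18,
    "Hybrid + Att (Domain)": 71.32,
    "BERT (mBERT)": 71.35,
}
best_config = max(scores, key=scores.get)
print(domain_advantage)
print(best_config, scores[best_config])
```

The gap is largest for the plain DNN (5.94 points) and smallest for the 1D-CNN (0.31 points).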
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kastrati, Z.; Ahmedi, L.; Kurti, A.; Kadriu, F.; Murtezaj, D.; Gashi, F. A Deep Learning Sentiment Analyser for Social Media Comments in Low-Resource Languages. Electronics 2021, 10, 1133. https://doi.org/10.3390/electronics10101133