research-article

Quality of sentiment analysis tools: the reasons of inconsistency

Published: 01 December 2020

Abstract

In this paper, we present a comprehensive study that evaluates six state-of-the-art sentiment analysis tools on five public datasets, based on the quality of their predictions in the presence of semantically equivalent documents, i.e., how consistently existing tools predict the polarity of paraphrased text. We observe that sentiment analysis tools exhibit intra-tool inconsistency, which is the prediction of different polarities for semantically equivalent documents by the same tool, and inter-tool inconsistency, which is the prediction of different polarities for semantically equivalent documents across different tools. We introduce a heuristic to assess the data quality of an augmented dataset and a new set of metrics to evaluate tool inconsistencies. Our results indicate that tool inconsistency is still an open problem, and they point towards promising research directions and the accuracy improvements that can be obtained if such inconsistencies are resolved.
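The two notions of inconsistency defined in the abstract can be illustrated with a minimal sketch. It is not the paper's metric set: the function names, the grouping of documents into paraphrase groups, the toy labels, and the simple "fraction of disagreements" scores are all assumptions made here for illustration.

```python
# Illustrative sketch only: names, data layout, and scoring are assumptions,
# not the metrics defined in the paper.

def intra_tool_inconsistency(groups: list[list[str]]) -> float:
    """Fraction of paraphrase groups in which ONE tool predicts more than one polarity.

    `groups[i]` holds the tool's predicted labels for the semantically
    equivalent documents of paraphrase group i.
    """
    if not groups:
        return 0.0
    inconsistent = sum(1 for labels in groups if len(set(labels)) > 1)
    return inconsistent / len(groups)


def inter_tool_inconsistency(tools: dict[str, list[list[str]]]) -> float:
    """Fraction of documents on which at least two tools predict different polarities.

    All tools must label the same documents in the same order. Comparing tools
    on the same document is a simplification of "semantically equivalent
    documents across tools".
    """
    # Flatten each tool's per-group predictions into one label sequence.
    flattened = [[label for group in groups for label in group]
                 for groups in tools.values()]
    if not flattened or not flattened[0]:
        return 0.0
    # zip(*flattened) yields, per document, the labels given by every tool.
    disagreements = sum(1 for labels in zip(*flattened) if len(set(labels)) > 1)
    return disagreements / len(flattened[0])


# Hypothetical predictions: three paraphrase groups of two documents each.
tool_a = [["pos", "pos"], ["neg", "pos"], ["neu", "neu"]]
tool_b = [["pos", "pos"], ["neg", "neg"], ["pos", "pos"]]

intra_a = intra_tool_inconsistency(tool_a)  # one of three groups is split
intra_b = intra_tool_inconsistency(tool_b)  # no group is split
inter_ab = inter_tool_inconsistency({"tool_a": tool_a, "tool_b": tool_b})
```

In this toy data, `tool_a` contradicts itself on the second group, while the two tools disagree with each other on three of the six documents, so the two scores capture different failure modes.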

Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 4
December 2020
263 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
