CluSent – Combining Semantic Expansion and De-Noising for Dataset-Oriented Sentiment Analysis of Short Texts

Published: 23 October 2023

Abstract

The lack of sufficient information, especially in short texts, is a major challenge to building effective sentiment models. Short texts can be enriched with more complex semantic relationships that better capture affective information, but this enrichment may have the undesired side effect of introducing noise into the data. This work proposes CluSent, a new strategy for customized dataset-oriented sentiment analysis that exploits CluWords, a powerful, recently proposed concept for representing semantically related words. CluSent tackles both information shortage and noise by: (i) exploiting the semantic neighborhood of each word in a given pre-trained word embedding to enrich the document representation and (ii) introducing dataset-oriented filtering and weighting mechanisms to cope with noise, which take advantage of the polarity and intensity information from lexicons. In our experimental evaluation, considering 19 datasets, five state-of-the-art baselines (including modern transformer architectures), and two metrics, CluSent was the best method in 30 out of 38 possibilities (19 datasets × 2 metrics), with significant gains (over 14%) over the strongest baselines.
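The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vocabulary, the 3-dimensional embedding values, the lexicon scores, the threshold `alpha`, and the specific filtering and weighting rules (drop neighbors of opposite polarity, scale by 1 + |intensity|) are all assumptions chosen only to show the idea of expansion followed by lexicon-based de-noising.

```python
import numpy as np

# Hypothetical toy vocabulary and "pre-trained" embeddings (illustrative values only).
# "good" and "bad" are deliberately close, as antonyms often are in real embeddings.
VOCAB = ["good", "great", "bad", "movie"]
EMB = np.array([
    [1.0, 0.0, 0.2],  # good
    [0.9, 0.1, 0.2],  # great
    [0.8, 0.0, 0.6],  # bad
    [0.0, 1.0, 0.0],  # movie
])

# Hypothetical lexicon: polarity and intensity in [-1, 1]; 0 means neutral or unknown.
LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.6, "movie": 0.0}


def expansion_matrix(emb, alpha):
    """Step (i): keep cosine similarities >= alpha as the semantic neighborhood."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = unit @ unit.T
    return np.where(sim >= alpha, sim, 0.0)


def denoise(expand, vocab, lexicon):
    """Step (ii), assumed rule: drop neighbors whose polarity is opposite to the
    source term, then scale each neighbor column by 1 + |intensity|."""
    pol = np.array([lexicon.get(w, 0.0) for w in vocab])
    opposite = np.outer(pol, pol) < 0  # True where term i and neighbor j disagree
    filtered = np.where(opposite, 0.0, expand)
    return filtered * (1.0 + np.abs(pol))[np.newaxis, :]


def represent(tokens, vocab, matrix):
    """Term-frequency vector of the document projected through the matrix."""
    index = {w: i for i, w in enumerate(vocab)}
    tf = np.zeros(len(vocab))
    for tok in tokens:
        if tok in index:
            tf[index[tok]] += 1.0
    return tf @ matrix


expand = expansion_matrix(EMB, alpha=0.85)
doc = ["good", "movie"]
plain = represent(doc, VOCAB, expand)                            # expansion only
clean = represent(doc, VOCAB, denoise(expand, VOCAB, LEXICON))   # expansion + de-noising

print(dict(zip(VOCAB, plain.round(2))))
print(dict(zip(VOCAB, clean.round(2))))
```

In this toy setting, expansion alone gives the document "good movie" a nonzero weight on "bad", because the two words are close in the embedding space. The lexicon-based filter removes that weight and boosts "great", which shares the polarity of "good".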

Cited By

  • (2024) Pipelining Semantic Expansion and Noise Filtering for Sentiment Analysis of Short Documents – CluSent Method. Journal on Interactive Systems 15:1 (561-575). DOI: 10.5753/jis.2024.4117. Online publication date: 11-Jun-2024
  • (2024) Web Semantic-Enhanced Multimodal Sentiment Analysis Using Multilayer Cross-Attention Fusion. International Journal on Semantic Web & Information Systems 20:1 (1-29). DOI: 10.4018/IJSWIS.360653. Online publication date: 9-Nov-2024

    Published In

    WebMedia '23: Proceedings of the 29th Brazilian Symposium on Multimedia and the Web
    October 2023
    285 pages
    ISBN: 9798400709081
    DOI: 10.1145/3617023

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Classification
    2. Natural Language Processing
    3. Sentiment Analysis

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • CNPq
    • Fapemig
    • Google Research Awards
    • CAPES
    • AWS

    Conference

    WebMedia '23
    WebMedia '23: Brazilian Symposium on Multimedia and the Web
    October 23 - 27, 2023
    Ribeirão Preto, Brazil

    Acceptance Rates

    Overall Acceptance Rate 270 of 873 submissions, 31%
