DOI: 10.1145/3488560.3498463
research-article
Open access

ConsistSum: Unsupervised Opinion Summarization with the Consistency of Aspect, Sentiment and Semantic

Published: 15 February 2022

Abstract

Unsupervised opinion summarization techniques are designed to condense review data and summarize informative and salient opinions in the absence of gold-standard reference summaries. Existing dominant methods generally follow a two-stage framework: they first create synthetic "review-summary" pairs and then feed them into a generative summarization model for supervised training. However, these methods mainly focus on semantic similarity when creating the synthetic dataset, ignoring the consistency of aspects and sentiments within the synthetic pairs. This inconsistency also introduces a gap between the training and the inference of the summarization model.
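To make the notion of aspect and sentiment consistency concrete, here is a minimal sketch under stated assumptions; it is not the ConsistSum implementation, and the abstract does not say how the distances are computed. It assumes each review already carries an aspect distribution, a sentiment distribution and a sentence embedding (the hypothetical keys `aspect`, `sentiment` and `embedding`), measures distribution distance with the Jensen-Shannon divergence, adds a cosine distance for semantics, and ranks candidate reviews for a pseudo-summary by the weighted sum. All function names and weights are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical record layout (an assumption, not from the paper): each
# review/summary is a dict with "aspect" (distribution over aspects),
# "sentiment" (distribution over polarities) and "embedding" (a vector).
Item = Dict[str, Sequence[float]]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric and bounded by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # m_i > 0 wherever p_i or q_i > 0
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def consistency_distance(review: Item, summary: Item,
                         w_aspect: float = 1.0, w_sentiment: float = 1.0,
                         w_semantic: float = 1.0) -> float:
    """Assumed form: weighted sum of aspect, sentiment and semantic distances."""
    return (w_aspect * js_divergence(review["aspect"], summary["aspect"])
            + w_sentiment * js_divergence(review["sentiment"], summary["sentiment"])
            + w_semantic * cosine_distance(review["embedding"], summary["embedding"]))


def select_reviews(summary: Item, pool: List[Item], k: int) -> List[int]:
    """Indices of the k pool reviews most consistent with the pseudo-summary."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: consistency_distance(pool[i], summary))
    return ranked[:k]
```

Using a bounded, symmetric divergence keeps the aspect and sentiment terms on comparable scales, so the weights remain easy to tune; a purely semantic pipeline would correspond to setting `w_aspect` and `w_sentiment` to zero.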
To alleviate this problem, we propose ConsistSum, an unsupervised opinion summarization method devoted to capturing the consistency of aspects and sentiments between reviews and summaries. Specifically, ConsistSum first extracts preliminary "review-summary" pairs from the raw corpus by measuring the distance between their aspect distributions and between their sentiment distributions. Then, we refine the preliminary summaries with constrained Metropolis-Hastings sampling to produce highly consistent synthetic datasets. In the summarization phase, we adopt the generative model T5 as the summarization model and fine-tune it for opinion summarization with additional losses for predicting the aspect and opinion distributions. Experimental results on two benchmark datasets, Yelp and Amazon, show that ConsistSum outperforms the state-of-the-art baselines.
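The refinement step relies on constrained Metropolis-Hastings sampling over word edits. Below is a minimal sketch in the style of CGMH (Miao et al., 2019), assuming uniform replace/insert/delete proposals (a simplification; CGMH proposes words from a language model) and a caller-supplied, hypothetical `log_target` that scores fluency and consistency. It is an illustration, not the authors' sampler. The insert and delete moves carry the proposal-ratio correction that Metropolis-Hastings requires, and the function returns the best-scoring sentence it visits.

```python
import math
import random
from typing import Callable, List, Sequence


def mh_refine(init: Sequence[str], vocab: Sequence[str],
              log_target: Callable[[List[str]], float],
              steps: int = 1000, seed: int = 0) -> List[str]:
    """Word-level Metropolis-Hastings over sentences (CGMH-style sketch).

    log_target(x) is a hypothetical unnormalized log-probability (higher =
    more fluent/consistent). Assumes every token of `init` is in `vocab`.
    """
    rng = random.Random(seed)
    V = len(vocab)
    p_rep = p_ins = p_del = 1.0 / 3.0  # fixed action probabilities
    x = list(init)
    cur = log_target(x)
    best, best_score = list(x), cur
    for _ in range(steps):
        n = len(x)
        u = rng.random()
        if u < p_rep:  # replace one word
            if n == 0:
                continue  # impossible move: the chain stays in place
            i = rng.randrange(n)
            prop = x[:i] + [rng.choice(vocab)] + x[i + 1:]
            log_q_ratio = 0.0  # replacement proposals are symmetric
        elif u < p_rep + p_ins:  # insert one word
            j = rng.randrange(n + 1)
            prop = x[:j] + [rng.choice(vocab)] + x[j:]
            # q(x|x') / q(x'|x) = (p_del/(n+1)) / (p_ins/((n+1)*V))
            log_q_ratio = math.log(V * p_del / p_ins)
        else:  # delete one word
            if n == 0:
                continue
            i = rng.randrange(n)
            prop = x[:i] + x[i + 1:]
            # q(x|x') / q(x'|x) = (p_ins/(n*V)) / (p_del/n)
            log_q_ratio = math.log(p_ins / (V * p_del))
        new = log_target(prop)
        log_alpha = new - cur + log_q_ratio
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x, cur = prop, new  # accept the proposal
            if cur > best_score:
                best, best_score = list(x), cur
    return best
```

One design constraint worth noting: there are V^n sentences of length n, so a target whose per-token penalty does not exceed ln V cannot be normalized and the chain drifts toward ever-longer outputs.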

Supplementary Material

MP4 File (WSDM22-fp485.mp4)
We propose ConsistSum, an unsupervised opinion summarization method devoted to capturing the consistency of aspects and sentiments between reviews and summaries. ConsistSum first extracts preliminary "review-summary" pairs from the raw corpus. Then, we refine the preliminary summaries with constrained Metropolis-Hastings sampling to produce highly consistent synthetic datasets. In the summarization phase, T5 is fine-tuned for opinion summarization with additional losses for predicting the aspect and opinion distributions. Experimental results show that ConsistSum outperforms the state-of-the-art baselines.
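The abstract states only that T5 is fine-tuned with extra losses for predicting aspect and opinion distributions; it does not give their exact form. A minimal sketch, assuming each auxiliary term is a soft-label cross-entropy between a target distribution and predicted logits, added to the generation loss with hypothetical weights `lambda_aspect` and `lambda_opinion`. In practice `gen_nll` would be T5's token-level negative log-likelihood from a deep learning framework; pure Python is used here only to show how the terms combine.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def soft_cross_entropy(target: Sequence[float], logits: Sequence[float]) -> float:
    """Cross-entropy of predicted softmax(logits) against a target distribution."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def total_loss(gen_nll: float,
               aspect_target: Sequence[float], aspect_logits: Sequence[float],
               opinion_target: Sequence[float], opinion_logits: Sequence[float],
               lambda_aspect: float = 0.5, lambda_opinion: float = 0.5) -> float:
    """Assumed multi-task objective: generation loss + weighted auxiliary losses."""
    return (gen_nll
            + lambda_aspect * soft_cross_entropy(aspect_target, aspect_logits)
            + lambda_opinion * soft_cross_entropy(opinion_target, opinion_logits))
```

Soft-label cross-entropy is used because the supervision described is a distribution rather than a single class; it differs from the KL divergence only by the target's entropy, which is constant with respect to the model.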

Cited By

  • (2023) AsU-OSum. Information Processing and Management: an International Journal 60:1. DOI: 10.1016/j.ipm.2022.103138. Online publication date: 1-Jan-2023.
  • (2022) Considering Commonsense in Solving QA: Reading Comprehension with Semantic Search and Continual Learning. Applied Sciences 12:9 (4099). DOI: 10.3390/app12094099. Online publication date: 19-Apr-2022.
  • (2022) Beyond Opinion Mining. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3447-3450. DOI: 10.1145/3477495.3532676. Online publication date: 7-Jul-2022.

    Published In

    ACM Conferences
    WSDM '22: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining
    February 2022
    1690 pages
    ISBN:9781450391320
    DOI:10.1145/3488560
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. consistency enhancement
    2. opinion summarization
    3. unsupervised method

    Qualifiers

    • Research-article

    Conference

    WSDM '22

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 237
    • Downloads (last 6 weeks): 38
    Reflects downloads up to 05 Jan 2025
