Research article
DOI: 10.1145/3287921.3287976

An Entailment-based Scoring Method for Content Selection in Document Summarization

Published: 06 December 2018

Abstract

This paper introduces a scoring method that improves content selection in an extractive summarization system. Unlike previous models, which mainly use local information inside a sentence such as its position or length, our method judges the importance of a sentence from both its own information and its relations to other sentences. To model the relation between sentences, we use textual entailment, a relationship indicating that the meaning of one sentence can be inferred from another. Unlike previous work that uses textual entailment for summarization, we go a step further and look at the aligned words in an entailment sentence pair. Assuming that important words in a salient sentence can be aligned with several words in other sentences, we use word alignment scores to compute the entailment score of a sentence. To exploit both local and neighbor information when estimating sentence salience, we combine entailment scores with sentence position scores. We validate the proposed scoring method using either a greedy or an integer linear programming (ILP) approach to extract summaries. Experiments on three datasets (including DUC 2001 and 2002) in two different domains show that our model obtains ROUGE scores competitive with state-of-the-art methods for single-document summarization.
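
The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that exact token overlap stands in for the learned word alignment, that the position score is 1/(i+1) for the i-th sentence, and that the two scores are mixed with a hypothetical weight `alpha`. All function names are illustrative. The greedy step simply adds the highest-scoring sentences that still fit a word budget.

```python
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [w for w in stripped if w]


def entailment_scores(sentences: List[str]) -> List[float]:
    """Toy entailment score (assumption, not the paper's formula).

    A word counts as "aligned" with another sentence if that sentence
    contains the same token. Each word is scored by the fraction of other
    sentences it aligns with, and a sentence by the mean over its words.
    """
    token_sets = [set(tokenize(s)) for s in sentences]
    n = len(sentences)
    scores = []
    for i, words in enumerate(token_sets):
        if not words or n < 2:
            scores.append(0.0)
            continue
        support = sum(
            sum(1 for j in range(n) if j != i and w in token_sets[j]) / (n - 1)
            for w in words
        )
        scores.append(support / len(words))
    return scores


def position_scores(n: int) -> List[float]:
    """Hypothetical position score: earlier sentences score higher."""
    return [1.0 / (i + 1) for i in range(n)]


def combined_scores(sentences: List[str], alpha: float = 0.5) -> List[float]:
    """Linear mix of entailment and position scores (alpha is an assumption)."""
    ent = entailment_scores(sentences)
    pos = position_scores(len(sentences))
    return [alpha * e + (1 - alpha) * p for e, p in zip(ent, pos)]


def greedy_summary(sentences: List[str], max_words: int,
                   alpha: float = 0.5) -> List[str]:
    """Pick sentences by descending score while the word budget allows."""
    scores = combined_scores(sentences, alpha)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        length = len(tokenize(sentences[i]))
        if used + length <= max_words:
            chosen.append(i)
            used += length
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


doc = [
    "The storm hit the coast.",
    "The storm damaged homes.",
    "Officials met on Monday.",
]
summary = greedy_summary(doc, max_words=9)
```

With this toy document and a budget of nine words, the first two sentences are selected: they share words with each other and appear early. An ILP variant would replace the greedy loop with an optimization over binary selection variables under the same length constraint.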

Cited By

  • (2024) MTAS: A Reference-Free Approach for Evaluating Abstractive Summarization Systems. Proceedings of the ACM on Software Engineering 1(FSE), 2561-2583. DOI: 10.1145/3660820. Online publication date: 12-Jul-2024

Published In

SoICT '18: Proceedings of the 9th International Symposium on Information and Communication Technology
December 2018
496 pages
ISBN: 9781450365390
DOI: 10.1145/3287921

In-Cooperation

  • SOICT: School of Information and Communication Technology - HUST
  • NAFOSTED: The National Foundation for Science and Technology Development

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Entailment
  2. Integer Linear Programming (ILP)
  3. Sentence Scoring
  4. Web Document Summarization

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoICT 2018

Acceptance Rates

Overall Acceptance Rate 147 of 318 submissions, 46%
