research-article

An Entailment-based Scoring Method for Content Selection in Document Summarization

Authors:

Dang Hoang Long,

Minh-Tien Nguyen,

Le-Minh Nguyen,

Tu Minh Phuong

SoICT '18: Proceedings of the 9th International Symposium on Information and Communication Technology

Pages 122 - 129

https://doi.org/10.1145/3287921.3287976

Published: 06 December 2018

Abstract

This paper introduces a scoring method to improve the quality of content selection in an extractive summarization system. Unlike previous models, which mainly use local information inside sentences such as sentence position or sentence length, our method judges the importance of a sentence based on both its own information and its relations to other sentences. To model the relation between sentences, we utilize textual entailment, a relationship indicating that the meaning of one sentence can be inferred from another. Unlike previous work on using textual entailment for summarization, we go a step further by looking at aligned words in an entailment sentence pair. Assuming that important words in a salient sentence can be aligned with several words in other sentences, we exploit word alignment scores to compute the entailment score of a sentence. To take advantage of both local and neighbor information when estimating sentence salience, we combine entailment scores with sentence position scores. We validate the proposed scoring method with either greedy or integer linear programming approaches for extracting summaries. Experiments on three datasets (including DUC 2001 and 2002) in two different domains show that our model obtains ROUGE scores competitive with state-of-the-art methods for single-document summarization.
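
The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the exact-match alignment function, the averaging scheme, the linear position decay, the mixing weight `alpha`, and all function names below are assumptions chosen only to make the pipeline concrete (alignment-based entailment score, combined with a position score, followed by greedy selection under a word budget).

```python
from typing import Callable, List

Sentence = List[str]  # a sentence is a list of lowercase tokens


def overlap_alignment(word: str, other: Sentence) -> float:
    """Hypothetical stand-in for a learned word alignment score:
    1.0 if the word occurs in the other sentence, else 0.0."""
    return 1.0 if word in other else 0.0


def entailment_score(idx: int, sentences: List[Sentence],
                     align: Callable[[str, Sentence], float] = overlap_alignment) -> float:
    """Assumed aggregation: mean alignment of this sentence's words,
    averaged over all other sentences in the document."""
    words = sentences[idx]
    others = [s for j, s in enumerate(sentences) if j != idx]
    if not words or not others:
        return 0.0
    total = 0.0
    for other in others:
        total += sum(align(w, other) for w in words) / len(words)
    return total / len(others)


def position_score(idx: int, n: int) -> float:
    """Assumed position prior: decays linearly from 1.0 (first sentence) to 1/n."""
    return (n - idx) / n


def combined_scores(sentences: List[Sentence], alpha: float = 0.5) -> List[float]:
    """Linear mix of entailment and position scores; alpha is an assumed weight."""
    n = len(sentences)
    return [alpha * entailment_score(i, sentences)
            + (1 - alpha) * position_score(i, n)
            for i in range(n)]


def greedy_select(sentences: List[Sentence], scores: List[float],
                  max_words: int) -> List[int]:
    """Take sentences in descending score order while the word budget allows,
    then return the chosen indices in document order."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        length = len(sentences[i])
        if used + length <= max_words:
            chosen.append(i)
            used += length
    return sorted(chosen)


doc = [
    ["storm", "hits", "coast"],
    ["coast", "towns", "flooded", "by", "storm"],
    ["officials", "open", "shelters"],
]
summary_ids = greedy_select(doc, combined_scores(doc), max_words=6)
```

In this toy document the first sentence receives the highest combined score because two of its three words align with the second sentence and it appears first. An integer linear programming selector, which the abstract names as the alternative to greedy extraction, would replace `greedy_select` while reusing the same scores.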

Cited By

Zhu X, Jiang M, Zhang X, Nie L, Ding Z (2024). MTAS: A Reference-Free Approach for Evaluating Abstractive Summarization Systems. Proceedings of the ACM on Software Engineering 1:FSE (2561-2583). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660820

Index Terms

An Entailment-based Scoring Method for Content Selection in Document Summarization
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Summarization

Published In

SoICT '18: Proceedings of the 9th International Symposium on Information and Communication Technology

December 2018

496 pages

ISBN:9781450365390

DOI:10.1145/3287921

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SOICT: School of Information and Communication Technology - HUST
NAFOSTED: The National Foundation for Science and Technology Development

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SoICT 2018

SoICT 2018: The Ninth International Symposium on Information and Communication Technology

December 6 - 7, 2018

Danang City, Viet Nam

Acceptance Rates

Overall Acceptance Rate 147 of 318 submissions, 46%
