Evaluation challenges in large-scale document summarization

Authors: Dragomir R. Radev, Horacio Saggion, Elliott Drabek

ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

Pages 375 - 382

https://doi.org/10.3115/1075096.1075144

Published: 07 July 2003

Abstract

We present a large-scale meta-evaluation of eight evaluation measures for both single-document and multi-document summarizers. To this end, we built a corpus consisting of (a) 100 million automatic summaries produced using six summarizers and baselines at ten summary lengths in both English and Chinese, (b) more than 10,000 manual abstracts and extracts, and (c) 200 million automatic document and summary retrievals using 20 queries. We present both qualitative and quantitative results showing the strengths and drawbacks of all evaluation methods and how they rank the different summarizers.
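As a rough illustration of what comparing "how they rank the different summarizers" can look like in practice, the sketch below computes Kendall's tau between the rankings induced by two evaluation measures. This is an assumption made for illustration only: the abstract does not say which measures were used or how their rankings were compared. The system names and all scores are made-up placeholders, not results from the paper, and `kendall_tau` and `ranking` are hypothetical helper names.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system: score} dictionaries.

    +1 means both measures order every pair of systems the same way,
    -1 means they disagree on every pair. Tied pairs count as neither.
    """
    systems = sorted(set(scores_a) & set(scores_b))
    if len(systems) < 2:
        raise ValueError("need at least two systems scored by both measures")

    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # The sign of the product tells us whether both measures
        # prefer the same system of the pair.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


def ranking(scores):
    """Systems ordered from best to worst under one measure."""
    return sorted(scores, key=scores.get, reverse=True)


# Placeholder scores (hypothetical, NOT taken from the paper).
measure_x = {"sys_a": 0.42, "sys_b": 0.35, "sys_c": 0.51, "baseline": 0.20}
measure_y = {"sys_a": 0.60, "sys_b": 0.64, "sys_c": 0.71, "baseline": 0.30}

if __name__ == "__main__":
    print("ranking under measure_x:", ranking(measure_x))
    print("ranking under measure_y:", ranking(measure_y))
    print("Kendall's tau:", round(kendall_tau(measure_x, measure_y), 3))
```

With these placeholder values the two measures disagree only on the relative order of `sys_a` and `sys_b`, so five of the six system pairs are concordant. Rank correlation is just one possible lens; a meta-evaluation could equally compare measures by agreement statistics or by correlation of raw scores.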

Published In

ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

July 2003, 571 pages

Program Chairs: Erhard W. Hinrichs, Dan Roth

Publisher: Association for Computational Linguistics, United States

Acceptance Rates

Overall acceptance rate: 85 of 443 submissions (19%)
