Abstract
Automatic text summarization is a text compression problem with many applications in natural language processing. In this paper we focus on the problem of evaluating text summarization systems. We propose an unsupervised approach based on keywords: it does not require a large amount of manual processing and can be implemented as a fully automatic procedure. We also conduct a series of experiments with naïve informants and professional experts. The results of the experiments with informants, experts and automatically extracted keywords confirm that keywords, as one type of text compression, can be successfully used to evaluate the quality of summaries. Our data includes, but is not restricted to, different types of Russian news texts.
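As a rough illustration of the idea of judging a summary by keywords, the sketch below scores a summary by the share of reference keywords it contains. This is not the procedure used in the paper, which the abstract does not specify; the function names `tokenize` and `keyword_coverage`, the plain word-matching rule and the English sample texts are all assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def keyword_coverage(summary, keywords):
    """Return the share of reference keywords that occur in the summary.

    Hypothetical scoring rule: exact token match, no lemmatization.
    """
    wanted = {k.lower() for k in keywords}
    if not wanted:
        return 0.0
    found = wanted & set(tokenize(summary))
    return len(found) / len(wanted)


# Invented sample data, for illustration only.
keywords = ["summit", "leaders", "agreement", "trade"]
summary_a = "Leaders signed a trade agreement at the summit."
summary_b = "The weather was sunny during the summit."

score_a = keyword_coverage(summary_a, keywords)
score_b = keyword_coverage(summary_b, keywords)
print(score_a, score_b)
```

Exact token matching is the simplest possible choice. For an inflected language such as Russian, keywords and summary tokens would probably need to be lemmatized before comparison.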
Notes
1. These terms (precision, etc.), well known in the NLP community, should be interpreted differently here: they represent criteria by which experts estimate the quality of summaries, rather than automatically calculated quality measures. For example, in [5] experts assign each summary precision and redundancy values on a rating scale.
2. Initially, each informant was given 25 articles and asked to write down the words. However, after we received the first answers, we decided to change the instruction: we asked the informants to underline the words in the text. This helped us to avoid misprints and to retain information about the positions of the words in the text. We also reduced the number of articles to 12, because the preliminary results showed that the number of errors invariably increased as the number of articles grew.
References
Hennig, L., De Luca, E.W., Albayrak, S.: Learning summary content units with topic modeling. In: COLING 2010: Poster Volume, pp. 391–399 (2010)
Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 157–165 (1958)
Nenkova, A., Passonneau, R.: Evaluating content selection in summarization: the pyramid method. In: HLT-NAACL 2004: Main Proceedings, pp. 145–152 (2004)
Robertson, S.E., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M.: Okapi at TREC-3. In: Proceedings of the Third Text REtrieval Conference (TREC 1994) (1994)
Solov’ev, A.N., Antonova, A.J., Pazel’skaja, A.G.: Using sentiment-analysis for text information extraction. In: Computational Linguistics and Intelligent Technology: According to the Materials of the Annual International Conference “Dialogue”, vol. 11, no. 18, in 2 vols., vol. 1: The Main Program of the Conference, pp. 616–627. Publishing House of the Russian State Humanitarian University (2012)
Yagunova, E.V., Makarova, O.E., Antonov, A.Y., Solovyov, A.N.: Various compression methods in the study of understanding the text of the news. In: Understanding in Communication. Man in the Information Space, vol. 2, pp. 414–421. Publishing House of YAGPU, Yaroslavl – Moscow (2012)
Lenta.ru: Rambler Media Group. http://www.lenta.ru
TextAnalyst. http://www.analyst.ru/index.php?lang=eng&dir=content/products/&id=ta
Acknowledgements
The authors acknowledge Saint-Petersburg State University for the research grant 30.38.305.2014.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yagunova, E., Makarova, O., Pronoza, E. (2015). Data-Driven Unsupervised Evaluation of Automatic Text Summarization Systems. In: Pichardo Lagunas, O., Herrera Alcántara, O., Arroyo Figueroa, G. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2015. Lecture Notes in Computer Science, vol 9414. Springer, Cham. https://doi.org/10.1007/978-3-319-27101-9_3
DOI: https://doi.org/10.1007/978-3-319-27101-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27100-2
Online ISBN: 978-3-319-27101-9
eBook Packages: Computer Science (R0)