Automatic text summarization using latent semantic analysis

I. V. Mashechkin, M. I. Petrovskiy, D. S. Popov & D. V. Tsarev

Abstract

This paper considers state-of-the-art methods of automatic text summarization that build summaries in the form of generic extracts. The original text is represented as a numerical matrix whose columns correspond to the sentences of the text, so that each sentence is a vector in the term space. Latent semantic analysis is then applied to this matrix to construct a representation of the sentences in a topic space whose dimensionality is much lower than that of the original term space. The most important sentences are selected on the basis of their representation in the topic space, and the number of selected sentences is determined by the required summary length. The paper also presents a new generic text summarization method that uses nonnegative matrix factorization to estimate sentence relevance. The proposed relevance estimate is based on normalizing the topic space and then weighting each topic using the sentence representations in the topic space. On the DUC 2001 and DUC 2002 standard data sets, the proposed method shows better summarization quality and performance than state-of-the-art methods.
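
The generic pipeline described in the abstract (term-by-sentence matrix, projection into a low-dimensional topic space, selection of the highest-scoring sentences) can be illustrated with a minimal sketch. This is not the authors' method: it uses a truncated SVD rather than nonnegative matrix factorization, raw term counts rather than any particular weighting scheme, and a simple length-based sentence score, which is one common choice in the LSA summarization literature. The function names, the naive sentence splitter, and the parameters `n_sentences` and `n_topics` are assumptions made for illustration.

```python
import re
import numpy as np


def split_sentences(text):
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def term_sentence_matrix(sentences):
    # Rows are terms, columns are sentences; entries are raw term counts.
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[index[t], j] += 1.0
    return A, vocab


def lsa_sentence_scores(A, n_topics):
    # Thin SVD: A = U diag(s) Vt. Keeping the first r singular triplets gives
    # the topic space; column j of diag(s_r) @ Vt_r is sentence j in that space.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = min(n_topics, len(s))
    topic_repr = s[:r, None] * Vt[:r, :]
    # Score each sentence by the length of its topic-space vector.
    return np.linalg.norm(topic_repr, axis=0)


def summarize(text, n_sentences=2, n_topics=2):
    sentences = split_sentences(text)
    A, _ = term_sentence_matrix(sentences)
    scores = lsa_sentence_scores(A, n_topics)
    # Take the highest-scoring sentences, then restore document order.
    top = np.argsort(-scores)[:n_sentences]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    demo = ("Cats chase mice. Dogs chase cats. "
            "Stocks fell sharply today. Cats and dogs are pets.")
    for sentence in summarize(demo, n_sentences=2, n_topics=2):
        print(sentence)
```

Here `n_sentences` plays the role of the required summary length, and `n_topics` sets the dimensionality of the topic space. Swapping the SVD for a nonnegative factorization would change only `lsa_sentence_scores`; how the paper normalizes and weights the resulting topics is not specified in the abstract, so it is not reproduced here.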



Author information

Authors and Affiliations

Department of Computational Mathematics and Cybernetics, Moscow State University, Moscow, 119991, Russia
I. V. Mashechkin, M. I. Petrovskiy, D. S. Popov & D. V. Tsarev

Corresponding author

Correspondence to I. V. Mashechkin.

Additional information

Original Russian Text © I.V. Mashechkin, M.I. Petrovskiy, D.S. Popov, D.V. Tsarev, 2011, published in Programmirovanie, 2011, Vol. 37, No. 6.

About this article

Cite this article

Mashechkin, I.V., Petrovskiy, M.I., Popov, D.S. et al. Automatic text summarization using latent semantic analysis. Program Comput Soft 37, 299–305 (2011). https://doi.org/10.1134/S0361768811060041

Received: 24 May 2011
Published: 19 November 2011
Issue Date: November 2011
DOI: https://doi.org/10.1134/S0361768811060041