Keyword Extraction from a Single Document Using Centrality Measures

Girish Keshav Palshikar¹

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 4815)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

Keywords characterize the topics discussed in a document. Extracting a small set of keywords from a single document is an important problem in text mining. We propose a hybrid structural and statistical approach to keyword extraction. We represent the given document as an undirected graph whose vertices are the words in the document and whose edges are labeled with a dissimilarity measure between the two words they join, derived from the frequency of their co-occurrence in the document. We propose that central vertices in this graph are candidate keywords, and we model the importance of a word in terms of its centrality in the graph. Using graph-theoretical notions of vertex centrality, we suggest several algorithms to extract keywords from the given document. We demonstrate the effectiveness of the proposed algorithms on real-life documents.
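
The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact dissimilarity formula or which centrality measures the paper uses, so everything specific here is an assumption made for illustration: co-occurrence is counted per sentence, dissimilarity is taken as the reciprocal of the co-occurrence count, and words are ranked by closeness centrality over shortest-path distances. The function names and the stopword list are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stopword list only; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "is", "are", "to"}

def cooccurrence_dissimilarity(text):
    """Build an undirected word graph.

    Assumption: edge weight = 1 / (number of sentences both words share).
    """
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = sorted(set(re.findall(r"[a-z]+", sentence)) - STOPWORDS)
        counts.update(combinations(words, 2))
    vertices = sorted({w for pair in counts for w in pair})
    weights = {pair: 1.0 / c for pair, c in counts.items()}
    return vertices, weights

def all_pairs_shortest_paths(vertices, weights):
    """Floyd-Warshall over the dissimilarity-weighted graph."""
    inf = float("inf")
    dist = {u: {v: (0.0 if u == v else inf) for v in vertices} for u in vertices}
    for (u, v), w in weights.items():
        dist[u][v] = dist[v][u] = w
    for k in vertices:
        for i in vertices:
            for j in vertices:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def closeness_keywords(text, top_k=3):
    """Rank words by closeness centrality; ties are broken alphabetically."""
    vertices, weights = cooccurrence_dissimilarity(text)
    dist = all_pairs_shortest_paths(vertices, weights)
    n = len(vertices)
    scores = {}
    for u in vertices:
        total = sum(dist[u][v] for v in vertices if v != u)
        # An unreachable vertex makes total infinite, giving a score of 0.0.
        scores[u] = (n - 1) / total if total > 0 else 0.0
    return sorted(vertices, key=lambda w: (-scores[w], w))[:top_k]

if __name__ == "__main__":
    sample = ("Graph centrality ranks words. Graph edges link words. "
              "Graph centrality finds keywords.")
    print(closeness_keywords(sample))
```

Words that co-occur often end up close together, so a word with short paths to many others scores high. Other centrality notions (for example degree or betweenness) could be swapped in by replacing only the scoring loop.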

Author information

Authors and Affiliations

Tata Research Development and Design Centre (TRDDC), 54B, Hadapsar Industrial Estate, Pune 411013, India
Girish Keshav Palshikar

Editor information

Ashish Ghosh, Rajat K. De, Sankar K. Pal

Cite this paper

Palshikar, G.K. (2007). Keyword Extraction from a Single Document Using Centrality Measures. In: Ghosh, A., De, R.K., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2007. Lecture Notes in Computer Science, vol 4815. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77046-6_62

DOI: https://doi.org/10.1007/978-3-540-77046-6_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77045-9
Online ISBN: 978-3-540-77046-6
eBook Packages: Computer Science, Computer Science (R0)