Abstract
This paper addresses the fully automatic grouping of semantically related words. We discuss measures of semantic relatedness between basic word forms and describe the treatment of collocations. We then present a procedure for hierarchical clustering of a very large number of semantically related words and give examples of the resulting partitioning of the data in the form of a dendrogram. Finally, we show an output presentation format that facilitates inspection of the resulting word clusters.
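The abstract does not state which relatedness measure or linkage criterion the authors use, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes window-based co-occurrence counts as word representations, cosine similarity as the relatedness measure, and average-linkage agglomerative clustering. The function names (`context_vectors`, `cosine`, `cluster`), the window size, and the toy corpus are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def context_vectors(sentences, window=2):
    """Count, for each word, the words occurring within `window` positions of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster(words, vectors):
    """Average-linkage agglomerative clustering; returns the merge history.

    Each entry is (left_cluster, right_cluster, similarity), which is the
    information a dendrogram displays. This naive version rescans all
    cluster pairs at every step and would not scale to a very large vocabulary.
    """
    clusters = [(w,) for w in words]
    merges = []
    while len(clusters) > 1:
        def avg_sim(pair):
            a, b = pair
            total = sum(cosine(vectors[x], vectors[y]) for x in a for y in b)
            return total / (len(a) * len(b))

        a, b = max(combinations(clusters, 2), key=avg_sim)
        merges.append((a, b, avg_sim((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the cat chases mice".split(),
    "the dog chases cats".split(),
    "a car needs fuel".split(),
    "a truck needs fuel".split(),
]
vectors = context_vectors(corpus)
merges = cluster(["cat", "dog", "car", "truck"], vectors)
for left, right, sim in merges:
    print(left, right, round(sim, 2))
```

On this toy data, "car" and "truck" share identical contexts and are merged first, followed by "cat" and "dog"; the two resulting groups are joined last at similarity zero. A real corpus would call for a more efficient clustering algorithm and probably a weighted association score rather than raw counts.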
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Smrž, P., Rychlý, P. (2001). Finding Semantically Related Words in Large Corpora. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_14
DOI: https://doi.org/10.1007/3-540-44805-5_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42557-1
Online ISBN: 978-3-540-44805-1
eBook Packages: Springer Book Archive