Finding Semantically Related Words in Large Corpora

Pavel Smrž & Pavel Rychlý

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2166)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

The paper deals with the linguistic problem of fully automatic grouping of semantically related words. We discuss measures of semantic relatedness of basic word forms and describe the treatment of collocations. Next, we present a procedure for hierarchical clustering of a very large number of semantically related words and give examples of the resulting partitioning of the data in the form of a dendrogram. Finally, we show an output presentation that facilitates the inspection of the resulting word clusters.
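The abstract does not specify which relatedness measure or linkage criterion the authors use, so the following is only an illustrative sketch of the general idea under assumed choices, not the paper's method. It assumes each word is represented by counts of the words occurring within a ±2-token window, that word pairs are compared by cosine similarity, and that clusters are merged bottom-up with average linkage; the merge history is the information a dendrogram draws. The function names, window size, and toy corpus are hypothetical, and collocation handling is not modelled.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def context_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within +/- `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def average_linkage(a, b, sim):
    """Mean pairwise similarity between the words of clusters a and b."""
    return sum(sim[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))


def agglomerate(words, vectors):
    """Bottom-up clustering; returns the merge history (a dendrogram as a list)."""
    # Precompute all pairwise word similarities once.
    sim = {frozenset((x, y)): cosine(vectors[x], vectors[y])
           for x, y in combinations(words, 2)}
    clusters = [(w,) for w in words]
    merges = []
    while len(clusters) > 1:
        # Merge the pair of clusters with the highest average similarity.
        a, b = max(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], sim))
        merges.append((a, b, average_linkage(a, b, sim)))
        # Replace the two merged clusters by their union.
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return merges


corpus = ("the cat drinks milk . the dog drinks water . "
          "a cat eats fish . a dog eats meat .").split()
vectors = context_vectors(corpus)
for left, right, score in agglomerate(["cat", "dog", "milk", "water", "fish", "meat"], vectors):
    print(f"{score:.3f}  {left} + {right}")
```

Each printed line records one merge, so reading the output from top to bottom traces the dendrogram from the leaves upward. On this toy corpus the first merge joins `cat` and `dog`, which share most of their contexts; a realistic setup would build the vectors from a large corpus and cluster far more words, where the quadratic pairwise table used here would need a more scalable approach.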

Author information

Authors and Affiliations

Faculty of Informatics, Masaryk University Brno, Botanická 68a, 602 00, Brno, Czech Republic
Pavel Smrž & Pavel Rychlý

Editor information

Editors and Affiliations

Dept. of Computer Science and Engineering, University of West Bohemia in Plzeň, Faculty of Applied Sciences, Univerzitní 22, 306-14, Plzeň, Czech Republic
Václav Matoušek, Pavel Mautner, Roman Mouček & Karel Taušer

About this paper

Cite this paper

Smrž, P., Rychlý, P. (2001). Finding Semantically Related Words in Large Corpora. In: Matoušek, V., Mautner, P., Mouček, R., Taušer, K. (eds) Text, Speech and Dialogue. TSD 2001. Lecture Notes in Computer Science, vol 2166. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44805-5_14

DOI: https://doi.org/10.1007/3-540-44805-5_14
Published: 24 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42557-1
Online ISBN: 978-3-540-44805-1
eBook Packages: Springer Book Archive
