Automatic word sense discrimination

Published: 01 March 1998

Abstract

This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training instances or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words.
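The idea of second-order co-occurrence can be illustrated with a minimal sketch. It is not the paper's implementation: the toy corpus, the function names (`word_vectors`, `context_vector`, `cluster`), the use of raw counts without any dimensionality reduction, and the simple k-means-style loop with farthest-first initialisation are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter, defaultdict


def word_vectors(corpus, window=3):
    """First-order co-occurrence counts: word -> Counter of nearby words."""
    vecs = defaultdict(Counter)
    for sent in corpus:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs


def context_vector(context, vecs, target):
    """Second-order vector: sum of the word vectors of the context words."""
    total = Counter()
    for w in context:
        if w != target:
            total.update(vecs.get(w, {}))
    return total


def cosine(a, b):
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(vectors, k=2, iters=5):
    """Stand-in clustering (assumption): k-means-style loop on cosine."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Farthest-first: pick the vector least similar to chosen centroids.
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[j] = total
    return labels


# Invented toy training corpus and contexts of the ambiguous word "bank".
corpus = [
    "money loan interest cash".split(),
    "loan cash deposit money".split(),
    "river water shore fish".split(),
    "water fish river boat".split(),
]
contexts = [
    "bank loan money".split(),
    "bank deposit interest".split(),
    "bank river water".split(),
    "bank shore boat".split(),
]

vecs = word_vectors(corpus)
ctx = [context_vector(c, vecs, "bank") for c in contexts]
labels = cluster(ctx, k=2)
print(labels)  # [0, 0, 1, 1]
```

The first two contexts share no word other than "bank", so a first-order comparison would find them unrelated. Their second-order vectors are nonetheless similar because "loan", "money", "deposit", and "interest" co-occur with overlapping words in the training corpus, which is what places them in the same sense cluster.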

Published In

Computational Linguistics, Volume 24, Issue 1
Special issue on word sense disambiguation
March 1998
179 pages
ISSN: 0891-2017
EISSN: 1530-9312

Publisher

MIT Press

Cambridge, MA, United States

Qualifiers

  • Article
