DOI: 10.5555/2140490.2140492

Comparison of the baseline knowledge-, corpus-, and web-based similarity measures for semantic relations extraction

Published: 31 July 2011

Abstract

Unsupervised methods of semantic relation extraction rely on a similarity measure between lexical units. Similarity measures differ both in the kinds of information they use and in the ways this information is transformed into a similarity score. This paper takes a further step in evaluating the available similarity measures in the context of semantic relation extraction. We compare 21 baseline measures (8 knowledge-based, 4 corpus-based, and 9 web-based) on the BLESS dataset. Our results show that existing similarity measures produce significantly different results, both in overall performance and in the distribution of relations. We conclude that these results motivate the development of a combined similarity measure.
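
To make the idea of turning corpus information into a similarity score concrete, here is a minimal sketch of one common corpus-based approach: cosine similarity between context-window co-occurrence vectors. It is an illustration only. The abstract does not say which 21 measures were evaluated, so this is not presented as one of them, and the function names, the window size, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)  # plain dict: lookups of unknown words add no keys


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target, vectors, k=3):
    """Rank every other word by its similarity to `target`; return the top k."""
    target_vec = vectors.get(target, Counter())
    scores = [(w, cosine(target_vec, vec)) for w, vec in vectors.items() if w != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy corpus, far too small for real use.
TOY_CORPUS = [
    "the cat chased the mouse".split(),
    "the dog chased the cat".split(),
    "the dog ate the food".split(),
    "the cat ate the food".split(),
]

if __name__ == "__main__":
    vecs = cooccurrence_vectors(TOY_CORPUS)
    for word, score in most_similar("cat", vecs):
        print(f"{word}\t{score:.3f}")
```

In an unsupervised extraction setting, the top-ranked neighbours of a target word would be treated as candidate related terms. A knowledge-based or web-based measure would replace `cosine` over corpus counts with a score derived from a lexical resource or from search-engine hit counts, respectively.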

Published In

GEMS '11: Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics
July 2011
81 pages
ISBN: 9781937284169

Publisher

Association for Computational Linguistics, United States
