An overview of textual semantic similarity measures based on web intelligence

Jorge Martinez-Gil¹

Abstract

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is a key challenge in many computer-related fields. Traditional approaches to semantic similarity measurement are not suitable for all situations: many of them fail on terms not covered by synonym dictionaries, and cannot cope with acronyms, abbreviations, buzzwords, brand names, proper nouns, and so on. In this paper, we present and evaluate a collection of emerging techniques developed to avoid this problem. These techniques use some form of web intelligence to determine the degree of similarity between text expressions, and they implement a variety of paradigms, including the study of co-occurrence, text snippet comparison, frequent pattern finding, and search log analysis. The goal is to replace the traditional techniques where necessary.
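
As an illustration of the co-occurrence paradigm mentioned above, the following minimal sketch computes the normalized Google distance of Cilibrasi and Vitányi from page counts and maps it to a similarity score. It is one well-known measure of this kind, not necessarily the exact formulation evaluated in the paper. The function names, the exponential mapping to a similarity, and all page counts are assumptions made for illustration; no real search engine is queried.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Distance from page counts (all values are assumed inputs).

    fx, fy: number of pages containing each term on its own
    fxy:    number of pages containing both terms
    n:      total number of indexed pages, assumed larger than fx and fy
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # The terms never co-occur: treat them as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    distance = (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
    # Estimated counts can be inconsistent, so clamp small negatives to zero.
    return max(0.0, distance)


def cooccurrence_similarity(fx, fy, fxy, n):
    """Map the distance to [0, 1]; this mapping is an illustrative choice."""
    return math.exp(-normalized_google_distance(fx, fy, fxy, n))


if __name__ == "__main__":
    # Hypothetical page counts for two terms, not real search results.
    print(cooccurrence_similarity(fx=120_000, fy=95_000, fxy=40_000, n=10**9))
```

Because the score depends only on page counts, it can be computed for acronyms, brand names, and other terms that a synonym dictionary does not cover, provided the terms appear on the web.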

Author information

Authors and Affiliations

Department of Computer Science, University of Extremadura, 10003, Caceres, Spain
Jorge Martinez-Gil

Corresponding author

Correspondence to Jorge Martinez-Gil.

About this article

Cite this article

Martinez-Gil, J. An overview of textual semantic similarity measures based on web intelligence. Artif Intell Rev 42, 935–943 (2014). https://doi.org/10.1007/s10462-012-9349-8

Published: 30 June 2012
Issue Date: December 2014
DOI: https://doi.org/10.1007/s10462-012-9349-8
