Verifying Text Similarity Measures for Two Layered Retrieval

Andrzej Siemiński

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 80)

Abstract

The goal of the paper is to assess the usefulness of various text similarity measures for two-layered Internet search. In that approach the first layer is a generic Internet search engine. The second layer enables the user to evaluate, reorganize, filter and personalize the results of the first-layer search. It runs on a local workstation and can fully exploit the so-called user dividend. Crucial for that stage is assessing the similarity between text segments. The paper discusses classical, statistical text similarity measures as well as semantic, WordNet-based measures. The results of an experiment show that without word disambiguation techniques the semantic approaches cannot outperform the statistical methods.
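
The abstract does not say which statistical measures the chapter evaluates. As a minimal sketch of one common measure of that kind, the code below computes cosine similarity over raw term-frequency vectors and uses it to reorder first-layer results on the client. The function names (tokenize, cosine_similarity, rerank) and the simple tokenizer are assumptions made for illustration and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens only; a real system would
    # probably add stop-word removal and stemming.
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    # Sparse term-frequency vectors, one Counter per text segment.
    tf_a = Counter(tokenize(text_a))
    tf_b = Counter(tokenize(text_b))
    # Dot product over the terms the two segments share.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty segment is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rerank(query, snippets):
    # Hypothetical second-layer step: order first-layer results by their
    # similarity to a text that describes the user's interest.
    return sorted(snippets,
                  key=lambda snippet: cosine_similarity(query, snippet),
                  reverse=True)
```

A measure like this scores two segments that share no surface terms as 0, even when they use synonyms. That gap is what WordNet-based measures try to close, and the abstract reports that they do not beat the statistical methods unless word senses are disambiguated.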

Author information

Authors and Affiliations

Institute of Informatics, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
Andrzej Siemiński

Editor information

Editors and Affiliations

Institute of Informatics, Wroclaw University of Technology, Wyb. Wyspianskiego 27, 50-370, Wroclaw, Poland
Ngoc Thanh Nguyen & Aleksander Zgrzywa
Multimedia Systems Department, Gdansk University of Technology, ul. Narutowicza 11/12, 80-233, Gdansk, Poland
Andrzej Czyżewski

About this chapter

Cite this chapter

Siemiński, A. (2010). Verifying Text Similarity Measures for Two Layered Retrieval. In: Nguyen, N.T., Zgrzywa, A., Czyżewski, A. (eds) Advances in Multimedia and Network Information System Technologies. Advances in Intelligent and Soft Computing, vol 80. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14989-4_23

DOI: https://doi.org/10.1007/978-3-642-14989-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14988-7
Online ISBN: 978-3-642-14989-4
eBook Packages: Engineering, Engineering (R0)
