Revisit of Nearest Neighbor Test for Direct Evaluation of Inter-document Similarities

Seung-Hoon Na¹,
In-Su Kang² &
Jong-Hyeok Lee¹

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4956)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Recently, cluster-based retrieval has been successfully applied to improve retrieval effectiveness. At its core are inter-document similarities. Although inter-document similarities can be investigated independently of cluster-based retrieval and improved in various ways, their direct evaluation has received little serious attention. Because there are many cluster-based retrieval methods, a direct evaluation method would separate work on inter-document similarities from work on cluster-based retrieval. To this end, this paper revisits Voorhees's nearest neighbor test as such a direct evaluation, focusing mainly on whether the test correlates with retrieval effectiveness. Experimental results consistently support the use of the nearest neighbor test. We therefore conclude that improvements in retrieval effectiveness can be predicted well from direct evaluation, even without performing cluster-based retrieval runs.
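
The abstract does not spell out the test itself, so the following is only a minimal sketch of the nearest neighbor test as it is commonly described: for each document judged relevant to a query, count how many of its k most similar documents are also relevant. The function names, the choice of cosine similarity over term-frequency dictionaries, and the default k = 5 are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts).

    Assumption: a stand-in similarity; any inter-document similarity
    under evaluation could be passed in instead.
    """
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbor_test(docs, relevant, k=5, sim=cosine):
    """Hypothetical helper: run the test for a single query.

    docs     -- mapping of document id to sparse vector
    relevant -- set of document ids judged relevant to the query
    Returns (histogram, mean), where histogram[c] is the number of
    relevant documents having exactly c relevant documents among their
    k nearest neighbors, and mean is the average of those counts.
    """
    counts = []
    for doc_id in relevant:
        # Rank every other document by similarity to this relevant one.
        others = [other for other in docs if other != doc_id]
        others.sort(key=lambda other: sim(docs[doc_id], docs[other]), reverse=True)
        # Count the relevant documents among the top k neighbors.
        counts.append(sum(1 for other in others[:k] if other in relevant))
    histogram = Counter(counts)
    mean = sum(counts) / len(counts) if counts else 0.0
    return histogram, mean
```

Under these assumptions, a similarity measure that yields a higher mean count across queries places relevant documents closer to one another, which is the property the direct evaluation is meant to capture without running cluster-based retrieval.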

References

Kurland, O., Lee, L.: Corpus structure, language models, and ad hoc information retrieval. In: SIGIR 2004, pp. 194–201 (2004)
Tao, T., Wang, X., Mei, Q., Zhai, C.: Language model information retrieval with document expansion. In: HLT-NAACL 2006, pp. 407–414 (2006)
Rijsbergen, C.J.V.: Information Retrieval. Butterworth-Heinemann (1979)
Calado, P., Cristo, M., Gonçalves, M.A., de Moura, E.S., Ribeiro-Neto, B., Ziviani, N.: Link-based similarity measures for the classification of web documents. Journal of American Society for Information Science and Technology (JASIST) 57(2), 208–221 (2006)
Bartell, B.T., Cottrell, G.W., Belew, R.K.: Representing documents using an explicit model of their similarities, vol. 46, pp. 254–271. John Wiley, New York (1995)
Tombros, A., van Rijsbergen, C.J.: Query-sensitive similarity measures for the calculation of interdocument relationships. In: CIKM 2001, pp. 17–24. ACM, New York (2001)
Voorhees, E.M.: The cluster hypothesis revisited. In: SIGIR 1985, pp. 188–196 (1985)
Hiemstra, D., Robertson, S., Zaragoza, H.: Parsimonious language models for information retrieval. In: SIGIR 2004, pp. 178–185 (2004)

Author information

Authors and Affiliations

POSTECH, Pohang, South Korea
Seung-Hoon Na & Jong-Hyeok Lee
KISTI, Daejeon, South Korea
In-Su Kang

Editor information

Craig Macdonald Iadh Ounis Vassilis Plachouras Ian Ruthven Ryen W. White

About this paper

Cite this paper

Na, SH., Kang, IS., Lee, JH. (2008). Revisit of Nearest Neighbor Test for Direct Evaluation of Inter-document Similarities. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_77

DOI: https://doi.org/10.1007/978-3-540-78646-7_77
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78645-0
Online ISBN: 978-3-540-78646-7
eBook Packages: Computer Science (R0)
