Abstract
Recently, cluster-based retrieval has been successfully applied to improve retrieval effectiveness. The core part of cluster-based retrieval is inter-document similarities. Although inter-document similarities can be investigated independently of cluster-based retrieval and be further improved in various ways, their direct evaluation has not been seriously considered. Considering that there are many cluster-based retrieval methods, such a direct evaluation method can separate the work of inter-document similarities from the work of cluster-based retrieval. For this purpose, this paper revisits Voorhee’s nearest neighbor test as such a direct evaluation, by mainly focusing on whether or not the test is correlated to the retrieval effectiveness. Experimental results consistently verify the use of the nearest neighbor test. As a result, we conclude that the improvement of retrieval effectiveness can be well-predictable from direct evaluation, even without performing runs of cluster-based retrieval.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Kurland, O., Lee, L.: Corpus structure, language models, and ad hoc information retrieval. In: SIGIR 2004, pp. 194–201 (2004)
Tao, T., Wang, X., Mei, Q., Zhai, C.: Language model information retrieval with document expansion. In: HLT-NAACL 2006, pp. 407–414 (2006)
Rijsbergen, C.J.V.: Information Retrieval. Butterworth-Heinemann (1979)
Calado, P., Cristo, M., Gonçalves, M.A., de Moura, E.S., Ribeiro-Neto, B., Ziviani, N.: Link-based similarity measures for the classification of web documents. Journal of American Society for Information Science and Technology (JASIST) 57(2), 208–221 (2006)
Bartell, B.T., Cottrell, G.W., Belew, R.K.: Representing documents using an explicit model of their similarities, vol. 46, pp. 254–271. John Wiley, New York (1995)
Tombros, A., van Rijsbergen, C.J.: Query-sensitive similarity measures for the calculation of interdocument relationships. In: CIKM 2001, pp. 17–24. ACM, New York (2001)
Voorhees, E.M.: The cluster hypothesis revisited. In: SIGIR 1985, pp. 188–196 (1985)
Hiemstra, D., Robertson, S., Zaragoza, H.: Parsimonious language models for information retrieval. In: SIGIR 2004, pp. 178–185 (2004)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Na, SH., Kang, IS., Lee, JH. (2008). Revisit of Nearest Neighbor Test for Direct Evaluation of Inter-document Similarities. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds) Advances in Information Retrieval. ECIR 2008. Lecture Notes in Computer Science, vol 4956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78646-7_77
Download citation
DOI: https://doi.org/10.1007/978-3-540-78646-7_77
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78645-0
Online ISBN: 978-3-540-78646-7
eBook Packages: Computer ScienceComputer Science (R0)