Abstract
We conducted a study to (1) verify the exhaustiveness of pooling as a method for constructing a large-scale test collection and (2) examine whether the number of documents in the pool can affect the relative evaluation of IR systems. The experiments used the search topics, their relevance assessments, and the search results submitted to both the pre-test and the test of the first NTCIR Workshop.
Our results verified the efficiency and effectiveness of the pooling method, the exhaustiveness of the relevance assessments, and the reliability of evaluations that use a test collection built by pooling.
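The abstract does not spell out the pooling procedure or how rankings under different pool sizes were compared. As a generic sketch only, the Python below assumes the depth-k pooling commonly used in TREC-style evaluations, average precision as the effectiveness measure, and Kendall's tau for comparing system orderings. None of these choices, the pool depths, the function names, or the toy data are taken from the paper; all are hypothetical illustrations of the two questions the study asks.

```python
from itertools import combinations


def build_pool(runs, depth):
    """Depth-k pooling (assumed): union of the top-`depth` documents of every run for one topic."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def average_precision(ranking, relevant):
    """Non-interpolated average precision; documents outside `relevant` count as non-relevant."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def rank_systems(runs, relevant):
    """Order run names by average precision, best first (ties broken by name)."""
    scores = {name: average_precision(r, relevant) for name, r in runs.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))


def kendall_tau(order_a, order_b):
    """Kendall's tau between two untied orderings of the same systems."""
    pos_b = {name: i for i, name in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # x is ranked above y in order_a; check whether it is also above y in order_b.
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    n = len(order_a)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical toy data for a single topic: three runs, plus the set of documents
# an assessor would judge relevant if they were shown them.
runs = {
    "A": ["d1", "d2", "d3", "d4", "d5"],
    "B": ["d2", "d6", "d1", "d7", "d8"],
    "C": ["d9", "d3", "d10", "d5", "d1"],
}
truly_relevant = {"d1", "d3", "d5", "d7"}

orders = {}
for depth in (2, 5):
    pool = build_pool(runs, depth)
    # Only pooled documents are assessed, so the judged-relevant set is pool & truth;
    # unjudged documents are treated as non-relevant when scoring.
    judged = pool & truly_relevant
    orders[depth] = rank_systems(runs, judged)
    print(depth, sorted(pool), orders[depth])

print("Kendall tau:", kendall_tau(orders[2], orders[5]))
```

With this toy data, both the depth-2 and the depth-5 pools rank the runs A, C, B, so Kendall's tau is 1.0: a smaller pool changes the absolute scores but not the relative ordering. A real analysis would average over many topics and would need a tie-aware variant of tau.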
About this article
Cite this article
Kuriyama, K., Kando, N., Nozue, T. et al. Pooling for a Large-Scale Test Collection: An Analysis of the Search Results from the First NTCIR Workshop. Information Retrieval 5, 41–59 (2002). https://doi.org/10.1023/A:1012778807438
DOI: https://doi.org/10.1023/A:1012778807438