DOI: 10.1145/1008992.1009000
Article

Retrieval evaluation with incomplete information

Published: 25 July 2004

Abstract

This paper examines whether the Cranfield evaluation methodology is robust to gross violations of the completeness assumption (i.e., the assumption that all relevant documents within a test collection have been identified and are present in the collection). We show that current evaluation measures are not robust to substantially incomplete relevance judgments. A new measure is introduced that is both highly correlated with existing measures when complete judgments are available and more robust to incomplete judgment sets. This finding suggests that substantially larger or dynamic test collections built using current pooling practices should be viable laboratory tools, despite the fact that the relevance information will be incomplete and imperfect.
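
The new measure the abstract refers to is bpref, which scores a ranked list using only judged documents: each judged relevant document is penalized by the fraction of judged nonrelevant documents ranked above it, and unjudged documents neither reward nor penalize a run. The sketch below follows the formulation commonly used in trec_eval; the function name, data layout, and exact normalization are illustrative assumptions rather than the paper's verbatim definition.

    # Hedged sketch of a bpref-style measure. Only judged documents affect the
    # score, so unjudged (possibly relevant) documents are simply ignored.
    def bpref(ranking, qrels):
        """ranking: list of doc ids, best first.
        qrels: {doc_id: 1} for judged relevant, {doc_id: 0} for judged
        nonrelevant; unjudged documents do not appear in qrels."""
        R = sum(1 for j in qrels.values() if j == 1)   # judged relevant count
        N = sum(1 for j in qrels.values() if j == 0)   # judged nonrelevant count
        if R == 0:
            return 0.0
        denom = min(R, N)
        score = 0.0
        nonrel_above = 0                               # judged nonrelevant seen so far
        for doc in ranking:
            judgment = qrels.get(doc)                  # None -> unjudged, ignored
            if judgment == 0:
                nonrel_above += 1
            elif judgment == 1:
                if denom == 0:
                    score += 1.0                       # no judged nonrelevant docs at all
                else:
                    # penalty: fraction of judged nonrelevant docs ranked above this one
                    score += 1.0 - min(nonrel_above, denom) / denom
        return score / R

For example, with qrels = {'d1': 1, 'd4': 1, 'd2': 0, 'd5': 0} and ranking = ['d1', 'd2', 'd3', 'd4'], the unjudged 'd3' contributes nothing either way and the sketch scores 0.75; this indifference to unjudged documents is what makes such a measure more robust to incomplete judgment sets than precision- and recall-based measures, which treat unjudged documents as nonrelevant.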

References

[1]
Chris Buckley. trec_eval IR evaluation package. Available from ftp://ftp.cs.cornell.edu/pub/smart.
[2]
Chris Buckley and Ellen M. Voorhees. Evaluating evaluation measure stability. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 33--40, 2000.
[3]
Cyril W. Cleverdon. The significance of the Cranfield tests on index languages. In Proceedings of the Fourteenth Annual International ACM/SIGIR Conference on Research and Development in Information Retrieval, pages 3--12, 1991.
[4]
Gordon V. Cormack, Christopher R. Palmer, and Charles L. A. Clarke. Efficient construction of large test collections. In Croft et al. [5], pages 282--289.
[5]
W. Bruce Croft, Alistair Moffat, C. J. van Rijsbergen, Ross Wilkinson, and Justin Zobel, editors. Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, August 1998. ACM Press, New York.
[6]
M. B. Eisenberg. Measuring relevance judgments. Information Processing and Management, 24(4):373--389, 1988.
[7]
H. P. Frei and P. Schäuble. Determining the effectiveness of retrieval algorithms. Information Processing and Management, 27(2/3):153--164, 1991.
[8]
Google. Benefits of a Google search. http://www.google.com/technology/whyuse.html, January 2004.
[9]
Stefano Mizzaro. A new measure of retrieval effectiveness (Or: What's wrong with precision and recall). In Proceedings of the International Workshop on Information Retrieval (IR'2001), pages 43--52, 2001.
[10]
Rabia Nuray and Fazli Can. Automatic ranking of retrieval systems in imperfect environments. In Proceedings of the Twenty-Sixth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), pages 379--380, 2003.
[11]
Mark E. Rorvig. The simple scalability of documents. Journal of the American Society for Information Science, 41(8):590--598, 1990.
[12]
Ian Soboroff, Charles Nicholas, and Patrick Cahan. Ranking retrieval systems without relevance judgments. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 66--73, 2001.
[13]
K. Sparck Jones and C. van Rijsbergen. Report on the need for and provision of an "ideal" information retrieval test collection. British Library Research and Development Report 5266, Computer Laboratory, University of Cambridge, 1975.
[14]
C. J. van Rijsbergen. Evaluation, chapter 7. Butterworths, 2nd edition, 1979.
[15]
Ellen M. Voorhees. Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing and Management, 36:697--716, 2000.
[16]
Ellen M. Voorhees. Evaluation by highly relevant documents. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 74--82, 2001.
[17]
Ellen M. Voorhees. The philosophy of information retrieval evaluation. In Evaluation of Cross-Language Information Retrieval Systems. Proceedings of CLEF 2001, number 2406 in Lecture Notes in Computer Science, pages 355--370, 2002.
[18]
Ellen M. Voorhees and Chris Buckley. The effect of topic set size on retrieval experiment error. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 316--323, 2002.
[19]
Ellen M. Voorhees and Donna Harman. Overview of the seventh Text REtrieval Conference (TREC-7). In Proceedings of the Seventh Text REtrieval Conference (TREC-7), pages 1--23, 1999. NIST Special Publication 500-242.
[20]
Y. Y. Yao. Measuring retrieval effectiveness based on user preference of documents. Journal of the American Society for Information Science, 46(2):133--145, 1995.
[21]
Justin Zobel. How reliable are the results of large-scale information retrieval experiments? In Croft et al. [5], pages 307--314.




    Published In

    SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
    July 2004
    624 pages
ISBN: 1581138814
DOI: 10.1145/1008992
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. cranfield
    2. incomplete judgments


    Conference

    SIGIR04

    Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions, 20%

