DOI: 10.1145/1277741.1277805
Article

Test theory for assessing IR test collections

Published: 23 July 2007

Abstract

How good is an IR test collection? A series of recent papers has addressed the question by empirically measuring how consistent performance comparisons remain across alternate subsets of the collection. In this paper we propose using Test Theory, which is based on analysis of variance and is specifically designed to assess test collections. Using the method, we can not only measure test reliability after the fact, but also estimate a test collection's reliability before it is even built or used. We can likewise determine an optimal allocation of resources in advance, e.g., whether to invest in more judges or more queries. The method, which is in widespread use in the field of educational testing, complements data-driven approaches to assessing test collections: whereas the data-driven method focuses on test results, test theory focuses on test designs. It offers unique practical results, as well as insights into the variety and implications of alternative test designs.
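The generalizability-theory approach the abstract summarizes estimates variance components from an ANOVA of system scores and uses them to project reliability for hypothetical test designs. A minimal sketch, assuming a simple two-way crossed systems × topics design with one effectiveness score per cell (the function names and synthetic setup are illustrative, not taken from the paper):

```python
import numpy as np

def g_study(scores):
    """Estimate variance components from a systems x topics score matrix
    (two-way crossed design, one observation per cell)."""
    n_s, n_t = scores.shape
    grand = scores.mean()
    sys_means = scores.mean(axis=1)
    topic_means = scores.mean(axis=0)

    # ANOVA mean squares for the crossed design.
    ms_s = n_t * ((sys_means - grand) ** 2).sum() / (n_s - 1)
    ms_t = n_s * ((topic_means - grand) ** 2).sum() / (n_t - 1)
    resid = scores - sys_means[:, None] - topic_means[None, :] + grand
    ms_st = (resid ** 2).sum() / ((n_s - 1) * (n_t - 1))

    # Expected-mean-square equations; the interaction is confounded
    # with error when there is one observation per cell.
    var_st = ms_st
    var_s = max((ms_s - ms_st) / n_t, 0.0)
    var_t = max((ms_t - ms_st) / n_s, 0.0)
    return var_s, var_t, var_st

def reliability(var_s, var_st, n_topics):
    """Generalizability coefficient for relative system comparisons
    with n_topics topics (a 'decision study')."""
    return var_s / (var_s + var_st / n_topics)
```

Because `reliability` takes a hypothetical topic count, one can compare candidate designs before building the collection, e.g. find the smallest number of topics whose projected coefficient reaches a target such as 0.9; richer designs add a judges facet analogously, which is how the judges-versus-queries allocation question is posed.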



Published In

SIGIR '07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
July 2007
946 pages
ISBN:9781595935977
DOI:10.1145/1277741
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information retrieval
  2. test collections
  3. test theory

Qualifiers

  • Article

Conference

SIGIR '07: The 30th Annual International SIGIR Conference
July 23-27, 2007
Amsterdam, The Netherlands

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 9
  • Downloads (last 6 weeks): 2
Reflects downloads up to 13 Dec 2024

Citations

Cited By

  • (2022) Detecting Significant Differences Between Information Retrieval Systems via Generalized Linear Models. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 446-456. DOI: 10.1145/3511808.3557286. Online publication date: 17-Oct-2022.
  • (2022) IR Evaluation and Learning in the Presence of Forbidden Documents. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 556-566. DOI: 10.1145/3477495.3532006. Online publication date: 6-Jul-2022.
  • (2021) Topic Difficulty: Collection and Query Formulation Effects. ACM Transactions on Information Systems, 40(1), 1-36. DOI: 10.1145/3470563. Online publication date: 8-Sep-2021.
  • (2021) System Effect Estimation by Sharding: A Comparison Between ANOVA Approaches to Detect Significant Differences. Advances in Information Retrieval, 33-46. DOI: 10.1007/978-3-030-72240-1_3. Online publication date: 30-Mar-2021.
  • (2020) The Impact of Negative Relevance Judgments on NDCG. Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2037-2040. DOI: 10.1145/3340531.3412123. Online publication date: 19-Oct-2020.
  • (2019) Improving the Accuracy of System Performance Estimation by Using Shards. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 805-814. DOI: 10.1145/3331184.3338062. Online publication date: 18-Jul-2019.
  • (2019) Using Collection Shards to Study Retrieval Performance Effect Sizes. ACM Transactions on Information Systems, 37(3), 1-40. DOI: 10.1145/3310364. Online publication date: 19-Mar-2019.
  • (2019) In quest of new document relations. Scientometrics, 119(2), 987-1008. DOI: 10.1007/s11192-019-03058-3. Online publication date: 1-May-2019.
  • (2019) Fewer topics? A million topics? Both?! On topics subsets in test collections. Information Retrieval Journal. DOI: 10.1007/s10791-019-09357-w. Online publication date: 8-May-2019.
  • (2018) When to stop making relevance judgments? A study of stopping methods for building information retrieval test collections. Journal of the Association for Information Science and Technology, 70(1), 49-60. DOI: 10.1002/asi.24077. Online publication date: 12-Dec-2018.
