Article

Meta-scoring: automatically evaluating term weighting schemes in IR without precision-recall

SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 83 - 89

https://doi.org/10.1145/383952.383964

Published: 01 September 2001

Abstract

In this paper, we present a method that automatically evaluates the performance of different term weighting schemes in information retrieval without resorting to precision-recall measurements based on human relevance judgments. Specifically, the problem is: given two document-term matrices generated by two different term weighting schemes, can we tell which scheme will perform better? We propose a meta-scoring function that takes as input the document-term matrix generated by a term weighting scheme and computes a goodness score from it. In our experiments, we found that this score is highly correlated with the precision-recall measurement for all the collections and term weighting schemes we tried. We therefore conclude that our meta-scoring function can substitute for the precision-recall measurement, which requires relevance judgments from human subjects. Furthermore, the meta-scoring function is not limited to text information retrieval; it can also be applied to fields such as image and DNA retrieval.
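
To make the problem setup concrete, the following is a minimal sketch of how two term weighting schemes can yield two different document-term matrices for the same collection. It is only an illustration of the input described in the abstract: the toy corpus, the function names `tf_matrix` and `tfidf_matrix`, and the particular idf form log(N/df) are assumptions, not taken from the paper. The meta-scoring function itself is not specified in the abstract, so it is not implemented here.

```python
import math
from collections import Counter


def build_vocab(docs):
    """Return the sorted list of distinct whitespace-separated terms."""
    return sorted({term for doc in docs for term in doc.split()})


def tf_matrix(docs, vocab):
    """Scheme 1 (assumed): raw term frequency, one row per document."""
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([float(counts[term]) for term in vocab])
    return rows


def tfidf_matrix(docs, vocab):
    """Scheme 2 (assumed): tf multiplied by idf = log(N / df)."""
    n_docs = len(docs)
    tf = tf_matrix(docs, vocab)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in tf if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df_j) for df_j in df]
    return [[row[j] * idf[j] for j in range(len(vocab))] for row in tf]


# Hypothetical toy collection, for illustration only.
docs = [
    "term weighting in retrieval",
    "image retrieval",
    "term weighting schemes",
]
vocab = build_vocab(docs)
matrix_a = tf_matrix(docs, vocab)     # document-term matrix under scheme 1
matrix_b = tfidf_matrix(docs, vocab)  # document-term matrix under scheme 2
```

A meta-scoring function in the sense of the abstract would take `matrix_a` or `matrix_b` as its only input and return a single goodness score, so the two schemes could be ranked without any relevance judgments.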

Index Terms

Meta-scoring: automatically evaluating term weighting schemes in IR without precision-recall
1. Information systems
  1. Information retrieval

Information & Contributors

Information

Published In

SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval

September 2001

454 pages

ISBN:1581133316

DOI:10.1145/383952

Chairmen:
Donald H. Kraft, Louisiana State Univ.
W. Bruce Croft, University of Massachusetts (For the Americas)
David J. Harper, The Robert Gordon University (For Europe and Africa)
Justin Zobel, RMIT University (For Asia and Australasia)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGIR01

Sponsor:

SIGIR

SIGIR01: 24th ACM/SIGIR International Conference on Research and Development in Information Retrieval

New Orleans, Louisiana, USA

Acceptance Rates

SIGIR '01 Paper Acceptance Rate 47 of 201 submissions, 23%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 19
Total Downloads: 970

Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 1

Reflects downloads up to 02 Mar 2025

Citations

Cited By

Zhang Z, Chen L, Yin F, Zhang X, Guo L (2020). Improving Online Clustering of Chinese Technology Web News With Bag-of-Near-Synonyms. IEEE Access, 8 (94245-94257). Online publication date: 2020
https://doi.org/10.1109/ACCESS.2020.2995516
(2018). Experimental analysis of impact of term weighting schemes on cluster quality. International Journal of Advanced Intelligence Paradigms, 10:1-2 (178-193). Online publication date: 1-Jan-2018
https://dl.acm.org/doi/10.5555/3192120.3192132
Damasevicius R, Valys R, Wozniak M (2016). Intelligent tagging of online texts using fuzzy logic. 2016 IEEE Symposium Series on Computational Intelligence (SSCI) (1-8). Online publication date: Dec-2016
https://doi.org/10.1109/SSCI.2016.7849917
Ibrahim O, Landa-Silva D (2016). Term frequency with average term occurrences for textual information retrieval. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 20:8 (3045-3061). Online publication date: 1-Aug-2016
https://dl.acm.org/doi/10.1007/s00500-015-1935-7
Lioma C, Simonsen J, Larsen B, Hansen N, Baeza-Yates R, Lalmas M, Moffat A, Ribeiro-Neto B (2015). Non-Compositional Term Dependence for Information Retrieval. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (595-604). Online publication date: 9-Aug-2015
https://dl.acm.org/doi/10.1145/2766462.2767717
Ibrahim O, Landa-Silva D (2014). A new weighting scheme and discriminative approach for information retrieval in static and dynamic document collections. 2014 14th UK Workshop on Computational Intelligence (UKCI) (1-8). Online publication date: Sep-2014
https://doi.org/10.1109/UKCI.2014.6930160
Sarnikar S, Zhang Z, Zhao J (2014). Query-performance prediction for effective query routing in domain-specific repositories. Journal of the Association for Information Science and Technology, 65:8 (1597-1614). Online publication date: 1-Aug-2014
https://dl.acm.org/doi/10.1002/asi.23072
Whissell J, Clarke C, He Q, Iyengar A, Nejdl W, Pei J, Rastogi R (2013). Effective measures for inter-document similarity. Proceedings of the 22nd ACM international conference on Information & Knowledge Management (1361-1370). Online publication date: 27-Oct-2013
https://dl.acm.org/doi/10.1145/2505515.2505526
Šajgalík M, Barla M, Bieliková M (2013). From Ambiguous Words to Key-Concept Extraction. Proceedings of the 2013 24th International Workshop on Database and Expert Systems Applications (63-67). Online publication date: 26-Aug-2013
https://dl.acm.org/doi/10.1109/DEXA.2013.16
Verma K, Jadon M, Pujari A (2013). Clustering Short-Text Using Non-negative Matrix Factorization of Hadamard Product of Similarities. Information Retrieval Technology (145-155). Online publication date: 2013
https://doi.org/10.1007/978-3-642-45068-6_13
