Article

Regularizing ad hoc retrieval scores

Author:

Fernando Diaz

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

Pages 672 - 679

https://doi.org/10.1145/1099554.1099722

Published: 31 October 2005

Abstract

The cluster hypothesis states: closely related documents tend to be relevant to the same request. We exploit this hypothesis directly by adjusting ad hoc retrieval scores from an initial retrieval so that topically related documents receive similar scores. We refer to this process as score regularization. Score regularization can be presented as an optimization problem, allowing the use of results from semi-supervised learning. We demonstrate that regularized scores consistently and significantly rank documents better than unregularized scores, given a variety of initial retrieval algorithms. We evaluate our method on two large corpora across a substantial number of topics.
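
The idea of adjusting scores so that related documents receive similar scores can be illustrated with a small sketch. The abstract does not give the objective or the algorithm, so everything below is an assumption borrowed from the graph-based semi-supervised learning literature: the function name `regularize_scores`, the affinity matrix `affinity`, the mixing parameter `alpha`, and the iterative update f <- alpha * S f + (1 - alpha) * y with S the symmetrically normalized affinity matrix. It is one plausible instance of score regularization, not necessarily the formulation used in the paper.

```python
import math


def regularize_scores(scores, affinity, alpha=0.5, iterations=100):
    """Smooth initial retrieval scores over a document-similarity graph.

    Assumed update (not taken from the paper): f <- alpha * S f + (1 - alpha) * y,
    where y holds the initial scores and S = D^{-1/2} W D^{-1/2} is the
    symmetrically normalized affinity matrix W.
    """
    n = len(scores)
    # Degree of each document; isolated documents get a zero scaling factor.
    degree = [sum(row) for row in affinity]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    S = [
        [inv_sqrt[i] * affinity[i][j] * inv_sqrt[j] for j in range(n)]
        for i in range(n)
    ]
    f = list(scores)
    for _ in range(iterations):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * scores[i]
            for i in range(n)
        ]
    return f


# Hypothetical toy data: documents 0 and 1 are topically related, document 2 is isolated.
initial = [1.0, 0.0, 0.5]
W = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
smoothed = regularize_scores(initial, W, alpha=0.5)
print(smoothed)  # approximately [0.667, 0.333, 0.25]
```

In this toy case document 1, which scored lowest initially, moves above the isolated document 2 because it is linked to the top-scoring document 0. Larger values of `alpha` pull related documents' scores closer together, while `alpha = 0` returns the initial scores unchanged.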

Index Terms

Regularizing ad hoc retrieval scores
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
    2. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published In

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

October 2005

854 pages

ISBN:1595931406

DOI:10.1145/1099554

General Chair:
Otthein Herzog, University of Bremen, Germany

Program Chairs:
Hans-Jörg Schek, University for Health Sciences, Medical Informatics and Technology, Austria
Norbert Fuhr, University of Duisburg-Essen, Germany
Abdur Chowdhury, America Online, USA
Wilfried Teiken, IBM T.J. Watson Research Center, USA

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

CIKM05

CIKM05: Conference on Information and Knowledge Management

October 31 - November 5, 2005

Bremen, Germany

Acceptance Rates

CIKM '05 Paper Acceptance Rate 77 of 425 submissions, 18%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 69
Total Downloads: 614
Downloads (last 12 months): 6
Downloads (last 6 weeks): 3

Reflects downloads up to 05 Mar 2025

Cited By

Cachel K, Rundensteiner E, Serra E, Spezzano F (2024). Wise Fusion: Group Fairness Enhanced Rank Fusion. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pages 163-174. Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679649
Hambarde K, Proença H (2023). Information Retrieval: Recent Advances and Beyond. IEEE Access 11, pages 76581-76604. Online publication date: 2023.
https://doi.org/10.1109/ACCESS.2023.3295776
Zamani H, Bendersky M, Metzler D, Zhuang H, Wang X, Crestani F, Pasi G, Gaussier E (2022). Stochastic Retrieval-Conditioned Reranking. Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, pages 81-91. Online publication date: 23-Aug-2022.
https://dl.acm.org/doi/10.1145/3539813.3545141
Guo J, Cai Y, Fan Y, Sun F, Zhang R, Cheng X (2022). Semantic Models for the First-Stage Retrieval: A Comprehensive Review. ACM Transactions on Information Systems 40:4, pages 1-42. Online publication date: 24-Mar-2022.
https://dl.acm.org/doi/10.1145/3486250
Han X, Liu Y, Lin J, Hasibi F, Fang Y, Aizawa A (2021). The Simplest Thing That Can Possibly Work: (Pseudo-)Relevance Feedback via Text Classification. Proceedings of the 2021 ACM SIGIR International Conference on Theory of Information Retrieval, pages 123-129. Online publication date: 11-Jul-2021.
https://dl.acm.org/doi/10.1145/3471158.3472261
Jeong M, Choi S, Yeo J, Hwang S (2021). Label and Context Augmentation for Response Selection at DSTC8. IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, pages 2541-2550. Online publication date: 2021.
https://doi.org/10.1109/TASLP.2021.3076876
Raiber F, Kurland O (2019). Relevance Feedback. ACM Transactions on Information Systems 37:4, pages 1-28. Online publication date: 4-Oct-2019.
https://dl.acm.org/doi/10.1145/3360487
Roitman H, Rabinovich E, Sar Shalom O, Lee D, Sastry N, Weber I (2018). As Stable As You Are. Proceedings of the 29th on Hypertext and Social Media, pages 33-37. Online publication date: 3-Jul-2018.
https://dl.acm.org/doi/10.1145/3209542.3209567
Liang S, Markov I, Ren Z, de Rijke M, Champin P, Gandon F, Médini L, Lalmas M, Ipeirotis P (2018). Manifold Learning for Rank Aggregation. Proceedings of the 2018 World Wide Web Conference, pages 1735-1744. Online publication date: 10-Apr-2018.
https://dl.acm.org/doi/10.1145/3178876.3186085
Levi O, Guy I, Raiber F, Kurland O (2018). Selective Cluster Presentation on the Search Results Page. ACM Transactions on Information Systems 36:3, pages 1-42. Online publication date: 28-Feb-2018.
https://dl.acm.org/doi/10.1145/3158672
