Two-Stage Learning to Rank for Information Retrieval

Authors: Michael Bendersky, W. Bruce Croft

ECIR'13: Proceedings of the 35th European conference on Advances in Information Retrieval

Pages 423 - 434

https://doi.org/10.1007/978-3-642-36973-5_36

Published: 24 March 2013

Abstract

Current learning-to-rank approaches commonly focus on learning the best possible ranking function given a small, fixed set of documents. This document set is often retrieved from the collection using a simple unsupervised bag-of-words method, e.g., BM25. This may lead to a sub-optimal learned ranking, since many relevant documents can be excluded from the initially retrieved set. In this paper we propose a novel two-stage learning framework to address this problem. We first learn a ranking function over the entire retrieval collection using a limited set of textual features, including weighted phrases, proximities and expansion terms. This function is then used to retrieve the best possible subset of documents, over which the final model is trained using a larger set of query- and document-dependent features. Empirical evaluation using two web collections unequivocally demonstrates that our proposed two-stage framework, being able to learn its model from more relevant documents, outperforms current learning-to-rank approaches.
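The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: both stages use fixed, hand-picked linear weights instead of learned models, the first stage scores every document exhaustively (a real system would retrieve from an index), and the feature names (`bm25`, `proximity`, `quality`), the toy collection, and the helpers `linear_score`, `rank`, `two_stage_rank` and `recall` are all hypothetical. It shows the core argument: a relevant document that a BM25-only first stage misses never reaches the final ranker, while a first stage with an extra proximity feature brings it into the candidate set.

```python
from typing import Dict, List, Set

# Hypothetical representation: each document maps a feature name to a value.
Features = Dict[str, float]


def linear_score(weights: Dict[str, float], doc: Features) -> float:
    """Weighted sum over the features named in `weights`; absent features count as 0."""
    return sum(w * doc.get(name, 0.0) for name, w in weights.items())


def rank(doc_ids: List[str], collection: Dict[str, Features],
         weights: Dict[str, float]) -> List[str]:
    """Sort document ids by descending linear score."""
    return sorted(doc_ids, key=lambda d: linear_score(weights, collection[d]), reverse=True)


def two_stage_rank(collection: Dict[str, Features], stage1_weights: Dict[str, float],
                   stage2_weights: Dict[str, float], k: int) -> List[str]:
    # Stage 1: score the whole (toy) collection with a model over a few textual
    # features and keep only the top-k documents as candidates.
    candidates = rank(list(collection), collection, stage1_weights)[:k]
    # Stage 2: re-rank only those candidates with a model over a richer feature set.
    return rank(candidates, collection, stage2_weights)


def recall(ranking: List[str], relevant: Set[str]) -> float:
    """Fraction of relevant documents that appear anywhere in `ranking`."""
    return len(set(ranking) & relevant) / len(relevant)


# Toy collection with made-up feature values; d3 is the only relevant document.
COLLECTION: Dict[str, Features] = {
    "d1": {"bm25": 3.0, "proximity": 0.0, "quality": 0.1},
    "d2": {"bm25": 2.5, "proximity": 0.2, "quality": 0.2},
    "d3": {"bm25": 2.0, "proximity": 2.0, "quality": 0.9},
    "d4": {"bm25": 1.0, "proximity": 0.1, "quality": 0.5},
}
RELEVANT = {"d3"}

BM25_ONLY = {"bm25": 1.0}                                 # bag-of-words first stage
STAGE1 = {"bm25": 1.0, "proximity": 1.0}                  # first stage with proximity
STAGE2 = {"bm25": 0.5, "proximity": 0.5, "quality": 2.0}  # richer final model

baseline = two_stage_rank(COLLECTION, BM25_ONLY, STAGE2, k=2)
improved = two_stage_rank(COLLECTION, STAGE1, STAGE2, k=2)
print(baseline, recall(baseline, RELEVANT))  # ['d2', 'd1'] 0.0
print(improved, recall(improved, RELEVANT))  # ['d3', 'd1'] 1.0
```

With the same second-stage weights, only the candidate set differs between the two runs, so the gap in recall comes entirely from the first stage. Enlarging `k` is another way to raise candidate recall, at the cost of more second-stage scoring.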

Information & Contributors

Published In

ECIR'13: Proceedings of the 35th European conference on Advances in Information Retrieval

March 2013

890 pages

ISBN: 9783642369728

Editors:
Pavel Serdyukov (Yandex, Leo Tolstoy, 16, Moscow, Russia)
Pavel Braslavski (Kontur Labs and Ural Federal University, Fonvizina 3-27, Yekaterinburg, Russia)
Sergei O. Kuznetsov (National Research University Higher School of Economics (HSE), Pokrovskii bd 11, Moscow, Russia)
Jaap Kamps (University of Amsterdam, Turfdraagsterpad 9, Amsterdam, The Netherlands)
Stefan Rüger (Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, UK)

Sponsors

Mail.Ru (MRU)
Google Inc.
ABBYY
Russian Foundation for Basic Research (RFBR)
Yahoo! Labs

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 18
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 01 Jan 2025

Cited By

Fröbe M, Mackenzie J, Mitra B, Nardini F, Potthast M (2024). ReNeuIR at SIGIR 2024: The Third Workshop on Reaching Efficiency in Neural Information Retrieval. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3051-3054. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657994
Bruch S, Lucchese C, Nardini F (2023). Report on the 1st Workshop on Reaching Efficiency in Neural Information Retrieval (ReNeuIR 2022) at SIGIR 2022. ACM SIGIR Forum 56(2), pp. 1-14. Online publication date: 31-Jan-2023.
https://dl.acm.org/doi/10.1145/3582900.3582916
Bruch S, Mackenzie J, Maistro M, Nardini F (2023). ReNeuIR at SIGIR 2023: The Second Workshop on Reaching Efficiency in Neural Information Retrieval. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3456-3459. Online publication date: 19-Jul-2023.
https://dl.acm.org/doi/10.1145/3539618.3591922
Zamani H, Bendersky M, Metzler D, Zhuang H, Wang X (2022). Stochastic Retrieval-Conditioned Reranking. In: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, pp. 81-91. Online publication date: 23-Aug-2022.
https://dl.acm.org/doi/10.1145/3539813.3545141
Xiao Y, Fan Y, Zhang R, Guo J (2022). Beyond Precision: A Study on Recall of Initial Retrieval with Neural Representations. In: Information Retrieval, pp. 76-89. Online publication date: 16-Sep-2022.
https://dl.acm.org/doi/10.1007/978-3-031-24755-2_7
Cui H, Lu J, Ge Y, Yang C (2022). How Can Graph Neural Networks Help Document Retrieval: A Case Study on CORD19 with Concept Map Generation. In: Advances in Information Retrieval, pp. 75-83. Online publication date: 10-Apr-2022.
https://dl.acm.org/doi/10.1007/978-3-030-99739-7_9
Chen S, Wang X, Qin Z, Metzler D (2020). Parameter Tuning in Personal Search Systems. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 97-105. Online publication date: 20-Jan-2020.
https://dl.acm.org/doi/10.1145/3336191.3371820
Zamani H, Schedl M, Lamere P, Chen C (2019). An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation. ACM Transactions on Intelligent Systems and Technology 10(5), pp. 1-21. Online publication date: 18-Sep-2019.
https://dl.acm.org/doi/10.1145/3344257
Lucchese C, Nardini F, Perego R, Orlando S, Trani S (2018). Selective Gradient Boosting for Effective Learning to Rank. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 155-164. Online publication date: 27-Jun-2018.
https://dl.acm.org/doi/10.1145/3209978.3210048
Lucchese C, Nardini F, Orlando S, Perego R, Silvestri F, Trani S (2018). X-CLEaVER. ACM Transactions on Intelligent Systems and Technology 9(6), pp. 1-26. Online publication date: 29-Oct-2018.
https://dl.acm.org/doi/10.1145/3205453
