
Query Driven Algorithm Selection in Early Stage Retrieval

Published: 02 February 2018

Abstract

Large scale retrieval systems often employ cascaded ranking architectures, in which an initial set of candidate documents is iteratively refined and re-ranked by increasingly sophisticated and expensive ranking models. In this paper, we propose a unified framework for predicting a range of performance-sensitive parameters based on minimizing end-to-end effectiveness loss. The framework does not require relevance judgments for training, is amenable to predicting a wide range of parameters, allows for fine-tuned efficiency-effectiveness trade-offs, and can be easily deployed in large scale search systems with minimal overhead. As a proof of concept, we show that the framework can accurately predict a number of performance parameters on a query-by-query basis, allowing efficient and effective retrieval while simultaneously minimizing the tail latency of an early-stage candidate generation system. On the 50 million document ClueWeb09B collection, and across 25,000 queries, our hybrid system achieves better early-stage efficiency than fixed-parameter systems without loss of effectiveness, and allows finer-grained efficiency-effectiveness trade-offs across the multiple stages of the retrieval system.
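The architecture the abstract describes can be illustrated with a minimal sketch: a cheap first stage generates a candidate pool whose size is predicted per query, and an expensive second stage re-ranks only that pool. All names below (the query-length heuristic standing in for a learned predictor, the term-frequency scorer, the placeholder re-ranking model) are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical two-stage cascaded retrieval pipeline in which one
# early-stage parameter (the candidate pool size k) is predicted per
# query rather than fixed globally. The predictor here is a stand-in
# heuristic, not the framework proposed in the paper.

def predict_pool_size(query_terms, base_k=100, per_term_k=150, max_k=1000):
    """Stand-in for a learned regressor: longer queries tend to need
    deeper candidate pools to avoid losing relevant documents."""
    return min(base_k + per_term_k * len(query_terms), max_k)

def first_stage(query_terms, index, k):
    """Cheap candidate generation: sum term frequencies, keep top k."""
    scores = {}
    for term in query_terms:
        for doc_id, tf in index.get(term, []):
            scores[doc_id] = scores.get(doc_id, 0) + tf
    return sorted(scores, key=scores.get, reverse=True)[:k]

def second_stage(candidates, expensive_score):
    """Expensive re-ranking applied only to the small candidate pool."""
    return sorted(candidates, key=expensive_score, reverse=True)

# Toy inverted index: term -> [(doc_id, term_frequency), ...]
index = {
    "ranking": [(1, 3), (2, 1), (3, 2)],
    "cascade": [(2, 2), (3, 1)],
}
query = ["ranking", "cascade"]
k = predict_pool_size(query)          # query-driven, not fixed
candidates = first_stage(query, index, k)
final = second_stage(candidates, expensive_score=lambda d: -d)  # placeholder model
print(k, candidates, final)
```

The efficiency-effectiveness trade-off lives in `predict_pool_size`: a smaller k means less work for the expensive stage (lower tail latency) at the risk of dropping relevant documents before re-ranking, which is why predicting it per query rather than fixing it globally can dominate a fixed-parameter configuration.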



Published In

WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
February 2018
821 pages
ISBN:9781450355810
DOI:10.1145/3159652

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. experimentation
  2. measurement
  3. multi-stage retrieval
  4. performance
  5. query prediction

Qualifiers

  • Research-article

Acceptance Rates

WSDM '18 paper acceptance rate: 81 of 514 submissions (16%).
Overall acceptance rate: 498 of 2,863 submissions (17%).


Cited By

  • (2024) Rank-Biased Quality Measurement for Sets and Rankings. In Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 135-144 (8 Dec 2024). DOI: 10.1145/3673791.3698405
  • (2024) Semantic Ranking for Automated Adversarial Technique Annotation in Security Text. In Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, 49-62 (1 Jul 2024). DOI: 10.1145/3634737.3645000
  • (2024) ReNeuIR at SIGIR 2024: The Third Workshop on Reaching Efficiency in Neural Information Retrieval. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3051-3054 (10 Jul 2024). DOI: 10.1145/3626772.3657994
  • (2024) Bi-Objective Negative Sampling for Sensitivity-Aware Search. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2296-2300 (10 Jul 2024). DOI: 10.1145/3626772.3657895
  • (2024) Unsupervised Search Algorithm Configuration using Query Performance Prediction. In Companion Proceedings of the ACM Web Conference 2024, 658-661 (13 May 2024). DOI: 10.1145/3589335.3651579
  • (2024) Diversity-aware strategies for static index pruning. Information Processing & Management 61, 5 (Sep 2024), 103795. DOI: 10.1016/j.ipm.2024.103795
  • (2023) Report on the 1st Workshop on Reaching Efficiency in Neural Information Retrieval (ReNeuIR 2022) at SIGIR 2022. ACM SIGIR Forum 56, 2 (31 Jan 2023), 1-14. DOI: 10.1145/3582900.3582916
  • (2023) Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse Representations. ACM Transactions on Information Systems 41, 4 (22 Mar 2023), 1-28. DOI: 10.1145/3576922
  • (2023) Optimizing Guided Traversal for Fast Learned Sparse Retrieval. In Proceedings of the ACM Web Conference 2023, 3375-3385 (30 Apr 2023). DOI: 10.1145/3543507.3583497
  • (2023) Faster Dynamic Pruning via Reordering of Documents in Inverted Indexes. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001-2005 (19 Jul 2023). DOI: 10.1145/3539618.3591987
