
Query Driven Algorithm Selection in Early Stage Retrieval

Published: 02 February 2018

Abstract

Large scale retrieval systems often employ cascaded ranking architectures, in which an initial set of candidate documents is iteratively refined and re-ranked by increasingly sophisticated and expensive ranking models. In this paper, we propose a unified framework for predicting a range of performance-sensitive parameters based on minimizing end-to-end effectiveness loss. The framework does not require relevance judgments for training, is amenable to predicting a wide range of parameters, allows for fine-tuned efficiency-effectiveness trade-offs, and can be easily deployed in large scale search systems with minimal overhead. As a proof of concept, we show that the framework can accurately predict a number of performance parameters on a query-by-query basis, allowing efficient and effective retrieval while simultaneously minimizing the tail latency of an early-stage candidate generation system. On the 50 million document ClueWeb09B collection, and across 25,000 queries, our hybrid system achieves better early-stage efficiency than fixed-parameter systems without loss of effectiveness, and allows finer-grained efficiency-effectiveness trade-offs across the multiple stages of the retrieval system.
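The architecture the abstract describes can be illustrated with a minimal sketch: a cheap first stage generates a candidate pool whose size is predicted per query, and an expensive second stage re-ranks only that pool. All names below (the query-length heuristic standing in for a learned predictor, the term-frequency scorer, the placeholder re-ranking model) are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical two-stage cascaded retrieval pipeline in which one
# early-stage parameter (the candidate pool size k) is predicted per
# query rather than fixed globally. The predictor here is a stand-in
# heuristic, not the framework proposed in the paper.

def predict_pool_size(query_terms, base_k=100, per_term_k=150, max_k=1000):
    """Stand-in for a learned regressor: longer queries tend to need
    deeper candidate pools to avoid losing relevant documents."""
    return min(base_k + per_term_k * len(query_terms), max_k)

def first_stage(query_terms, index, k):
    """Cheap candidate generation: sum term frequencies, keep top k."""
    scores = {}
    for term in query_terms:
        for doc_id, tf in index.get(term, []):
            scores[doc_id] = scores.get(doc_id, 0) + tf
    return sorted(scores, key=scores.get, reverse=True)[:k]

def second_stage(candidates, expensive_score):
    """Expensive re-ranking applied only to the small candidate pool."""
    return sorted(candidates, key=expensive_score, reverse=True)

# Toy inverted index: term -> [(doc_id, term_frequency), ...]
index = {
    "ranking": [(1, 3), (2, 1), (3, 2)],
    "cascade": [(2, 2), (3, 1)],
}
query = ["ranking", "cascade"]
k = predict_pool_size(query)          # query-driven, not fixed
candidates = first_stage(query, index, k)
final = second_stage(candidates, expensive_score=lambda d: -d)  # placeholder model
print(k, candidates, final)
```

The efficiency-effectiveness trade-off lives in `predict_pool_size`: a smaller k means less work for the expensive stage (lower tail latency) at the risk of dropping relevant documents before re-ranking, which is why predicting it per query rather than fixing it globally can dominate a fixed-parameter configuration.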



Published In

WSDM '18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining
February 2018
821 pages
ISBN:9781450355810
DOI:10.1145/3159652

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. experimentation
  2. measurement
  3. multi-stage retrieval
  4. performance
  5. query prediction

Qualifiers

  • Research-article

Acceptance Rates

WSDM '18 paper acceptance rate: 81 of 514 submissions (16%).
Overall acceptance rate: 498 of 2,863 submissions (17%).


Cited By

  • (2024) Rank-Biased Quality Measurement for Sets and Rankings. In Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 135-144 (8 Dec 2024). DOI: 10.1145/3673791.3698405
  • (2024) Semantic Ranking for Automated Adversarial Technique Annotation in Security Text. In Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, 49-62 (1 Jul 2024). DOI: 10.1145/3634737.3645000
  • (2024) ReNeuIR at SIGIR 2024: The Third Workshop on Reaching Efficiency in Neural Information Retrieval. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 3051-3054 (10 Jul 2024). DOI: 10.1145/3626772.3657994
  • (2024) Bi-Objective Negative Sampling for Sensitivity-Aware Search. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2296-2300 (10 Jul 2024). DOI: 10.1145/3626772.3657895
  • (2024) Unsupervised Search Algorithm Configuration using Query Performance Prediction. In Companion Proceedings of the ACM Web Conference 2024, 658-661 (13 May 2024). DOI: 10.1145/3589335.3651579
  • (2024) Diversity-aware strategies for static index pruning. Information Processing & Management 61, 5 (Sep 2024), 103795. DOI: 10.1016/j.ipm.2024.103795
  • (2023) Report on the 1st Workshop on Reaching Efficiency in Neural Information Retrieval (ReNeuIR 2022) at SIGIR 2022. ACM SIGIR Forum 56, 2 (31 Jan 2023), 1-14. DOI: 10.1145/3582900.3582916
  • (2023) Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse Representations. ACM Transactions on Information Systems 41, 4 (22 Mar 2023), 1-28. DOI: 10.1145/3576922
  • (2023) Optimizing Guided Traversal for Fast Learned Sparse Retrieval. In Proceedings of the ACM Web Conference 2023, 3375-3385 (30 Apr 2023). DOI: 10.1145/3543507.3583497
  • (2023) Faster Dynamic Pruning via Reordering of Documents in Inverted Indexes. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001-2005 (19 Jul 2023). DOI: 10.1145/3539618.3591987
