DOI: 10.1145/2600428.2609572
research-article

Predictive parallelization: taming tail latencies in web search

Published: 03 July 2014

Abstract

Web search engines are optimized to reduce the high-percentile response time to consistently provide fast responses to almost all user queries. This is a challenging task because the query workload exhibits large variability, consisting of many short-running queries and a few long-running queries that significantly impact the high-percentile response time. With modern multicore servers, parallelizing the processing of an individual query is a promising solution to reduce query execution time, but it gives limited benefits compared to sequential execution since most queries see little or no speedup when parallelized. The root of this problem is that short-running queries, which dominate the workload, do not benefit from parallelization. They incur a large parallelization overhead, taking scarce resources from long-running queries. On the other hand, parallelization substantially reduces the execution time of long-running queries with low overhead and high parallelization efficiency. Motivated by these observations, we propose a predictive parallelization framework with two parts: (1) predicting long-running queries, and (2) selectively parallelizing them. For the first part, prediction should be accurate and efficient. For accuracy, we study a comprehensive feature set covering both term features (reflecting dynamic pruning efficiency) and query features (reflecting query complexity). For efficiency, to keep overhead low, we avoid expensive features that have excessive requirements such as large memory footprints. For the second part, we use the predicted query execution time to parallelize long-running queries and process short-running queries sequentially. We implement and evaluate the predictive parallelization framework in Microsoft Bing search. Our measurements show that under moderate to heavy load, the predictive strategy reduces the 99th-percentile response time by 50% (from 200 ms to 100 ms) compared with prior approaches that parallelize all queries.
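The framework described in the abstract reduces to a simple dispatch rule: predict each query's sequential execution time from cheap term and query features, then parallelize only the queries predicted to be long-running while processing the rest sequentially. The Python sketch below is purely illustrative; the feature set, the linear stand-in predictor, and the 30 ms threshold are assumptions for exposition, not the model or cutoff used in Bing.

"""Illustrative sketch of selective (predictive) parallelization.
The features, the linear stand-in predictor, and the threshold below are
hypothetical, not the production implementation described in the paper."""
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

LONG_RUNNING_THRESHOLD_MS = 30.0  # hypothetical cutoff for "long-running"


def predict_execution_time_ms(num_terms: int, max_posting_list_len: int) -> float:
    """Stand-in for a learned regressor over cheap term/query features."""
    return 2.0 * num_terms + 0.0001 * max_posting_list_len


def process_query(shard_tasks: List[Callable[[], List[Tuple[float, str]]]],
                  num_terms: int,
                  max_posting_list_len: int,
                  pool: ThreadPoolExecutor,
                  k: int = 10) -> List[Tuple[float, str]]:
    """Evaluate a query split into per-shard tasks and merge the top-k hits.

    Queries predicted to be short-running execute sequentially, avoiding
    parallelization overhead; predicted long-running queries fan out over
    the thread pool.
    """
    predicted_ms = predict_execution_time_ms(num_terms, max_posting_list_len)
    if predicted_ms >= LONG_RUNNING_THRESHOLD_MS:
        futures = [pool.submit(task) for task in shard_tasks]
        hits = [hit for f in futures for hit in f.result()]
    else:
        hits = [hit for task in shard_tasks for hit in task()]
    # Merge: keep the k highest-scoring documents across all shards.
    return sorted(hits, key=lambda hit: hit[0], reverse=True)[:k]


if __name__ == "__main__":
    # Toy usage: two "shards" returning (score, doc_id) pairs.
    shards = [lambda: [(0.9, "d1"), (0.4, "d3")], lambda: [(0.7, "d2")]]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(process_query(shards, num_terms=20,
                            max_posting_list_len=1_000_000, pool=pool))

In a real engine the shard tasks would score index partitions under dynamic pruning, and the predictor would be a trained regression model (for example, a boosted regression model, which is an assumption here) over the term and query features discussed in the abstract.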





Published In

SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
July 2014
1330 pages
ISBN:9781450322577
DOI:10.1145/2600428
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 July 2014


Author Tags

  1. parallelism
  2. query execution time prediction
  3. tail latency

Qualifiers

  • Research-article


Conference

SIGIR '14

Acceptance Rates

SIGIR '14 Paper Acceptance Rate: 82 of 387 submissions, 21%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%


Article Metrics

  • Downloads (last 12 months): 27
  • Downloads (last 6 weeks): 3

Reflects downloads up to 11 Dec 2024


Cited By

  • (2024) Diversity-aware strategies for static index pruning. Information Processing & Management, 61(5), 103795. DOI: 10.1016/j.ipm.2024.103795. Online publication date: Sep-2024.
  • (2023) Khronos: A Real-Time Indexing Framework for Time Series Databases on Large-Scale Performance Monitoring Systems. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 1607-1616. DOI: 10.1145/3583780.3614944. Online publication date: 21-Oct-2023.
  • (2023) An Efficient Scheduler for Task-Parallel Interactive Applications. Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 27-38. DOI: 10.1145/3558481.3591092. Online publication date: 17-Jun-2023.
  • (2023) DDPC: Automated Data-Driven Power-Performance Controller Design on-the-fly for Latency-sensitive Web Services. Proceedings of the ACM Web Conference 2023, 3067-3076. DOI: 10.1145/3543507.3583437. Online publication date: 30-Apr-2023.
  • (2023) Faster Dynamic Pruning via Reordering of Documents in Inverted Indexes. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001-2005. DOI: 10.1145/3539618.3591987. Online publication date: 19-Jul-2023.
  • (2023) TailGuard: Tail Latency SLO Guaranteed Task Scheduling for Data-Intensive User-Facing Applications. 2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS), 898-909. DOI: 10.1109/ICDCS57875.2023.00042. Online publication date: Jul-2023.
  • (2022) An NVM SSD-based High Performance Query Processing Framework for Search Engines. IEEE Transactions on Knowledge and Data Engineering, 1-1. DOI: 10.1109/TKDE.2022.3160557. Online publication date: 2022.
  • (2022) Jarvis: Large-scale Server Monitoring with Adaptive Near-data Processing. 2022 IEEE 38th International Conference on Data Engineering (ICDE), 1408-1422. DOI: 10.1109/ICDE53745.2022.00110. Online publication date: May-2022.
  • (2022) ReTail: Opting for Learning Simplicity to Enable QoS-Aware Power Management in the Cloud. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 155-168. DOI: 10.1109/HPCA53966.2022.00020. Online publication date: Apr-2022.
  • (2022) Cottage: Coordinated Time Budget Assignment for Latency, Quality and Power Optimization in Web Search. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 113-125. DOI: 10.1109/HPCA53966.2022.00017. Online publication date: Apr-2022.
