[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
10.1145/2766462.2767708acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

Optimal Aggregation Policy for Reducing Tail Latency of Web Search

Published: 09 August 2015 Publication History

Abstract

A web search engine often employs partition-aggregate architecture, where an aggregator propagates a user query to all index serving nodes (ISNs) and collects the responses from them. An aggregation policy determines how long the aggregators wait for the ISNs before returning aggregated results to users, crucially affecting both query latency and quality. Designing an aggregation policy is, however, challenging: Response latency among queries and among ISNs varies significantly, and aggregators lack of knowledge about when ISNs will respond. In this paper, we propose aggregation policies that minimize tail latency of search queries subject to search quality service level agreements (SLAs), combining data-driven offline analysis with online processing. Beginning with a single aggregator, we formally prove the optimality of our policy: It achieves the offline optimal result without knowing future responses of ISNs. We extend our policy for commonly-used hierarchical levels of aggregators and prove its optimality when messaging times between aggregators are known. We also present an empirically-effective policy to address unknown messaging time. We use production traces from a commercial search engine, a commercial advertisement engine, and synthetic workloads to evaluate the aggregation policy. The results show that compared to prior work, the policy reduces tail latency by up to 40% while satisfying same quality SLAs.

References

[1]
M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data center tcp (dctcp). SIGCOMM Comput. Commun. Rev., 41(4):--, Aug. 2010.
[2]
I. S. Altingovde, R. Blanco, B. B. Cambazoglu, R. Ozcan, E. Sarigil, and O. Ulusoy. Characterizing web search queries that match very few or no results. In CIKM, 2012.
[3]
V. N. Anh, O. de Kretser, and A. Moffat. Vector-space ranking with effective early termination. In SIGIR, 2001.
[4]
I. Arapakis, X. Bai, and B. B. Cambazoglu. Impact of response latency on user behavior in web search. In SIGIR, 2014.
[5]
C. S. Badue, R. Baeza-Yates, B. Ribeiro-Neto, A. Ziviani, and N. Ziviani. Analyzing imbalance among homogeneous index servers in a web search system. Inf. Process. Manage., 43(3), 2007.
[6]
R. Baeza-Yates, A. Gionis, F. Junqueira, V. Murdock, V. Plachouras, and F. Silvestri. The impact of caching on search engines. In SIGIR, 2007.
[7]
L. A. Barroso, J. Dean, and U. Hölzle. Web search for a planet: The google cluster architecture. IEEE Micro, 23(2):22--28, 2003.
[8]
A. Broder and M. Mitzenmacher. Optimal plans for aggregation. In PODC, 2002.
[9]
B. B. Cambazoglu, I. S. Altingovde, R. Ozcan, and Ö. Ulusoy. Cache-based query processing for search engines. ACM Transactions on the Web, 6(4):14, 2012.
[10]
B. B. Cambazoglu, F. P. Junqueira, V. Plachouras, S. Banachowski, B. Cui, S. Lim, and B. Bridge. A refreshing perspective of search engine caching. In WWW, 2010.
[11]
B. B. Cambazoglu, V. Plachouras, and R. Baeza-Yates. Quantifying performance and quality gains in distributed web search engines. In SIGIR, 2009.
[12]
J. Dean. Challenges in building large-scale information retrieval systems (Invited talk). In WSDM, 2009.
[13]
J. Dean and L. A. Barroso. The tail at scale. Commun. ACM, 56(2):74--80, Feb. 2013.
[14]
S. Ding, J. He, H. Yan, and T. Suel. Using graphics processors for high performance ir query processing. In WWW, 2009.
[15]
E. Frachtenberg. Reducing query latencies in web search using fine-grained parallelism. In WWW, 2009.
[16]
Q. Gan and T. Suel. Improved techniques for result caching in web search engines. In WWW, 2009.
[17]
J. Hamilton. The cost of latency. http://perspectives.mvdirona.com/2009/10/31/TheCostOfLatency.aspx, 2009.
[18]
E. Hatcher and O. Gospodnetic. Lucene in Action. Manning Publications Co., 2004.
[19]
Y. He, S. Elnikety, J. Larus, and C. Yan. Zeta: Scheduling interactive services with partial execution. In SOCC, 2012.
[20]
V. Jalaparti, P. Bodik, S. Kandula, I. Menache, M. Rybalkin, and C. Yan. Speeding up distributed request-response workflows. In SIGCOMM '13, 2013.
[21]
M. Jeon, Y. He, S. Elnikety, A. L. Cox, and S. Rixner. Adaptive parallelism for web search. In EuroSys, 2013.
[22]
M. Jeon, S. Kim, S.-W. Hwang, Y. He, A. L. Cox, and S. Rixner. Taming tail latencies in web search. In SIGIR, 2014.
[23]
S. Jonassen, B. B. Cambazoglu, and F. Silvestri. Prefetching query results and its impact on search engines. In SIGIR, 2012.
[24]
J. R. Lorch and A. J. Smith. Improving dynamic voltage scaling algorithms with PACE. In SIGMETRICS, 2001.
[25]
C. Macdonald, N. Tonellotto, and I. Ounis. Learning to predict response times for online query scheduling. In SIGIR, 2012.
[26]
R. Ozcan, I. S. Altingovde, and Ö. Ulusoy. Cost-aware strategies for query result caching in web search engines. ACM Transactions on the Web, 5(2):9, 2011.
[27]
E. Schurman and J. Brutlag. The user and business impact of server delays, additional bytes, and http chunking in web search. In Velocity, 2009.
[28]
G. Upadhyaya, V. S. Pai, and S. P. Midkiff. Expressing and exploiting concurrency in networked applications with aspen. In PPoPP, 2007.
[29]
A. Vulimiri, O. Michel, P. B. Godfrey, and S. Shenker. More is less: Reducing latency via redundancy. In HotNets, 2012.
[30]
Z. Ye, A. A. Abouzeid, and J. Ai. Optimal stochastic policies for distributed data aggregation in wireless sensor networks. IEEE/ACM Trans. Netw., 17(5):1494--1507, Oct. 2009.
[31]
J. Yi, F. Maghoul, and J. Pedersen. Deciphering mobile search patterns: A study of Yahoo! mobile search queries. In WWW, 2008.
[32]
J.-M. Yun, Y. He, S. Elnikety, and S. Ren. Optimal aggregation policy for reducing tail latency of web search. Technical Report MSR-TR-2015--39, Microsoft Research, 2015. http://research.microsoft.com/apps/pubs/?id=246273.
[33]
F. Zhang, S. Shi, H. Yan, and J.-R. Wen. Revisiting globally sorted indexes for efficient document retrieval. In WSDM, 2010.
[34]
J. Zhang, X. Long, and T. Suel. Performance of compressed inverted list caching in search engines. In WWW, 2008.
[35]
S. Zhu, A. Potapova, M. Alabduljalil, X. Liu, and T. Yang. Clustering and load balancing optimization for redundant content removal. In WWW, 2012.

Cited By

View all
  • (2024)Scalable Distributed Inverted List Indexes in Disaggregated MemoryProceedings of the ACM on Management of Data10.1145/36549742:3(1-27)Online publication date: 30-May-2024
  • (2023)Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobsJournal of Cloud Computing10.1186/s13677-023-00465-z12:1Online publication date: 17-Jun-2023
  • (2022)TailCmp - A Tail Latency Evaluation Solution of Public Cloud and Labeled von Neumann Architecture based Cloud Prototype2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00118(888-895)Online publication date: Dec-2022
  • Show More Cited By

Index Terms

  1. Optimal Aggregation Policy for Reducing Tail Latency of Web Search

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
    August 2015
    1198 pages
    ISBN:9781450336215
    DOI:10.1145/2766462
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 August 2015

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. aggregation
    2. scheduling
    3. tail latency
    4. web search

    Qualifiers

    • Research-article

    Funding Sources

    • U.S. NSF

    Conference

    SIGIR '15
    Sponsor:

    Acceptance Rates

    SIGIR '15 Paper Acceptance Rate 70 of 351 submissions, 20%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)9
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 26 Jan 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Scalable Distributed Inverted List Indexes in Disaggregated MemoryProceedings of the ACM on Management of Data10.1145/36549742:3(1-27)Online publication date: 30-May-2024
    • (2023)Fast DRL-based scheduler configuration tuning for reducing tail latency in edge-cloud jobsJournal of Cloud Computing10.1186/s13677-023-00465-z12:1Online publication date: 17-Jun-2023
    • (2022)TailCmp - A Tail Latency Evaluation Solution of Public Cloud and Labeled von Neumann Architecture based Cloud Prototype2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00118(888-895)Online publication date: Dec-2022
    • (2022)Qos-Sure: Qos-Assurance AutoScaling of Sharing GPU For DNN Inference in Multi-Task2022 10th International Conference on Information Systems and Computing Technology (ISCTech)10.1109/ISCTech58360.2022.00064(367-372)Online publication date: Dec-2022
    • (2022)Cottage: Coordinated Time Budget Assignment for Latency, Quality and Power Optimization in Web Search2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA)10.1109/HPCA53966.2022.00017(113-125)Online publication date: Apr-2022
    • (2022)EAISFuture Generation Computer Systems10.1016/j.future.2022.01.004130:C(253-268)Online publication date: 1-May-2022
    • (2021)Tinker: A Middleware for Deploying Multiple NN-Based Applications on a Single MachineIEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems10.1109/TCAD.2020.301998140:7(1495-1499)Online publication date: Jul-2021
    • (2020)Minimizing Cost and Maximizing Performance for Cloud PlatformsProceedings of the 21st International Middleware Conference Doctoral Symposium10.1145/3429351.3431747(29-34)Online publication date: 7-Dec-2020
    • (2020)Gemini: Learning to Manage CPU Power for Latency-Critical Search Engines2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)10.1109/MICRO50266.2020.00059(637-349)Online publication date: Oct-2020
    • (2019)Efficient main-memory top-K selection for multicore architecturesProceedings of the VLDB Endowment10.14778/3364324.336432713:2(114-127)Online publication date: 1-Oct-2019
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media