DOI: 10.1145/3077136.3082060

A/B Testing at Scale: Accelerating Software Innovation

Published: 07 August 2017

Abstract

The Internet provides developers of connected software, including web sites, applications, and devices, with an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments, also known as A/B tests. From front-end user-interface changes to backend algorithms, from search engines (e.g., Google, Bing, Yahoo!) to retailers (e.g., Amazon, eBay, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups, online controlled experiments are now used to make data-driven decisions at a wide range of companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and evaluation of online controlled experiments at scale (hundreds of concurrently running experiments) across a variety of web sites, mobile apps, and desktop applications present many pitfalls and new research challenges. In this tutorial we will give an introduction to A/B testing, share key lessons learned from scaling experimentation at Bing to thousands of experiments per year, present real examples, and outline promising directions for future work. The tutorial will go beyond applications of A/B testing in information retrieval and will also discuss practical and research challenges arising in experimentation on web sites and in mobile and desktop apps. Our goal in this tutorial is to teach attendees how to scale experimentation for their teams, products, and companies, leading to better data-driven decisions. We also want to inspire more academic research in the relatively new and rapidly evolving field of online controlled experimentation.
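The abstract describes what an A/B test decides but not how the decision is computed, and the tutorial does not prescribe a single analysis method. As a minimal sketch of the simplest case, assuming a single binary conversion metric, two equal-sized variants, and made-up counts, the following Python computes the observed lift and a two-sided p-value with a standard two-proportion z-test:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference in conversion rate
    between control (A) and treatment (B)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF,
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return p_b - p_a, z, p_value

# Hypothetical counts: 10,000 users per variant,
# 3.00% conversion in control vs. 3.35% in treatment.
lift, z, p = two_proportion_ztest(300, 10_000, 335, 10_000)
print(f"lift={lift:+.4f}  z={z:.2f}  p={p:.3f}")
```

With these made-up counts the sketch reports a lift of +0.35 percentage points with p ≈ 0.16, i.e., not significant at conventional levels. A platform running hundreds of concurrent experiments automates this kind of computation across many metrics and adds trustworthiness checks (such as A/A tests), which is where many of the pitfalls mentioned above surface; the sketch shows only the core statistic.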




    Published In

    SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
    August 2017, 1476 pages
    ISBN: 9781450350228
    DOI: 10.1145/3077136

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. a/b testing
    2. experimentation

    Qualifiers

    • Research-article

    Conference

    SIGIR '17

    Acceptance Rates

    SIGIR '17 Paper Acceptance Rate 78 of 362 submissions, 22%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%


