DOI: 10.1145/3077136.3082060

A/B Testing at Scale: Accelerating Software Innovation

Published: 07 August 2017

Abstract

The Internet provides developers of connected software, including web sites, applications, and devices, with an unprecedented opportunity to accelerate innovation by evaluating ideas quickly and accurately using controlled experiments, also known as A/B tests. From front-end user-interface changes to backend algorithms, from search engines (e.g., Google, Bing, Yahoo!) to retailers (e.g., Amazon, eBay, Etsy) to social networking services (e.g., Facebook, LinkedIn, Twitter) to travel services (e.g., Expedia, Airbnb, Booking.com) to many startups, online controlled experiments are now used to make data-driven decisions at a wide range of companies. While the theory of a controlled experiment is simple, and dates back to Sir Ronald A. Fisher's experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, the deployment and evaluation of online controlled experiments at scale (hundreds of concurrently running experiments) across a variety of web sites, mobile apps, and desktop applications present many pitfalls and new research challenges. In this tutorial we will give an introduction to A/B testing, share key lessons learned from scaling experimentation at Bing to thousands of experiments per year, present real examples, and outline promising directions for future work. The tutorial will go beyond applications of A/B testing in information retrieval and will also discuss practical and research challenges arising in experimentation on web sites and in mobile and desktop apps. Our goal in this tutorial is to teach attendees how to scale experimentation for their teams, products, and companies, leading to better data-driven decisions. We also want to inspire more academic research in the relatively new and rapidly evolving field of online controlled experimentation.
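The abstract describes what an A/B test decides but not how the decision is computed, and the tutorial does not prescribe a single analysis method. As a minimal sketch of the simplest case, assuming a single binary conversion metric, two equal-sized variants, and made-up counts, the following Python computes the observed lift and a two-sided p-value with a standard two-proportion z-test:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference in conversion rate
    between control (A) and treatment (B)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_a + 1.0 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF,
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return p_b - p_a, z, p_value

# Hypothetical counts: 10,000 users per variant,
# 3.00% conversion in control vs. 3.35% in treatment.
lift, z, p = two_proportion_ztest(300, 10_000, 335, 10_000)
print(f"lift={lift:+.4f}  z={z:.2f}  p={p:.3f}")
```

With these made-up counts the sketch reports a lift of +0.35 percentage points with p ≈ 0.16, i.e., not significant at conventional levels. A platform running hundreds of concurrent experiments automates this kind of computation across many metrics and adds trustworthiness checks (such as A/A tests), which is where many of the pitfalls mentioned above surface; the sketch shows only the core statistic.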




    Published In

    SIGIR '17: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval
    August 2017, 1476 pages
    ISBN: 9781450350228
    DOI: 10.1145/3077136

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. a/b testing
    2. experimentation

    Qualifiers

    • Research-article

    Conference

    SIGIR '17

    Acceptance Rates

    SIGIR '17 Paper Acceptance Rate 78 of 362 submissions, 22%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%


