
Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments

Published: 20 April 2020 | DOI: 10.1145/3366424.3383117

Abstract

A/B testing is the gold standard for estimating the causal relationship between a change in a product and its impact on key outcome measures. It is widely used in industry to test changes ranging from simple copy or UI changes to more complex changes such as using machine learning models to personalize the user experience. A key aspect of A/B testing is the evaluation of experiment results. Designing the right set of metrics (correct outcome measures, data quality indicators, guardrails that prevent harm to the business, and a comprehensive set of supporting metrics to understand the "why" behind the key movements) is the number-one challenge practitioners face when trying to scale their experimentation program [11, 14]. On the technical side, improving the sensitivity of experiment metrics is a hard problem and an active research area, with large practical implications, as more and more small and medium-sized businesses adopt A/B testing and suffer from insufficient statistical power. In this tutorial we will discuss challenges, best practices, and pitfalls in evaluating experiment results, focusing on lessons learned and practical guidelines as well as open research questions. A version of this tutorial was also presented at KDD 2019 [23], where it was attended by around 150 participants. This tutorial has also been accepted for the WSDM 2020 conference.
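
As a concrete illustration of two of the evaluation building blocks mentioned above, the sketch below (a hypothetical example, not part of the tutorial material) runs a sample ratio mismatch guardrail check and a CUPED-style variance-reduction adjustment on simulated data, in the spirit of [4, 10, 24]; the metric values, split sizes, and effect size are all assumptions.

# Minimal illustrative sketch (not from the tutorial; data, names, and effect
# sizes are invented): a sample ratio mismatch guardrail check and CUPED-style
# variance reduction using pre-experiment data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-user data for a 50/50 treatment/control split.
n_t, n_c = 10_000, 10_000
pre_t = rng.gamma(2.0, 5.0, n_t)                   # pre-experiment metric value
pre_c = rng.gamma(2.0, 5.0, n_c)
y_t = 0.8 * pre_t + rng.normal(0, 4, n_t) + 0.3    # in-experiment metric, small true lift
y_c = 0.8 * pre_c + rng.normal(0, 4, n_c)

# Guardrail: sample ratio mismatch (SRM) check. With a 50/50 design the two
# counts should be near-equal; a very small p-value flags a data quality issue.
srm = stats.chisquare([n_t, n_c], f_exp=[(n_t + n_c) / 2] * 2)
print(f"SRM check: chi2={srm.statistic:.2f}, p={srm.pvalue:.3f}")

# Baseline: plain difference-in-means t-test on the in-experiment metric.
plain = stats.ttest_ind(y_t, y_c, equal_var=False)
print(f"Plain t-test: p={plain.pvalue:.4f}")

# CUPED: subtract theta * (centered pre-experiment covariate), which leaves the
# treatment effect estimate unbiased but shrinks variance, increasing sensitivity.
y, pre = np.concatenate([y_t, y_c]), np.concatenate([pre_t, pre_c])
cov = np.cov(y, pre)
theta = cov[0, 1] / cov[1, 1]
adj_t = y_t - theta * (pre_t - pre.mean())
adj_c = y_c - theta * (pre_c - pre.mean())
cuped = stats.ttest_ind(adj_t, adj_c, equal_var=False)
var_drop = 1 - np.var(np.concatenate([adj_t, adj_c])) / np.var(y)
print(f"CUPED t-test: p={cuped.pvalue:.4f} (metric variance reduced ~{100 * var_drop:.0f}%)")

The gain from the adjustment comes entirely from how correlated the pre-experiment covariate is with the in-experiment metric; with an uncorrelated covariate the adjustment changes nothing.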

References

[1] A/B testing at scale: Accelerating software innovation: Big data conference & machine learning training | Strata Data: 2018. https://conferences.oreilly.com/strata/strata-ca-2018/public/schedule/detail/63322. Accessed: 2019-02-18.
[2] Advanced Topics in Online Experiments – ExP Platform: https://exp-platform.com/advanced-topics-in-online-experiments/. Accessed: 2019-09-09.
[3] Deng, A. 2017. A/B testing at scale: Accelerating software innovation. SIGIR 2017 - Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (2017).
[4] Deng, A. 2013. Improving the sensitivity of online controlled experiments by utilizing pre-experiment data. Proceedings of the sixth ACM international conference on Web search and data mining - WSDM '13 (New York, New York, USA, 2013), 123.
[5] Deng, A. and Shi, X. 2016. Data-Driven Metric Development for Online Controlled Experiments. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16 (New York, New York, USA, 2016), 77–86.
[6] Deng, A. and Shi, X. 2016. Data-Driven Metric Development for Online Controlled Experiments. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16 (2016), 77–86.
[7] Dmitriev, P. 2017. A Dirty Dozen: Twelve Common Metric Interpretation Pitfalls in Online Controlled Experiments. Proceedings of the 23rd ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '17 (Halifax, Nova Scotia, Canada, 2017).
[8] Dmitriev, P. 2016. Pitfalls of long-term online controlled experiments. 2016 IEEE International Conference on Big Data (Big Data) (Washington, DC, USA, Dec. 2016), 1367–1376.
[9] Dmitriev, P. and Wu, X. 2016. Measuring Metrics. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management - CIKM '16 (2016), 429–437.
[10] Fabijan, A. 2019. Diagnosing Sample Ratio Mismatch in Online Controlled Experiments. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining - KDD '19 (New York, New York, USA, 2019), 2156–2164.
[11] Fabijan, A. 2019. Three Key Checklists and Remedies for Trustworthy Analysis of Online Controlled Experiments at Scale. To appear in the proceedings of the 2019 IEEE/ACM International Conference on Software Engineering (ICSE), Software Engineering in Practice (SEIP) (Montreal, Canada, 2019).
[12] Gupchup, J. 2018. Trustworthy Experimentation Under Telemetry Loss. To appear in: Proceedings of the 27th ACM International on Conference on Information and Knowledge Management - CIKM '18 (Lingotto, Turin, 2018).
[13] Gupta, S. 2019. A/B Testing at Scale: Accelerating Software Innovation. Companion Proceedings of The 2019 World Wide Web Conference on - WWW '19 (New York, New York, USA, 2019), 1299–1300.
[14] Gupta, S. 2019. Top Challenges from the first Practical Online Controlled Experiments Summit. ACM SIGKDD Explorations Newsletter. 21, 1 (May 2019), 20–35.
[15] Hassan, A. 2013. Beyond clicks. Proceedings of the 22nd ACM international conference on Conference on information & knowledge management - CIKM '13 (New York, New York, USA, 2013), 2019–2028.
[16] Kohavi, R. 2014. Seven rules of thumb for web site experimenters. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '14 (New York, USA, 2014), 1857–1866.
[17] Kohavi, R. 2012. Trustworthy online controlled experiments: Five Puzzling Outcomes Explained. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '12 (New York, New York, USA, 2012), 786.
[18] Machmouchi, W. 2017. Beyond Success Rate. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management - CIKM '17 (New York, New York, USA, 2017), 757–765.
[19] Machmouchi, W. and Buscher, G. 2016. Principles for the Design of Online A/B Metrics. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR '16 (New York, New York, USA, 2016), 589–590.
[20] Poyarkov, A. 2016. Boosted Decision Tree Regression Adjustment for Variance Reduction in Online Controlled Experiments. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16 (2016), 235–244.
[21] Rodden, K. 2010. Measuring the User Experience on a Large Scale: User-Centered Metrics for Web Applications. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2010), 2395–2398.
[22] Rodden, K. 2010. Measuring the user experience on a large scale. Proceedings of the 28th international conference on Human factors in computing systems - CHI '10 (New York, New York, USA, 2010), 2395.
[23] Shi, X. 2019. Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining - KDD '19 (New York, New York, USA, 2019), 3189–3190.
[24] Xie, H. and Aurisset, J. 2016. Improving the Sensitivity of Online Controlled Experiments. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '16 (2016), 645–654.
[25] Zhao, Z. 2016. Online Experimentation Diagnosis and Troubleshooting Beyond AA Validation. 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (Oct. 2016), 498–507.


Published In

WWW '20: Companion Proceedings of the Web Conference 2020
Association for Computing Machinery, New York, NY, United States
April 2020, 854 pages
ISBN: 9781450370240
DOI: 10.1145/3366424

Conference

WWW '20: The Web Conference 2020
April 20-24, 2020, Taipei, Taiwan
