DOI: 10.1109/ICSE-SEIP.2019.00009

Three key checklists and remedies for trustworthy analysis of online controlled experiments at scale

Published: 27 May 2019

Abstract

Online Controlled Experiments (OCEs) are transforming the decision-making process of data-driven companies into an experimental laboratory. Despite their power to identify what customers actually value, OCEs are highly sensitive to data loss, skipped checks, wrong designs, and many other 'hiccups' in the analysis process. For this reason, experiment analysis has traditionally been done by experienced data analysts and scientists who closely monitor experiments throughout their lifecycle. Depending solely on scarce experts, however, is neither scalable nor bulletproof. To democratize experimentation, analysis should be streamlined and meticulously performed by engineers, managers, or others responsible for the development of a product. In this paper, based on the synthesized experience of companies that run thousands of OCEs per year, we examined how experts inspect online experiments. We reveal that most of the experiment analysis happens before OCEs are even started, and we summarize the key analysis steps in three checklists. The value of the checklists is threefold. First, they can increase the accuracy of experiment setup and of the decision-making process. Second, checklists can enable novice data scientists and software engineers to become more autonomous in setting up and analyzing experiments. Finally, they can serve as a basis to develop trustworthy platforms and tools for OCE setup and analysis.
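To make the flavor of such trustworthiness checks concrete, below is a minimal Python sketch of a Sample Ratio Mismatch (SRM) check, a standard guardrail in large-scale experimentation and the subject of one of the citing papers listed further down. The paper's actual checklists are in the full text; the function name srm_check, the 0.001 p-value threshold, and the example counts are illustrative assumptions, not details taken from the paper.

# Minimal sketch of a Sample Ratio Mismatch (SRM) check, a common
# trustworthiness guardrail for online controlled experiments.
# The function name and the 0.001 threshold are illustrative
# assumptions, not details taken from the paper.
from scipy.stats import chisquare

def srm_check(control_users, treatment_users, expected_control_share=0.5):
    """Test whether the observed traffic split deviates from the
    configured split by more than chance alone would explain."""
    total = control_users + treatment_users
    expected = [total * expected_control_share,
                total * (1 - expected_control_share)]
    _, p_value = chisquare([control_users, treatment_users], f_exp=expected)
    # A very small p-value suggests broken randomization or telemetry
    # loss, so the experiment's results should not be trusted as-is.
    return p_value < 0.001, p_value

# Example: a 50/50 experiment that logged 50,000 control users and
# 51,500 treatment users.
srm, p = srm_check(50_000, 51_500)
print(f"SRM detected: {srm} (p = {p:.2e})")

Because a failed SRM check invalidates downstream metric comparisons no matter how carefully they are performed, it is the kind of step that lends itself to a checklist and, eventually, to automated platform tooling, in line with the abstract's third point.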

Published In

ICSE-SEIP '19: Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice
May 2019
339 pages

Publisher

IEEE Press

Author Tags

  1. a/b testing
  2. experiment checklists
  3. online controlled experiments

Cited By

  • (2022) "Automated Sample Ratio Mismatch (SRM) detection and analysis," Proceedings of the 26th International Conference on Evaluation and Assessment in Software Engineering, pp. 268-269, 13 Jun 2022. https://doi.org/10.1145/3530019.3534982
  • (2021) "Important Experimentation Characteristics," Proceedings of the 15th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pp. 1-6, 11 Oct 2021. https://doi.org/10.1145/3475716.3484186
  • (2020) "Engineering for a science-centric experimentation platform," Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice, pp. 191-200, 27 Jun 2020. https://doi.org/10.1145/3377813.3381349
  • (2019) "Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments," Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3189-3190, 25 Jul 2019. https://doi.org/10.1145/3292500.3332297
