DOI: 10.1145/3269206.3271747
Research article

Trustworthy Experimentation Under Telemetry Loss

Published: 17 October 2018

Abstract

Failure to accurately measure the outcomes of an experiment can lead to bias and incorrect conclusions. Online controlled experiments (aka AB tests) are increasingly being used to make decisions to improve websites as well as mobile and desktop applications. We argue that loss of telemetry data (during upload or post-processing) can skew the results of experiments, leading to loss of statistical power and inaccurate or erroneous conclusions. By systematically investigating the causes of telemetry loss, we argue that it is not practical to entirely eliminate it. Consequently, experimentation systems need to be robust to its effects. Furthermore, we note that it is nontrivial to measure the absolute level of telemetry loss in an experimentation system. In this paper, we take a top-down approach towards solving this problem. We motivate the impact of loss qualitatively using experiments in real applications deployed at scale, and formalize the problem by presenting a theoretical breakdown of the bias introduced by loss. Based on this foundation, we present a general framework for quantitatively evaluating the impact of telemetry loss, and present two solutions to measure the absolute levels of loss. This framework is used by well-known applications at Microsoft, with millions of users and billions of sessions. These general principles can be adopted by any application to improve the overall trustworthiness of experimentation and data-driven decision making.
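
To make the skew concrete, the following minimal simulation sketch (not taken from the paper; every number, name, and loss rate in it is an illustrative assumption) shows how telemetry loss that is correlated with the outcome and differs between variants can inflate a measured treatment effect well beyond the true effect.

# Hypothetical illustration of outcome-dependent telemetry loss biasing an A/B comparison.
# All parameters are invented for illustration; this is not the paper's method or data.
import random

random.seed(0)
N = 100_000  # users per variant

def simulate(effect, loss_base, loss_extra_on_failure):
    # Simulate per-user success outcomes, then drop telemetry for some users.
    # Failing sessions are assumed more likely to lose telemetry, so the
    # observed success rate is inflated relative to the true rate.
    observed = []
    for _ in range(N):
        success = random.random() < 0.50 + effect           # true success probability
        loss_p = loss_base + (0.0 if success else loss_extra_on_failure)
        if random.random() >= loss_p:                        # telemetry arrived
            observed.append(success)
    return sum(observed) / len(observed), len(observed)

# Control: 2% baseline loss. Treatment: true lift of +0.02, but the client
# change also loses an extra 8% of telemetry on failing sessions.
ctrl_rate, ctrl_n = simulate(effect=0.00, loss_base=0.02, loss_extra_on_failure=0.00)
trt_rate, trt_n = simulate(effect=0.02, loss_base=0.02, loss_extra_on_failure=0.08)

print(f"control:   observed success rate {ctrl_rate:.3f} over {ctrl_n} users")
print(f"treatment: observed success rate {trt_rate:.3f} over {trt_n} users")
print(f"observed delta {trt_rate - ctrl_rate:+.3f} vs. true delta +0.020")

In this made-up setup the true lift is +0.020, yet the observed lift comes out at roughly twice that, because failing sessions in the treatment variant disappear from the logs more often; the diverging per-variant counts also show how such loss can surface as an imbalance in the number of users observed in each variant.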

Published In

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
October 2018
2362 pages
ISBN:9781450360142
DOI:10.1145/3269206

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. ab testing
  2. client experimentation
  3. data loss
  4. experimentation trustworthiness
  5. online controlled experiments
  6. telemetry loss

Qualifiers

  • Research-article

Conference

CIKM '18

Acceptance Rates

CIKM '18 paper acceptance rate: 147 of 826 submissions (18%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)

Cited By

  • (2024) A/B testing. Journal of Systems and Software, 211:C. DOI: 10.1016/j.jss.2024.112011. Online publication date: 2-Jul-2024
  • (2021) The Cosmos big data platform at Microsoft. Proceedings of the VLDB Endowment, 14(12), 3148-3161. DOI: 10.14778/3476311.3476390. Online publication date: 1-Jul-2021
  • (2019) Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 3189-3190. DOI: 10.1145/3292500.3332297. Online publication date: 25-Jul-2019
  • (2019) Diagnosing Sample Ratio Mismatch in Online Controlled Experiments. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2156-2164. DOI: 10.1145/3292500.3330722. Online publication date: 25-Jul-2019
  • (2019) Experimentation in the operating system. Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice, 21-30. DOI: 10.1109/ICSE-SEIP.2019.00011. Online publication date: 27-May-2019
  • (2019) Three key checklists and remedies for trustworthy analysis of online controlled experiments at scale. Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice, 1-10. DOI: 10.1109/ICSE-SEIP.2019.00009. Online publication date: 27-May-2019
  • (2019) A Framework for Tunable Anomaly Detection. 2019 IEEE International Conference on Software Architecture (ICSA), 201-210. DOI: 10.1109/ICSA.2019.00029. Online publication date: Mar-2019
