DOI: 10.1109/ICSE-SEIP.2019.00011

Experimentation in the Operating System: The Windows Experimentation Platform

Published: 27 May 2019

Abstract

Online controlled experiments are the gold standard for evaluating improvements and accelerating innovation in the online and app worlds. However, little is known about the applicability, implementation, and efficacy of experimentation for operating systems (OS), where many features are not user-facing. In this paper, we present the Windows Experimentation platform (WExp) and share insights from implementing and running real-world experiments in the OS. We start by discussing the need for experimentation in the OS, using real experiments to illustrate the benefits. We then describe the architecture of WExp, focusing on the unique considerations in its engineering. Finally, we discuss learnings and challenges from conducting real-world experiments. Our experiences and insights can motivate practitioners to start experimenting and help them successfully build their own experimentation platforms. The learnings can also guide experimenters with best practices and highlight promising avenues for future research.
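The analysis at the heart of an online controlled experiment can be sketched in a few lines: split users (or devices) into control and treatment, collect a metric from telemetry, and test whether the difference in means is statistically significant. The sketch below is not from the paper; it uses hypothetical boot-time telemetry and a hand-rolled Welch's t-test (with a normal-tail approximation for the p-value, reasonable at these sample sizes) purely for illustration.

```python
import math
import random
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and an approximate two-sided p-value."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    se = math.sqrt(va / na + vb / nb)                # standard error of the mean difference
    t = (mean(sample_b) - mean(sample_a)) / se
    # For large samples the t distribution is close to normal, so the
    # two-sided p-value is approximated with the normal tail.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    return t, p

random.seed(7)
# Hypothetical telemetry: boot times in seconds for 5000 devices per variant.
control   = [random.gauss(30.0, 4.0) for _ in range(5000)]
treatment = [random.gauss(29.5, 4.0) for _ in range(5000)]  # ~0.5 s faster on average

t, p = welch_t(control, treatment)
print(f"t = {t:.2f}, p = {p:.4f}")  # large negative t, tiny p: treatment is faster
```

A real OS experimentation platform faces the complications the paper discusses on top of this core test, such as device-level randomization, delayed and lossy telemetry, and metrics for non-user-facing features.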


Cited By

  • (2024) "Automating Pipelines of A/B Tests with Population Split Using Self-Adaptation and Machine Learning," in Proc. 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, pp. 84-97, doi: 10.1145/3643915.3644087, 15 Apr 2024.
  • (2024) "A/B testing," Journal of Systems and Software, vol. 211, doi: 10.1016/j.jss.2024.112011, 2 Jul 2024.
  • (2023) "A/B Integrations: 7 Lessons Learned from Enabling A/B Testing as a Product Feature," in Proc. 45th International Conference on Software Engineering: Software Engineering in Practice, pp. 304-314, doi: 10.1109/ICSE-SEIP58684.2023.00033, 17 May 2023.
  • (2021) "How to Measure Your App: A Couple of Pitfalls and Remedies in Measuring App Performance in Online Controlled Experiments," in Proc. 14th ACM International Conference on Web Search and Data Mining, pp. 949-957, doi: 10.1145/3437963.3441742, 8 Mar 2021.
  • (2021) "Evolving software to be ML-driven utilizing real-world A/B testing," in Proc. 43rd International Conference on Software Engineering: Software Engineering in Practice, pp. 170-179, doi: 10.1109/ICSE-SEIP52600.2021.00026, 25 May 2021.


Published In

ICSE-SEIP '19: Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice
May 2019
339 pages

Publisher

IEEE Press

Author Tags

  1. a/b testing
  2. online controlled experiments
  3. operating systems

Qualifiers

  • Research-article

Conference

ICSE '19

Article Metrics

  • Downloads (last 12 months): 10
  • Downloads (last 6 weeks): 2

Reflects downloads up to 02 Dec 2024.
