DOI: 10.1109/ICSE-SEIP.2019.00011

Experimentation in the Operating System: The Windows Experimentation Platform

Published: 27 May 2019

Abstract

Online controlled experiments are the gold standard for evaluating improvements and accelerating innovation in the online and app worlds. However, little is known about the applicability, implementation, and efficacy of experimentation for operating systems (OS), where many features are not user-facing. In this paper, we present the Windows Experimentation platform (WExp) and share insights from implementing and running real-world experiments in the OS. We start by discussing the need for experimentation in the OS, using real experiments to illustrate the benefits. We then describe the architecture of WExp, focusing on the unique considerations in its engineering. Finally, we discuss learnings and challenges from conducting real-world experiments. Our experiences and insights can motivate practitioners to start experimenting and help them successfully build their own experimentation platforms. The learnings can also guide experimenters with best practices and highlight promising avenues for future research.
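The analysis at the heart of an online controlled experiment can be sketched in a few lines: split users (or devices) into control and treatment, collect a metric from telemetry, and test whether the difference in means is statistically significant. The sketch below is not from the paper; it uses hypothetical boot-time telemetry and a hand-rolled Welch's t-test (with a normal-tail approximation for the p-value, reasonable at these sample sizes) purely for illustration.

```python
import math
import random
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and an approximate two-sided p-value."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances
    se = math.sqrt(va / na + vb / nb)                # standard error of the mean difference
    t = (mean(sample_b) - mean(sample_a)) / se
    # For large samples the t distribution is close to normal, so the
    # two-sided p-value is approximated with the normal tail.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    return t, p

random.seed(7)
# Hypothetical telemetry: boot times in seconds for 5000 devices per variant.
control   = [random.gauss(30.0, 4.0) for _ in range(5000)]
treatment = [random.gauss(29.5, 4.0) for _ in range(5000)]  # ~0.5 s faster on average

t, p = welch_t(control, treatment)
print(f"t = {t:.2f}, p = {p:.4f}")  # large negative t, tiny p: treatment is faster
```

A real OS experimentation platform faces the complications the paper discusses on top of this core test, such as device-level randomization, delayed and lossy telemetry, and metrics for non-user-facing features.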


Cited By

  • (2024) "Automating Pipelines of A/B Tests with Population Split Using Self-Adaptation and Machine Learning," in Proc. 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, pp. 84-97, doi: 10.1145/3643915.3644087, 15 Apr 2024.
  • (2024) "A/B testing," Journal of Systems and Software, vol. 211, doi: 10.1016/j.jss.2024.112011, 2 Jul 2024.
  • (2023) "A/B Integrations: 7 Lessons Learned from Enabling A/B Testing as a Product Feature," in Proc. 45th International Conference on Software Engineering: Software Engineering in Practice, pp. 304-314, doi: 10.1109/ICSE-SEIP58684.2023.00033, 17 May 2023.
  • (2021) "How to Measure Your App: A Couple of Pitfalls and Remedies in Measuring App Performance in Online Controlled Experiments," in Proc. 14th ACM International Conference on Web Search and Data Mining, pp. 949-957, doi: 10.1145/3437963.3441742, 8 Mar 2021.
  • (2021) "Evolving software to be ML-driven utilizing real-world A/B testing," in Proc. 43rd International Conference on Software Engineering: Software Engineering in Practice, pp. 170-179, doi: 10.1109/ICSE-SEIP52600.2021.00026, 25 May 2021.


Published In

ICSE-SEIP '19: Proceedings of the 41st International Conference on Software Engineering: Software Engineering in Practice
May 2019
339 pages

Publisher

IEEE Press

Author Tags

  1. a/b testing
  2. online controlled experiments
  3. operating systems

Qualifiers

  • Research-article

Conference

ICSE '19

Article Metrics

  • Downloads (last 12 months): 10
  • Downloads (last 6 weeks): 2

Reflects downloads up to 02 Dec 2024.
