Abstract
Accurate prioritization of development efforts in products and services is critical to the success of every company. Online controlled experiments, also known as A/B tests, enable software companies to establish causal relationships between changes to their systems and movements in their metrics. Through experimentation, product development can be directed towards identifying and delivering value. Previous research stresses the need for data-driven development and experimentation. However, existing models describe the experimentation process at a level of granularity that is neither sufficiently detailed nor scalable with respect to increasing the number of experiments and running different types of experiments in an online setting. Based on a case study of multiple products running online controlled experiments at Microsoft, we provide an experimentation framework composed of two detailed experimentation models that focus on two main aspects: the experimentation activities and the experimentation metrics. This work provides guidelines to companies and practitioners on how to set up and organize experimentation activities for running trustworthy online controlled experiments.
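To make the notions of metric movement and trustworthiness concrete, the sketch below shows a minimal A/B analysis in Python. It assumes a simulated per-user metric, a designed 50/50 traffic split, and SciPy for the statistics; it illustrates the kind of guardrail (a sample ratio mismatch check) and significance test that experimentation activities of this sort involve, not the authors' implementation.

```python
# A minimal sketch (not the paper's framework) of a trustworthy A/B analysis:
# a sample ratio mismatch (SRM) check followed by a two-sample t-test.
# The metric values and the 50/50 design split are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=2.0, size=10_000)    # e.g. sessions per user
treatment = rng.normal(loc=10.1, scale=2.0, size=10_050)

# Trustworthiness guardrail: chi-square test that observed user counts
# match the designed 50/50 split; a tiny p-value signals a broken experiment.
counts = [len(control), len(treatment)]
expected = [sum(counts) / 2] * 2
srm_p = stats.chisquare(counts, expected).pvalue
if srm_p < 0.001:
    raise RuntimeError(f"Sample ratio mismatch (p={srm_p:.2e}); do not trust results")

# Effect analysis: Welch's t-test on the metric movement between variants.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
delta = treatment.mean() - control.mean()
print(f"delta={delta:+.3f}, t={t_stat:.2f}, p={p_value:.4f}")
```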
Acknowledgments
This work was partially supported by the Wallenberg Artificial Intelligence, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation. The authors would like to thank Microsoft’s Analysis and Experimentation team for the opportunity to conduct this study with them.