
short-paper

Experimentation Pitfalls to Avoid in A/B Testing for Online Personalization

Authors:

Maria Esteller-Cucala,

Vicenc Fernandez,

Diego Villuendas

UMAP'19 Adjunct: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization

Pages 153 - 159

https://doi.org/10.1145/3314183.3323853

Published: 06 June 2019

Abstract

Online controlled experiments (also called A/B tests, bucket testing or randomized experiments) have become a common practice in many companies for measuring the impact of new features and changes deployed to software products. In theory, these experiments are one of the simplest methods to evaluate the potential effects that new features have on users' behavior. In practice, however, there are many pitfalls that can obscure the interpretation of results or lead to invalid conclusions. The literature contains no shortage of prior work on online controlled experiments that addresses these pitfalls and misinterpreted conclusions, but it does not consider the specific case of testing personalization features. In this paper, we present some of the experimentation pitfalls that are particularly important for personalization features. To illustrate each pitfall, we combine theoretical argumentation with examples from real companies' experiments. While there is clearly value in evaluating personalized features by means of online controlled experiments, there are some pitfalls to bear in mind while testing. With this paper, we aim to increase experimenters' awareness of these pitfalls, leading to improved quality and reliability of the results.
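As background for why A/B tests are, in theory, simple: in the common fixed-horizon setup, users are randomly split between a control (A) and a treatment (B), and a conversion metric is compared with a two-proportion z-test. The sketch below is a minimal illustration under those assumptions (independent users, one binary metric, a single analysis at a sample size planned in advance). It is not the authors' method, and the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float, float]:
    """Compare the conversion rate of variant B against control A.

    Returns (absolute lift, z statistic, two-sided p-value).
    Assumes independent users and a single analysis at a pre-planned sample size.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each variant needs at least one user")
    if not (0 <= conv_a <= n_a and 0 <= conv_b <= n_b):
        raise ValueError("conversions must lie between 0 and the number of users")

    p_a = conv_a / n_a
    p_b = conv_b / n_b

    # Pooled rate: the estimate of the common conversion rate if A and B do not differ.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("metric has no variance (all or no users converted)")

    z = (p_b - p_a) / se
    # Two-sided p-value under a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


lift, z, p = two_proportion_z_test(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
print(f"lift={lift:.4f} z={z:.2f} p={p:.4f}")
```

With 1,000 conversions out of 10,000 users in A and 1,100 out of 10,000 in B, this gives z of about 2.31 and a two-sided p-value of roughly 0.021. The p-value is only valid if it is computed once at the planned sample size; checking it repeatedly while the test runs ("peeking") inflates the false-positive rate.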


Cited By

Quin F, Weyns D, Galster M, Silva C (2024). A/B testing. Journal of Systems and Software 211:C, 112011. DOI: 10.1016/j.jss.2024.112011. Online publication date: 2-Jul-2024.
https://dl.acm.org/doi/10.1016/j.jss.2024.112011

Guo Z (2023). The Technologies Used for Artwork Personalization and the Challenges. Proceedings of the 2022 3rd International Conference on Big Data Economy and Information Management (BDEIM 2022), 230-238. DOI: 10.2991/978-94-6463-124-1_28. Online publication date: 29-Mar-2023.
https://doi.org/10.2991/978-94-6463-124-1_28

Esteller-Cucala M, Fernandez V, Villuendas D (2020). Evaluating Personalization: The AB Testing Pitfalls Companies Might Not Be Aware of—A Spotlight on the Automotive Sector Websites. Frontiers in Artificial Intelligence 3. DOI: 10.3389/frai.2020.00020. Online publication date: 9-Apr-2020.
https://doi.org/10.3389/frai.2020.00020

Index Terms

Experimentation Pitfalls to Avoid in A/B Testing for Online Personalization
1. Human-centered computing
  1. Human computer interaction (HCI)
    1. HCI design and evaluation methods


Published In

UMAP'19 Adjunct: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization

June 2019

455 pages

ISBN: 9781450367110

DOI: 10.1145/3314183

General Chairs:
George Angelos Papadopoulos, University of Cyprus, Cyprus
George Samaras, University of Cyprus, Cyprus
Stephan Weibelzahl, PFH Private University of Applied Sciences Göttingen, Germany

Program Chairs:
Dietmar Jannach, Alpen-Adria-Universität Klagenfurt, Austria
Olga C. Santos, aDeNu Research Group - UNED, Spain

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

Agència de Gestió d'Ajuts Universitaris i de Recerca

Conference

UMAP '19


UMAP '19: 27th Conference on User Modeling, Adaptation and Personalization

June 9 - 12, 2019

Larnaca, Cyprus

Acceptance Rates

UMAP'19 Adjunct Paper Acceptance Rate: 30 of 122 submissions, 25%

Overall Acceptance Rate 162 of 633 submissions, 26%


Bibliometrics

Total Citations: 3
Total Downloads: 354
Downloads (Last 12 months): 45
Downloads (Last 6 weeks): 3

Reflects downloads up to 20 Dec 2024
