DOI: 10.1145/3314183.3323853
short-paper

Experimentation Pitfalls to Avoid in A/B Testing for Online Personalization

Published: 06 June 2019

Abstract

Online controlled experiments (also called A/B tests, bucket testing, or randomized experiments) have become a habitual practice in numerous companies for measuring the impact of new features and changes deployed to software products. In theory, these experiments are one of the simplest methods to evaluate the potential effects that new features have on users' behavior. In practice, however, there are many pitfalls that can obscure the interpretation of results or induce invalid conclusions. The literature has no shortage of prior work on online controlled experiments that addresses these pitfalls and misinterpretations of conclusions, but the topic has not been addressed for the specific case of testing personalization features. In this paper, we present some of the experimentation pitfalls that are particularly important for personalization features. To better illustrate each pitfall, we combine theoretical argumentation with examples from real-world company experiments. While there is clearly value in evaluating personalized features by means of online controlled experiments, there are some pitfalls to bear in mind while testing. With this paper, we aim to increase experimenters' awareness of these pitfalls, leading to improved quality and reliability of the results.

Cited By

  • (2024) A/B testing. Journal of Systems and Software 211:C. DOI: 10.1016/j.jss.2024.112011. Online publication date: 2-Jul-2024
  • (2023) The Technologies Used for Artwork Personalization and the Challenges. Proceedings of the 2022 3rd International Conference on Big Data Economy and Information Management (BDEIM 2022), 230-238. DOI: 10.2991/978-94-6463-124-1_28. Online publication date: 29-Mar-2023
  • (2020) Evaluating Personalization: The AB Testing Pitfalls Companies Might Not Be Aware of—A Spotlight on the Automotive Sector Websites. Frontiers in Artificial Intelligence 3. DOI: 10.3389/frai.2020.00020. Online publication date: 9-Apr-2020

    Published In

    UMAP'19 Adjunct: Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization
    June 2019
    455 pages
    ISBN:9781450367110
    DOI:10.1145/3314183
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. a/b testing
    2. controlled experiments
    3. online experiments
    4. online personalization
    5. personalization

    Qualifiers

    • Short-paper

    Funding Sources

    • Agència de Gestió d'Ajuts Universitaris i de Recerca

    Conference

    UMAP '19
    Acceptance Rates

    UMAP'19 Adjunct Paper Acceptance Rate 30 of 122 submissions, 25%;
    Overall Acceptance Rate 162 of 633 submissions, 26%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months): 45
    • Downloads (Last 6 weeks): 3
    Reflects downloads up to 20 Dec 2024
