Research article | Open access
DOI: 10.1145/3580305.3599818

Experimentation Platforms Meet Reinforcement Learning: Bayesian Sequential Decision-Making for Continuous Monitoring

Published: 04 August 2023

Abstract

With the growing use of online A/B testing to support innovation in industry, the opportunity cost of running an experiment has become non-negligible, creating increasing demand for an efficient continuous monitoring service that allows early stopping when appropriate. Classic statistical methods focus on hypothesis testing and were mostly developed for traditional high-stakes problems such as clinical trials, whereas experiments at online service companies typically have very different characteristics and objectives. Motivated by these practical needs, in this paper we introduce a novel framework developed at Amazon to maximize customer experience and control opportunity cost. We formulate the problem as a Bayesian optimal sequential decision-making problem with a unified utility function, and we discuss practical design choices and considerations in depth. We further show how to solve for the optimal decision rule via reinforcement learning and how to scale the solution. We demonstrate the effectiveness of this approach compared with existing methods via a large-scale meta-analysis of experiments at Amazon.
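
To make the abstract's formulation concrete, here is a minimal Python sketch (not the paper's implementation) of a Bayesian sequential stopping rule with a unified utility: each day a conjugate Normal posterior over the treatment effect is updated, and the experiment stops once the utility of deciding now (ship only if the expected lift is positive) is at least the estimated value of paying one more day of opportunity cost. A simple one-step-lookahead rule stands in for the RL-learned policy described in the paper, and all model choices, names, and values (PRIOR_SD, OBS_SD, DAILY_COST, the Normal-Normal model) are illustrative assumptions.

```python
# Illustrative sketch only: a Bayesian continuous-monitoring loop with a
# unified utility (expected lift from shipping minus the opportunity cost of
# extra experiment days).  All names and values below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

PRIOR_SD = 0.05     # prior: treatment effect delta ~ N(0, PRIOR_SD^2)
OBS_SD = 0.20       # daily observed lift x_t ~ N(delta, OBS_SD^2)
DAILY_COST = 0.002  # opportunity cost of one more day of experimentation
MAX_DAYS = 60

def posterior(xs):
    """Normal-Normal conjugate update; returns posterior mean and sd of delta."""
    prec = 1.0 / PRIOR_SD**2 + len(xs) / OBS_SD**2
    mean = (np.sum(xs) / OBS_SD**2) / prec
    return mean, np.sqrt(1.0 / prec)

def stop_utility(post_mean):
    """Utility of stopping now: ship only if the expected lift is positive."""
    return max(post_mean, 0.0)

def continue_value(xs, n_sims=2000):
    """One-step lookahead: expected stopping utility after one more day, minus cost."""
    m, s = posterior(xs)
    delta = rng.normal(m, s, n_sims)          # draw plausible effects from the posterior
    x_next = rng.normal(delta, OBS_SD)        # simulate tomorrow's observation for each draw
    future = [stop_utility(posterior(np.append(xs, x))[0]) for x in x_next]
    return float(np.mean(future)) - DAILY_COST

# Simulate one experiment with an unknown true lift and monitor it daily.
true_delta = 0.03
xs = np.array([])
for day in range(1, MAX_DAYS + 1):
    xs = np.append(xs, rng.normal(true_delta, OBS_SD))
    m, _ = posterior(xs)
    if stop_utility(m) >= continue_value(xs):
        print(f"day {day}: stop and {'ship' if m > 0 else 'abandon'}, posterior mean lift {m:.4f}")
        break
else:
    print(f"reached horizon; posterior mean lift {posterior(xs)[0]:.4f}")
```

In the paper, the myopic one-step-lookahead rule above is replaced by a decision policy solved for with reinforcement learning over the same kind of unified utility, which accounts for the full remaining horizon rather than a single additional day.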

Supplementary Material

MP4 File (rtfp0123-2min-promo.mp4.mp4)
Promotional video



Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. a/b testing
  2. reinforcement learning
  3. sequential decision making

Conference

KDD '23

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

