Research article | Open access
DOI: 10.1145/3580305.3599818

Experimentation Platforms Meet Reinforcement Learning: Bayesian Sequential Decision-Making for Continuous Monitoring

Published: 04 August 2023

Abstract

With the growing use of online A/B testing to support innovation in industry, the opportunity cost of running an experiment has become non-negligible, creating increasing demand for an efficient continuous monitoring service that allows early stopping when appropriate. Classic statistical methods focus on hypothesis testing and were mostly developed for traditional high-stakes problems such as clinical trials, whereas experiments at online service companies typically have very different characteristics and objectives. Motivated by these practical needs, in this paper we introduce a novel framework developed at Amazon to maximize customer experience and control opportunity cost. We formulate the problem as a Bayesian optimal sequential decision-making problem with a unified utility function, and we discuss practical design choices and considerations in depth. We further show how to solve for the optimal decision rule via reinforcement learning and how to scale the solution. We demonstrate the effectiveness of this approach compared with existing methods via a large-scale meta-analysis of experiments at Amazon.
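
To make the abstract's formulation concrete, here is a minimal Python sketch (not the paper's implementation) of a Bayesian sequential stopping rule with a unified utility: each day a conjugate Normal posterior over the treatment effect is updated, and the experiment stops once the utility of deciding now (ship only if the expected lift is positive) is at least the estimated value of paying one more day of opportunity cost. A simple one-step-lookahead rule stands in for the RL-learned policy described in the paper, and all model choices, names, and values (PRIOR_SD, OBS_SD, DAILY_COST, the Normal-Normal model) are illustrative assumptions.

```python
# Illustrative sketch only: a Bayesian continuous-monitoring loop with a
# unified utility (expected lift from shipping minus the opportunity cost of
# extra experiment days).  All names and values below are assumptions.
import numpy as np

rng = np.random.default_rng(0)

PRIOR_SD = 0.05     # prior: treatment effect delta ~ N(0, PRIOR_SD^2)
OBS_SD = 0.20       # daily observed lift x_t ~ N(delta, OBS_SD^2)
DAILY_COST = 0.002  # opportunity cost of one more day of experimentation
MAX_DAYS = 60

def posterior(xs):
    """Normal-Normal conjugate update; returns posterior mean and sd of delta."""
    prec = 1.0 / PRIOR_SD**2 + len(xs) / OBS_SD**2
    mean = (np.sum(xs) / OBS_SD**2) / prec
    return mean, np.sqrt(1.0 / prec)

def stop_utility(post_mean):
    """Utility of stopping now: ship only if the expected lift is positive."""
    return max(post_mean, 0.0)

def continue_value(xs, n_sims=2000):
    """One-step lookahead: expected stopping utility after one more day, minus cost."""
    m, s = posterior(xs)
    delta = rng.normal(m, s, n_sims)          # draw plausible effects from the posterior
    x_next = rng.normal(delta, OBS_SD)        # simulate tomorrow's observation for each draw
    future = [stop_utility(posterior(np.append(xs, x))[0]) for x in x_next]
    return float(np.mean(future)) - DAILY_COST

# Simulate one experiment with an unknown true lift and monitor it daily.
true_delta = 0.03
xs = np.array([])
for day in range(1, MAX_DAYS + 1):
    xs = np.append(xs, rng.normal(true_delta, OBS_SD))
    m, _ = posterior(xs)
    if stop_utility(m) >= continue_value(xs):
        print(f"day {day}: stop and {'ship' if m > 0 else 'abandon'}, posterior mean lift {m:.4f}")
        break
else:
    print(f"reached horizon; posterior mean lift {posterior(xs)[0]:.4f}")
```

In the paper, the myopic one-step-lookahead rule above is replaced by a decision policy solved for with reinforcement learning over the same kind of unified utility, which accounts for the full remaining horizon rather than a single additional day.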

Supplementary Material

MP4 File (rtfp0123-2min-promo.mp4.mp4)
Promotional video



Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. a/b testing
  2. reinforcement learning
  3. sequential decision making

Conference

KDD '23

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

