DOI: 10.1145/3604915.3608808
Research article · Open access

Correcting for Interference in Experiments: A Case Study at Douyin

Published: 14 September 2023

Abstract

Interference is a ubiquitous problem in experiments conducted on two-sided content marketplaces, such as Douyin (China’s analog of TikTok). In many cases, creators are the natural unit of experimentation, but creators interfere with each other through competition for viewers’ limited time and attention. “Naive” estimators currently used in practice simply ignore the interference, but in doing so incur bias on the order of the treatment effect. We formalize the problem of inference in such experiments as one of policy evaluation. Off-policy estimators, while unbiased, are impractically high variance. We introduce a novel Monte-Carlo estimator, based on “Differences-in-Qs” (DQ) techniques, which achieves bias that is second-order in the treatment effect, while remaining sample-efficient to estimate. On the theoretical side, our contribution is to develop a generalized theory of Taylor expansions for policy evaluation, which extends DQ theory to all major MDP formulations. On the practical side, we implement our estimator on Douyin’s experimentation platform, and in the process develop DQ into a truly “plug-and-play” estimator for interference in real-world settings: one which provides robust, low-bias, low-variance treatment effect estimates; admits computationally cheap, asymptotically exact uncertainty quantification; and reduces MSE by 99% compared to the best existing alternatives in our applications.
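
The abstract describes the DQ idea only at a high level. As a concrete illustration, the sketch below contrasts the naive difference-in-means estimator with a DQ-style estimate on a hypothetical toy Markov chain; it is not the authors' implementation or Douyin's platform code. All names and parameters (step, N_STATES, GAMMA, ALPHA) are invented for illustration, and tabular discounted TD(0) with a discount near 1 is used as a simple stand-in for the average-reward formulation in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, GAMMA, ALPHA, T = 5, 0.98, 0.05, 100_000

def step(s, a):
    """Hypothetical toy dynamics: treatment (a=1) adds a small direct reward
    and also nudges the state upward, which is where interference enters."""
    r = s / N_STATES + 0.05 * a + 0.01 * rng.standard_normal()
    p_up = 0.5 + 0.1 * a
    s_next = min(s + 1, N_STATES - 1) if rng.random() < p_up else max(s - 1, 0)
    return r, s_next

# Simulate one A/B experiment: at every step the unit is treated with probability 1/2.
states = np.empty(T, dtype=int)
actions = np.empty(T, dtype=int)
rewards = np.empty(T)
s = 0
for t in range(T):
    a = int(rng.integers(2))
    r, s_next = step(s, a)
    states[t], actions[t], rewards[t] = s, a, r
    s = s_next

# Naive estimator: difference of mean rewards between treated and control steps.
# It ignores that treatment also shifts the state distribution (interference).
naive = rewards[actions == 1].mean() - rewards[actions == 0].mean()

# DQ-style estimator: learn Q for the 50/50 behavior policy with tabular TD(0),
# then average the per-state difference Q(s, 1) - Q(s, 0) along the trajectory.
Q = np.zeros((N_STATES, 2))
for t in range(T - 1):
    s_t, a_t, r_t, s_nx = states[t], actions[t], rewards[t], states[t + 1]
    v_next = Q[s_nx].mean()  # behavior policy treats with probability 1/2
    Q[s_t, a_t] += ALPHA * (r_t + GAMMA * v_next - Q[s_t, a_t])

dq = float(np.mean(Q[states, 1] - Q[states, 0]))

print(f"naive difference-in-means: {naive:.4f}")
print(f"DQ-style estimate:         {dq:.4f}")
```

In this toy, treatment has a small direct reward effect and also shifts the state that later steps start from. The naive estimator recovers only the direct effect, while averaging Q(s, 1) − Q(s, 0) over the observed states also captures the indirect (interference) effect, with bias that is second-order in the treatment effect, as described in the abstract.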


Cited By

  • (2024) Experimental Design through an Optimization Lens. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4780792. Online publication date: 2024.
  • (2024) Ranking the causal impact of recommendations under collider bias in k-spots recommender systems. ACM Transactions on Recommender Systems 2, 2 (2024), 1–29. https://doi.org/10.1145/3643139. Online publication date: 14 May 2024.



Published In

RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems
September 2023
1406 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 September 2023


Author Tags

  1. A/B testing
  2. Experimentation
  3. Interference
  4. Off-policy Evaluation
  5. Reinforcement Learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

RecSys '23: Seventeenth ACM Conference on Recommender Systems
September 18 - 22, 2023
Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%


Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 1,058
  • Downloads (last 6 weeks): 141

Reflects downloads up to 21 Dec 2024.

