
Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments

Published: 01 July 2017

Abstract

Firms using online advertising regularly run experiments with multiple versions of their ads since they are uncertain about which ones are most effective. During a campaign, firms try to adapt to intermediate results of their tests, optimizing what they earn while learning about their ads. Yet how should they decide what percentage of impressions to allocate to each ad? This paper answers that question, resolving the well-known "learn-and-earn" trade-off using multi-armed bandit (MAB) methods. The online advertiser's MAB problem, however, contains particular challenges, such as a hierarchical structure (ads within a website), attributes of actions (creative elements of an ad), and batched decisions (millions of impressions at a time), that are not fully accommodated by existing MAB methods. Our approach captures how the impact of observable ad attributes on ad effectiveness differs by website in unobserved ways, and our policy generates allocations of impressions that can be used in practice. We implemented this policy in a live field experiment delivering over 750 million ad impressions in an online display campaign with a large retail bank. Over the course of two months, our policy achieved an 8% improvement in the customer acquisition rate, relative to a control policy, without any additional costs to the bank. Beyond the actual experiment, we performed counterfactual simulations to evaluate a range of alternative model specifications and allocation rules in MAB policies. Finally, we show that customer acquisition would decrease by about 10% if the firm were to optimize click-through rates instead of conversions directly, a finding that has implications for understanding the marketing funnel.
Data is available at https://doi.org/10.1287/mksc.2016.1023.
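The abstract describes allocating impression shares across ads to balance learning and earning, but does not spell out the allocation rule here. As a point of reference only, below is a minimal sketch of one standard MAB allocation scheme, batched Thompson sampling with independent Beta-Bernoulli arms (randomized probability matching). This is not the paper's hierarchical model: the function and variable names are hypothetical, and the sketch omits the website hierarchy, ad attributes, and unobserved website-level effects the abstract emphasizes.

```python
# Minimal sketch of batched Thompson sampling (randomized probability
# matching) for allocating ad impressions. Illustrative only: independent
# Beta-Bernoulli arms with flat Beta(1, 1) priors, no website hierarchy
# or ad-attribute model; all names below are hypothetical.
import random

def batch_allocation(successes, failures, batch_size, draws=10_000):
    """Split the next batch of impressions across ads in proportion to
    each ad's posterior probability of having the highest conversion
    rate, estimated by Monte Carlo draws from the Beta posteriors."""
    n_ads = len(successes)
    wins = [0] * n_ads
    for _ in range(draws):
        samples = [random.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(n_ads)]
        wins[samples.index(max(samples))] += 1  # best ad in this draw
    return [batch_size * w // draws for w in wins]

# Example: three ads partway through a campaign.
conversions = [12, 30, 18]             # conversions observed per ad
non_conversions = [4988, 9970, 5982]   # impressions without a conversion
print(batch_allocation(conversions, non_conversions, batch_size=1_000_000))
```

Under such a rule, an underperforming ad's share of the batch shrinks gradually as evidence accumulates rather than being cut off while uncertainty remains, which is the earn-while-learning behavior the abstract refers to.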



Published In

Marketing Science, Volume 36, Issue 4
July 2017
173 pages

Publisher

INFORMS, Linthicum, MD, United States

Publication History

Published: 01 July 2017
Accepted: 29 March 2016
Received: 16 December 2013

Author Tags

1. A/B testing
2. adaptive experiments
3. earning-and-learning
4. explore-exploit
5. field experiments
6. hierarchical models
7. machine learning
8. multi-armed bandit
9. online advertising
10. reinforcement learning
11. sequential decision making

Qualifiers

• Article

Cited By

• (2024) Delay-Adaptive Learning in Generalized Linear Contextual Bandits. Mathematics of Operations Research 49(1):326-345. DOI: 10.1287/moor.2023.1358. Online publication date: 1-Feb-2024.
• (2024) Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits. Management Science 70(2):1315-1342. DOI: 10.1287/mnsc.2023.4895. Online publication date: 1-Feb-2024.
• (2024) How Does Competition Affect Exploration vs. Exploitation? A Tale of Two Recommendation Algorithms. Management Science 70(2):1029-1051. DOI: 10.1287/mnsc.2023.4722. Online publication date: 1-Feb-2024.
• (2024) Unlimited Testing. Marketing Science 43(2):419-439. DOI: 10.1287/mksc.2021.0126. Online publication date: 1-Jan-2024.
• (2024) When Variety Seeking Meets Unexpectedness. Information Systems Research 35(3):1257-1273. DOI: 10.1287/isre.2021.0053. Online publication date: 1-Sep-2024.
• (2024) Non-stationary Bandits with Heavy Tail. ACM SIGMETRICS Performance Evaluation Review 52(2):33-35. DOI: 10.1145/3695411.3695424. Online publication date: 6-Sep-2024.
• (2024) A Bayesian Multi-Armed Bandit Algorithm for Bid Shading in Online Display Advertising. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 4506-4513. DOI: 10.1145/3627673.3680107. Online publication date: 21-Oct-2024.
• (2024) Two-Stage Dynamic Creative Optimization Under Sparse Ambiguous Samples for e-Commerce Advertising. SN Computer Science 5(8). DOI: 10.1007/s42979-024-03332-z. Online publication date: 26-Oct-2024.
• (2023) On private and robust bandits. Proceedings of the 37th International Conference on Neural Information Processing Systems, 34778-34790. DOI: 10.5555/3666122.3667633. Online publication date: 10-Dec-2023.
• (2023) Nash regret guarantees for linear bandits. Proceedings of the 37th International Conference on Neural Information Processing Systems, 33288-33318. DOI: 10.5555/3666122.3667568. Online publication date: 10-Dec-2023.