DOI: 10.1145/3357384.3357806

Learning Adaptive Display Exposure for Real-Time Advertising

Published: 03 November 2019

Abstract

In e-commerce advertising, where product recommendations and product ads are presented to users simultaneously, the traditional setting is to display ads at fixed positions. However, under such a setting the advertising system loses the flexibility to control the number and positions of ads, resulting in sub-optimal platform revenue and user experience. Consequently, major e-commerce platforms (e.g., Taobao.com) have begun to consider more flexible ways to display ads. In this paper, we investigate the problem of advertising with adaptive exposure: can we dynamically determine the number and positions of ads for each user visit, under certain business constraints, so that platform revenue increases? More specifically, we consider two types of constraints: a request-level constraint ensures the user experience for each visit, and a platform-level constraint controls the overall platform monetization rate. We model this problem as a Constrained Markov Decision Process with per-state constraints (psCMDP) and propose a constrained two-level reinforcement learning approach that decomposes the original problem into two relatively independent sub-problems. To accelerate policy learning, we also devise a constrained hindsight experience replay mechanism. Experimental evaluations on industry-scale real-world datasets demonstrate both the revenue gains our approach obtains under the constraints and the effectiveness of the constrained hindsight experience replay mechanism.
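The constrained hindsight replay idea described above can be sketched as a replay buffer that relabels a trajectory with the constraint level it actually satisfied, so off-target rollouts still produce valid training data instead of being discarded. This is an illustrative sketch only: the class name, buffer layout, and the ad-count constraint in the example are assumptions, not the paper's implementation.

```python
import random
from collections import deque

class ConstrainedHindsightReplay:
    """Minimal sketch of a hindsight-style replay buffer for
    constrained RL: each stored transition carries the constraint
    level its trajectory actually satisfied."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add_trajectory(self, transitions, achieved_constraint):
        # Hindsight relabeling: tag every transition with the
        # constraint the trajectory really met, not the target it
        # was originally collected under.
        for state, action, reward, next_state in transitions:
            self.buffer.append(
                (state, achieved_constraint, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling; never ask for more than is stored.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


# Hypothetical usage: a rollout that targeted "at most 2 ads per
# request" but ended up showing 3 ads becomes a valid sample for
# learning under a 3-ad constraint.
buf = ConstrainedHindsightReplay()
trajectory = [([0.1, 0.2], 1, 0.5, [0.2, 0.3]),
              ([0.2, 0.3], 0, 0.8, [0.3, 0.4])]
buf.add_trajectory(trajectory, achieved_constraint=3)
batch = buf.sample(2)
```

The design point this illustrates is that, as in hindsight experience replay for goal-conditioned RL, conditioning stored experience on the achieved (rather than intended) constraint level turns constraint violations into usable supervision.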




Published In
CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN:9781450369763
DOI:10.1145/3357384

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptive ads exposure
  2. constrained two-level reinforcement learning
  3. deep reinforcement learning
  4. learning to advertise
  5. real-time advertising

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China

Conference

CIKM '19

Acceptance Rates

CIKM '19 Paper Acceptance Rate: 202 of 1,031 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%


Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 4
Reflects downloads up to 10 Dec 2024

Cited By

  • (2024) User Response Modeling in Reinforcement Learning for Ads Allocation. Companion Proceedings of the ACM Web Conference 2024, 131-140. DOI: 10.1145/3589335.3648310
  • (2024) Macro Graph Neural Networks for Online Billion-Scale Recommender Systems. Proceedings of the ACM Web Conference 2024, 3598-3608. DOI: 10.1145/3589334.3645517
  • (2023) LOVF: Layered Organic View Fusion for Click-through Rate Prediction in Online Advertising. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2139-2143. DOI: 10.1145/3539618.3592014
  • (2023) Boosting Advertising Space: Designing Ad Auctions for Augment Advertising. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 1066-1074. DOI: 10.1145/3539597.3570381
  • (2023) Optimally integrating ad auction into e-commerce platforms. Theoretical Computer Science 976, 114141. DOI: 10.1016/j.tcs.2023.114141
  • (2022) Hierarchically Constrained Adaptive Ad Exposure in Feeds. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 3003-3012. DOI: 10.1145/3511808.3557103
  • (2022) Cross DQN: Cross Deep Q Network for Ads Allocation in Feed. Proceedings of the ACM Web Conference 2022, 401-409. DOI: 10.1145/3485447.3512109
  • (2022) A Self-Play and Sentiment-Emphasized Comment Integration Framework Based on Deep Q-Learning in a Crowdsourcing Scenario. IEEE Transactions on Knowledge and Data Engineering 34(3), 1021-1037. DOI: 10.1109/TKDE.2020.2993272
  • (2021) The Effect of News Article Quality on Ad Consumption. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 3107-3111. DOI: 10.1145/3459637.3482201
