DOI: 10.1145/3357384.3357806

Learning Adaptive Display Exposure for Real-Time Advertising

Published: 03 November 2019

Abstract

In e-commerce advertising, where product recommendations and product ads are presented to users simultaneously, the traditional setting is to display ads at fixed positions. However, under such a setting the advertising system loses the flexibility to control the number and positions of ads, resulting in sub-optimal platform revenue and user experience. Consequently, major e-commerce platforms (e.g., Taobao.com) have begun to consider more flexible ways to display ads. In this paper, we investigate the problem of advertising with adaptive exposure: can we dynamically determine the number and positions of ads for each user visit, under certain business constraints, so that platform revenue increases? More specifically, we consider two types of constraints: a request-level constraint ensures the user experience for each visit, and a platform-level constraint controls the overall platform monetization rate. We model this problem as a Constrained Markov Decision Process with per-state constraints (psCMDP) and propose a constrained two-level reinforcement learning approach that decomposes the original problem into two relatively independent sub-problems. To accelerate policy learning, we also devise a constrained hindsight experience replay mechanism. Experimental evaluations on industry-scale real-world datasets demonstrate both the revenue gains our approach obtains under the constraints and the effectiveness of the constrained hindsight experience replay mechanism.
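The constrained hindsight replay idea described above can be sketched as a replay buffer that relabels a trajectory with the constraint level it actually satisfied, so off-target rollouts still produce valid training data instead of being discarded. This is an illustrative sketch only: the class name, buffer layout, and the ad-count constraint in the example are assumptions, not the paper's implementation.

```python
import random
from collections import deque

class ConstrainedHindsightReplay:
    """Minimal sketch of a hindsight-style replay buffer for
    constrained RL: each stored transition carries the constraint
    level its trajectory actually satisfied."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add_trajectory(self, transitions, achieved_constraint):
        # Hindsight relabeling: tag every transition with the
        # constraint the trajectory really met, not the target it
        # was originally collected under.
        for state, action, reward, next_state in transitions:
            self.buffer.append(
                (state, achieved_constraint, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling; never ask for more than is stored.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


# Hypothetical usage: a rollout that targeted "at most 2 ads per
# request" but ended up showing 3 ads becomes a valid sample for
# learning under a 3-ad constraint.
buf = ConstrainedHindsightReplay()
trajectory = [([0.1, 0.2], 1, 0.5, [0.2, 0.3]),
              ([0.2, 0.3], 0, 0.8, [0.3, 0.4])]
buf.add_trajectory(trajectory, achieved_constraint=3)
batch = buf.sample(2)
```

The design point this illustrates is that, as in hindsight experience replay for goal-conditioned RL, conditioning stored experience on the achieved (rather than intended) constraint level turns constraint violations into usable supervision.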




Published In
CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN:9781450369763
DOI:10.1145/3357384

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptive ads exposure
  2. constrained two-level reinforcement learning
  3. deep reinforcement learning
  4. learning to advertise
  5. real-time advertising

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China

Conference

CIKM '19

Acceptance Rates

CIKM '19 Paper Acceptance Rate: 202 of 1,031 submissions, 20%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%


Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 4
Reflects downloads up to 10 Dec 2024

Cited By

  • (2024) User Response Modeling in Reinforcement Learning for Ads Allocation. Companion Proceedings of the ACM Web Conference 2024, 131-140. DOI: 10.1145/3589335.3648310
  • (2024) Macro Graph Neural Networks for Online Billion-Scale Recommender Systems. Proceedings of the ACM Web Conference 2024, 3598-3608. DOI: 10.1145/3589334.3645517
  • (2023) LOVF: Layered Organic View Fusion for Click-through Rate Prediction in Online Advertising. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2139-2143. DOI: 10.1145/3539618.3592014
  • (2023) Boosting Advertising Space: Designing Ad Auctions for Augment Advertising. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 1066-1074. DOI: 10.1145/3539597.3570381
  • (2023) Optimally integrating ad auction into e-commerce platforms. Theoretical Computer Science 976, 114141. DOI: 10.1016/j.tcs.2023.114141
  • (2022) Hierarchically Constrained Adaptive Ad Exposure in Feeds. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 3003-3012. DOI: 10.1145/3511808.3557103
  • (2022) Cross DQN: Cross Deep Q Network for Ads Allocation in Feed. Proceedings of the ACM Web Conference 2022, 401-409. DOI: 10.1145/3485447.3512109
  • (2022) A Self-Play and Sentiment-Emphasized Comment Integration Framework Based on Deep Q-Learning in a Crowdsourcing Scenario. IEEE Transactions on Knowledge and Data Engineering 34(3), 1021-1037. DOI: 10.1109/TKDE.2020.2993272
  • (2021) The Effect of News Article Quality on Ad Consumption. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 3107-3111. DOI: 10.1145/3459637.3482201
