DOI: 10.1145/3589335.3648310
Research article
Open access

User Response Modeling in Reinforcement Learning for Ads Allocation

Published: 13 May 2024

Abstract

User response modeling can enhance the learning of user representations and thereby improve a reinforcement learning (RL) recommender agent. However, because users' behaviors are shaped by both long-term preferences and short-term stochastic factors (e.g., weather, mood, or fashion trends), capturing them remains challenging for prior work based on recurrent neural network user response models. Moreover, since user interests drift over time, it is often unrealistic to assume that user dynamics are stationary. Drawing inspiration from opponent modeling, we propose a novel network structure, the Deep User Q-Network (DUQN), which incorporates a probabilistic user response model into a Q-learning ads allocation strategy to capture the effect of the non-stationary user policy on Q-values. We build the user response model on the Recurrent State-Space Model (RSSM), whose deterministic and stochastic components jointly account for long-term preferences and short-term stochastic factors. In particular, we design a RetNet-based variant of RSSM (R-RSSM) that supports parallel computation; R-RSSM can further produce multi-step predictions, enabling bootstrapping over multiple steps simultaneously. Finally, we conduct extensive experiments on a large-scale offline dataset from the Meituan food delivery platform and on a public benchmark. The results show that our method outperforms state-of-the-art (SOTA) baselines. Our model also yields a significant improvement in an online A/B test and has been fully deployed on the Meituan platform, serving more than 500 million customers.
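To make the abstract's central modeling idea concrete, the following is a minimal PyTorch sketch of an RSSM-style user response model: a deterministic recurrent path carries long-term preferences, while a stochastic latent absorbs short-term factors. This is an illustration, not the paper's implementation; in particular, the paper's R-RSSM replaces the recurrence with RetNet-style retention to enable parallel computation, whereas this sketch uses a GRU cell. All module names, dimensions, and the single-logit response head are assumptions.

```python
import torch
import torch.nn as nn

class UserResponseRSSM(nn.Module):
    def __init__(self, obs_dim, action_dim, det_dim=128, stoch_dim=32):
        super().__init__()
        # Deterministic recurrent path: carries long-term user preferences.
        self.rnn = nn.GRUCell(stoch_dim + action_dim, det_dim)
        # Prior p(z_t | h_t) and posterior q(z_t | h_t, o_t) over the
        # stochastic latent that absorbs short-term factors (mood, weather, ...).
        self.prior = nn.Linear(det_dim, 2 * stoch_dim)
        self.posterior = nn.Linear(det_dim + obs_dim, 2 * stoch_dim)
        # Response head: predicts the user's reaction, e.g. a click logit.
        self.decoder = nn.Linear(det_dim + stoch_dim, 1)

    def step(self, z, action, h, obs=None):
        # Advance the deterministic state from the previous latent and action.
        h = self.rnn(torch.cat([z, action], dim=-1), h)
        # Use the posterior when an observation is available, else the prior
        # (the prior path is what supports multi-step imagination rollouts).
        stats = (self.posterior(torch.cat([h, obs], dim=-1))
                 if obs is not None else self.prior(h))
        mean, log_std = stats.chunk(2, dim=-1)
        z = mean + log_std.exp() * torch.randn_like(mean)  # reparameterized sample
        response_logit = self.decoder(torch.cat([h, z], dim=-1))
        return z, h, response_logit
```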
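The DUQN side can be read the same way: the Q-network takes the user response model's prediction as an extra input, so Q-values track the non-stationary user policy, and the model's multi-step rollouts feed an n-step bootstrapped target. Again a hedged sketch under the abstract's description only; `DeepUserQNetwork`, the concatenation scheme, and `n_step_target` are illustrative names, not the paper's API.

```python
import torch
import torch.nn as nn

class DeepUserQNetwork(nn.Module):
    """Q-values for ad-allocation actions, conditioned on the predicted user response."""
    def __init__(self, state_dim, response_dim, num_actions, hidden=256):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(state_dim + response_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state, predicted_response):
        # Conditioning on the predicted response is what lets the Q-values
        # reflect the non-stationary user policy.
        return self.q(torch.cat([state, predicted_response], dim=-1))

def n_step_target(rewards, q_next, gamma=0.99):
    """n-step bootstrap: sum_k gamma^k * r_{t+k} + gamma^n * max_a Q(s_{t+n}, a).

    rewards: [batch, n] rewards along a model rollout; q_next: [batch, actions].
    """
    n = rewards.shape[1]
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype, device=rewards.device)
    return (rewards * discounts).sum(dim=1) + (gamma ** n) * q_next.max(dim=-1).values
```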

Supplemental Material

  • Presentation video (MP4 file)
  • Supplemental video (MP4 file)



Published In

WWW '24: Companion Proceedings of the ACM Web Conference 2024
May 2024, 1928 pages
ISBN: 9798400701726
DOI: 10.1145/3589335

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. ads allocation
  2. reinforcement learning
  3. user response modeling

Qualifiers

  • Research-article

Conference

WWW '24: The ACM Web Conference 2024
May 13-17, 2024
Singapore, Singapore

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions (23%)
