DOI: 10.1145/3589335.3648310
Research article
Open access

User Response Modeling in Reinforcement Learning for Ads Allocation

Published: 13 May 2024

Abstract

User response modeling can enhance the learning of user representations and thereby improve a reinforcement learning (RL) recommender agent. However, because users' behaviors are shaped by both long-term preferences and short-term stochastic factors (e.g., weather, mood, or fashion trends), capturing them remains challenging for prior work based on recurrent neural network user response models. Moreover, since user interests drift over time, it is often unrealistic to assume that user dynamics are stationary. Drawing inspiration from opponent modeling, we propose a novel network structure, the Deep User Q-Network (DUQN), which incorporates a probabilistic user response model into a Q-learning ads allocation strategy to capture the effect of the non-stationary user policy on Q-values. We build the user response model on the Recurrent State-Space Model (RSSM), whose deterministic and stochastic components jointly account for long-term preferences and short-term stochastic factors. In particular, we design a RetNet-based variant of RSSM (R-RSSM) that supports parallel computation; R-RSSM can further produce multi-step predictions, enabling bootstrapping over multiple steps simultaneously. Finally, we conduct extensive experiments on a large-scale offline dataset from the Meituan food delivery platform and on a public benchmark. The results show that our method outperforms state-of-the-art (SOTA) baselines. Our model also yields a significant improvement in an online A/B test and has been fully deployed on the Meituan platform, serving more than 500 million customers.
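To make the abstract's central modeling idea concrete, the following is a minimal PyTorch sketch of an RSSM-style user response model: a deterministic recurrent path carries long-term preferences, while a stochastic latent absorbs short-term factors. This is an illustration, not the paper's implementation; in particular, the paper's R-RSSM replaces the recurrence with RetNet-style retention to enable parallel computation, whereas this sketch uses a GRU cell. All module names, dimensions, and the single-logit response head are assumptions.

```python
import torch
import torch.nn as nn

class UserResponseRSSM(nn.Module):
    def __init__(self, obs_dim, action_dim, det_dim=128, stoch_dim=32):
        super().__init__()
        # Deterministic recurrent path: carries long-term user preferences.
        self.rnn = nn.GRUCell(stoch_dim + action_dim, det_dim)
        # Prior p(z_t | h_t) and posterior q(z_t | h_t, o_t) over the
        # stochastic latent that absorbs short-term factors (mood, weather, ...).
        self.prior = nn.Linear(det_dim, 2 * stoch_dim)
        self.posterior = nn.Linear(det_dim + obs_dim, 2 * stoch_dim)
        # Response head: predicts the user's reaction, e.g. a click logit.
        self.decoder = nn.Linear(det_dim + stoch_dim, 1)

    def step(self, z, action, h, obs=None):
        # Advance the deterministic state from the previous latent and action.
        h = self.rnn(torch.cat([z, action], dim=-1), h)
        # Use the posterior when an observation is available, else the prior
        # (the prior path is what supports multi-step imagination rollouts).
        stats = (self.posterior(torch.cat([h, obs], dim=-1))
                 if obs is not None else self.prior(h))
        mean, log_std = stats.chunk(2, dim=-1)
        z = mean + log_std.exp() * torch.randn_like(mean)  # reparameterized sample
        response_logit = self.decoder(torch.cat([h, z], dim=-1))
        return z, h, response_logit
```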
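The DUQN side can be read the same way: the Q-network takes the user response model's prediction as an extra input, so Q-values track the non-stationary user policy, and the model's multi-step rollouts feed an n-step bootstrapped target. Again a hedged sketch under the abstract's description only; `DeepUserQNetwork`, the concatenation scheme, and `n_step_target` are illustrative names, not the paper's API.

```python
import torch
import torch.nn as nn

class DeepUserQNetwork(nn.Module):
    """Q-values for ad-allocation actions, conditioned on the predicted user response."""
    def __init__(self, state_dim, response_dim, num_actions, hidden=256):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(state_dim + response_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state, predicted_response):
        # Conditioning on the predicted response is what lets the Q-values
        # reflect the non-stationary user policy.
        return self.q(torch.cat([state, predicted_response], dim=-1))

def n_step_target(rewards, q_next, gamma=0.99):
    """n-step bootstrap: sum_k gamma^k * r_{t+k} + gamma^n * max_a Q(s_{t+n}, a).

    rewards: [batch, n] rewards along a model rollout; q_next: [batch, actions].
    """
    n = rewards.shape[1]
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype, device=rewards.device)
    return (rewards * discounts).sum(dim=1) + (gamma ** n) * q_next.max(dim=-1).values
```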

Supplemental Material

  • Presentation video (MP4 file)
  • Supplemental video (MP4 file)



Published In

WWW '24: Companion Proceedings of the ACM Web Conference 2024
May 2024, 1928 pages
ISBN: 9798400701726
DOI: 10.1145/3589335

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. ads allocation
  2. reinforcement learning
  3. user response modeling

Qualifiers

  • Research-article

Conference

WWW '24: The ACM Web Conference 2024
May 13-17, 2024
Singapore, Singapore

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions (23%)
