Research Article · Open Access

User Response Models to Improve a REINFORCE Recommender System

Published: 08 March 2021

Abstract

Reinforcement Learning (RL) techniques have been sought after as next-generation tools to further advance the field of recommendation research. Different from classic applications of RL, recommender agents, especially those deployed on commercial recommendation platforms, have to operate in extremely large state and action spaces, serving a dynamic user base on the order of billions and a long-tail item corpus on the order of millions or billions. The (positive) user feedback available to train such agents is extremely scarce in retrospect. Improving the sample efficiency of RL algorithms is thus of paramount importance when developing RL agents for recommender systems. In this work, we present a general framework to augment the training of model-free RL agents with auxiliary tasks for improved sample efficiency. More specifically, we add auxiliary tasks that predict users' immediate responses (positive or negative) toward recommendations, i.e., user response modeling, to enhance the learning of the state and action representations for the recommender agents. We also introduce a tool based on gradient correlation analysis to guide the model design. We showcase the efficacy of our method in offline experiments, learning and evaluating agent policies over hundreds of millions of user trajectories. We also conduct live experiments on an industrial recommendation platform serving billions of users and tens of millions of items to verify its benefit.
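
To make the abstract's setup concrete, below is a minimal sketch (not the paper's code) of a REINFORCE recommender whose shared user-state encoder is also trained on an auxiliary task predicting the user's immediate response to the recommended item, plus a gradient-correlation diagnostic in the spirit of the design tool the abstract mentions. The GRU encoder, layer names, tensor shapes, and the plain (uncorrected) REINFORCE loss are all assumptions for exposition.

```python
# Illustrative sketch only; architecture and names are assumptions,
# not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxAugmentedRecommender(nn.Module):
    def __init__(self, num_items, emb_dim=64, state_dim=128):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, emb_dim)
        self.encoder = nn.GRU(emb_dim, state_dim, batch_first=True)  # shared
        self.policy_head = nn.Linear(state_dim, num_items)           # pi(a|s)
        self.response_head = nn.Linear(state_dim + emb_dim, 1)       # p(click|s,a)

    def user_state(self, item_history):
        # item_history: (batch, seq_len) ids of items consumed so far.
        out, _ = self.encoder(self.item_emb(item_history))
        return out[:, -1]  # last hidden state summarizes the user

def losses(model, history, action, reward, response):
    # action: (batch,) recommended item ids; reward: (batch,) long-term reward;
    # response: (batch,) float in {0, 1}, immediate feedback (e.g., click/skip).
    state = model.user_state(history)
    log_pi = F.log_softmax(model.policy_head(state), dim=-1)
    chosen = log_pi.gather(1, action.unsqueeze(1)).squeeze(1)
    rl_loss = -(reward * chosen).mean()  # REINFORCE; off-policy correction omitted
    pair = torch.cat([state, model.item_emb(action)], dim=-1)
    aux_loss = F.binary_cross_entropy_with_logits(
        model.response_head(pair).squeeze(1), response)
    return rl_loss, aux_loss

def gradient_cosine(model, rl_loss, aux_loss):
    # Gradient correlation on the shared encoder: positive cosine similarity
    # suggests the auxiliary task reinforces, rather than fights, the RL task.
    shared = [p for p in model.encoder.parameters() if p.requires_grad]
    g_rl = torch.autograd.grad(rl_loss, shared, retain_graph=True)
    g_aux = torch.autograd.grad(aux_loss, shared, retain_graph=True)
    flat = lambda grads: torch.cat([g.reshape(-1) for g in grads])
    return F.cosine_similarity(flat(g_rl), flat(g_aux), dim=0)
```

In training, the two losses would typically be combined as rl_loss plus a weighted aux_loss; the cosine diagnostic can then inform where the auxiliary head attaches and how heavily it is weighted.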




Information

Published In

WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining
March 2021
1192 pages
ISBN:9781450382977
DOI:10.1145/3437963
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 March 2021


Author Tags

  1. auxiliary tasks
  2. recommender systems
  3. reinforcement learning
  4. user response models

Qualifiers

  • Research-article

Conference

WSDM '21

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%


Article Metrics

  • Downloads (last 12 months): 386
  • Downloads (last 6 weeks): 27
Reflects downloads up to 03 Jan 2025.


Cited By

  • (2024) On the unexpected effectiveness of reinforcement learning for sequential recommendation. Proceedings of the 41st International Conference on Machine Learning, 45432–45450. 10.5555/3692070.3693918. Online publication date: 21-Jul-2024.
  • (2024) Personalised Multi-modal Interactive Recommendation with Hierarchical State Representations. ACM Transactions on Recommender Systems 2(3), 1–25. 10.1145/3651169. Online publication date: 4-Mar-2024.
  • (2024) Δ-OPE: Off-Policy Estimation with Pairs of Policies. Proceedings of the 18th ACM Conference on Recommender Systems, 878–883. 10.1145/3640457.3688162. Online publication date: 8-Oct-2024.
  • (2024) Multi-Objective Recommendation via Multivariate Policy Learning. Proceedings of the 18th ACM Conference on Recommender Systems, 712–721. 10.1145/3640457.3688132. Online publication date: 8-Oct-2024.
  • (2024) Optimal Baseline Corrections for Off-Policy Contextual Bandits. Proceedings of the 18th ACM Conference on Recommender Systems, 722–732. 10.1145/3640457.3688105. Online publication date: 8-Oct-2024.
  • (2024) Future Impact Decomposition in Request-level Recommendations. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5905–5916. 10.1145/3637528.3671506. Online publication date: 25-Aug-2024.
  • (2024) Explicitly Integrating Judgment Prediction with Legal Document Retrieval: A Law-Guided Generative Approach. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2210–2220. 10.1145/3626772.3657717. Online publication date: 10-Jul-2024.
  • (2024) User Response Modeling in Reinforcement Learning for Ads Allocation. Companion Proceedings of the ACM Web Conference 2024, 131–140. 10.1145/3589335.3648310. Online publication date: 13-May-2024.
  • (2024) A Survey on Reinforcement Learning for Recommender Systems. IEEE Transactions on Neural Networks and Learning Systems 35(10), 13164–13184. 10.1109/TNNLS.2023.3280161. Online publication date: Oct-2024.
  • (2024) CIPPO: Contrastive Imitation Proximal Policy Optimization for Recommendation Based on Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering 36(11), 5753–5767. 10.1109/TKDE.2024.3402649. Online publication date: Nov-2024.
