DOI: 10.1145/3523227.3547370

Tutorial

Hands-on Reinforcement Learning for Recommender Systems - From Bandits to SlateQ to Offline RL with Ray RLlib

Published: 13 September 2022

Abstract

Reinforcement learning (RL) is gaining traction as a complementary approach to supervised learning for recommender systems (RecSys), thanks to its ability to optimize sequential decision-making under delayed rewards. Recent advances in offline reinforcement learning, off-policy evaluation, and more scalable, performant system designs that parallelize code execution have made RL more tractable for real-time RecSys use cases. This tutorial introduces RLlib [9], a comprehensive open-source Python RL framework built for production workloads. RLlib is built on top of open-source Ray [8], an easy-to-use distributed computing framework for Python that can handle complex, heterogeneous applications. Ray and RLlib run on compute clusters on any cloud without vendor lock-in. Working through Colab notebooks, you will leave this tutorial with a complete, working example of parallelized Python RL code using RLlib for RecSys, hosted in a GitHub repository.
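
As a preview of the hands-on material, the sketch below shows RLlib's basic configure-build-train loop, written against the Ray 2.x API (config method names have shifted between RLlib releases, so treat this as illustrative rather than definitive). CartPole-v1 is used as a stand-in environment; the tutorial itself works with RecSim-style recommender environments [4] and RecSys-oriented algorithms such as bandits and SlateQ.

import ray
from ray.rllib.algorithms.dqn import DQNConfig

ray.init()  # start Ray locally; on a cluster, ray.init(address="auto") attaches instead

# Configure the algorithm. CartPole-v1 is a placeholder here; the tutorial
# swaps in RecSim-based recommender environments and algorithms like SlateQ.
config = (
    DQNConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=2)  # sample episodes in parallel Ray workers
    .framework("torch")
)

algo = config.build()
for i in range(5):
    result = algo.train()  # one iteration of parallel sampling plus learning
    print(f"iter {i}: episode_reward_mean = {result['episode_reward_mean']:.1f}")

algo.stop()
ray.shutdown()

The same pattern scales from a laptop to a multi-node cluster: scaling out sampling is largely a matter of raising num_rollout_workers and pointing ray.init at a cluster, which is what makes this style of RL practical for large RecSys workloads.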

References

[1]
Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed H. Chi. 2019. Top-K Off-Policy Correction for a REINFORCE Recommender System. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (Melbourne VIC, Australia) (WSDM ’19). Association for Computing Machinery, New York, NY, USA, 456–464. https://doi.org/10.1145/3289600.3290999
[2]
Kourosh Hakhamaneshi, Ruihan Zhao, Albert Zhan, Pieter Abbeel, and Michael Laskin. 2021. Hierarchical Few-Shot Imitation with Skill Transition Models. arXiv preprint arXiv:2107.08981 (2021).
[3]
Xu He, Bo An, Yanghua Li, Haikai Chen, Rundong Wang, Xinrun Wang, Runsheng Yu, Xin Li, and Zhirong Wang. 2020. Learning to Collaborate in Multi-Module Recommendation via Multi-Agent Reinforcement Learning without Communication. In Fourteenth ACM Conference on Recommender Systems. Association for Computing Machinery, New York, NY, USA, 210–219.
[4]
Eugene Ie, Chih-wei Hsu, Martin Mladenov, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu, and Craig Boutilier. 2019. RecSim: A Configurable Simulation Platform for Recommender Systems. arXiv preprint arXiv:1909.04847 (2019).
[5]
Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. 2020. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems. arXiv preprint arXiv:2005.01643 (2020).
[6]
Eric Liang, Zhanghao Wu, Michael Luo, Sven Mika, and Ion Stoica. 2020. Distributed Reinforcement Learning is a Dataflow Problem. arXiv preprint arXiv:2011.12719 (2020).
[7]
Francois Mairesse, Zhonghao Luo, and Tao Ye. 2021. Learning a Voice-based Conversational Recommender using Offline Policy Optimization. In Fifteenth ACM Conference on Recommender Systems. Association for Computing Machinery, New York, NY, USA, 562–564.
[8]
Ray. 2022. Ray provides a simple, universal API for building distributed applications. ray.io. Retrieved July 12, 2022 from https://github.com/ray-project/ray
[9]
RLlib. 2022. RLlib: Industry-Grade Reinforcement Learning. ray.io. Retrieved July 12, 2022 from https://github.com/ray-project/ray/tree/master/rllib
[10]
Michael Schaarschmidt, Sven Mika, Kai Fricke, and Eiko Yoneki. 2019. RLgraph: Modular Computation Graphs for Deep Reinforcement Learning. Proceedings of Machine Learning and Systems 1 (2019), 65–80.
[11]
Wildlife Studios. 2021. Using Reinforcement Learning to Optimize IAP Offer Recommendations in Mobile Games. wildlifestudios.com. Retrieved July 12, 2022 from https://www.youtube.com/watch?v=cGQk8rIoc1Y
[12]
Qing Wang, Jiechao Xiong, Lei Han, Han Liu, Tong Zhang, et al. 2018. Exponentially Weighted Imitation Learning for Batched Historical Data. Advances in Neural Information Processing Systems 31 (2018), 6288–6297.
[13]
Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott E. Reed, Bobak Shahriari, Noah Y. Siegel, Josh Merel, Çağlar Gülçehre, Nicolas Heess, and Nando de Freitas. 2020. Critic Regularized Regression. arXiv preprint arXiv:2006.15134 (2020). https://arxiv.org/abs/2006.15134



Published In

RecSys '22: Proceedings of the 16th ACM Conference on Recommender Systems
September 2022
743 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. RLlib
  2. Recommender systems
  3. Reinforcement learning

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Acceptance Rates

Overall acceptance rate: 254 of 1,295 submissions (20%)
