DOI: 10.1145/3687272.3690916
Poster

Personalisation via Dynamic Policy Fusion

Published: 24 November 2024

Abstract

Reward-optimal policies obtained by training deep reinforcement learning agents may not align with a user's personal preferences. Rectifying this by retraining the agent with a user-specific reward function is impractical: such functions are not readily available, and retraining incurs high costs. Instead, we propose to adapt the already-trained policy to the user's intent via policy fusion, inferring that intent from trajectory-level feedback. We design the fusion process to be dynamic, so that the resulting policy is dominated by neither the task goals nor the user's needs. We empirically demonstrate that our method consistently balances these objectives across various environments.
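Since only the abstract is available here, the following is a minimal, illustrative sketch (not the authors' actual method) of what a policy-fusion step of this kind could look like in Python: per-action scores from the pre-trained task policy are blended with those of an inferred user-intent model via a weight beta. The names task_logits, intent_logits, and the fixed-weight blend are all assumptions; in the paper, the fusion weight is adjusted dynamically.

```python
import numpy as np

def fuse_policies(task_logits, intent_logits, beta):
    """Blend action preferences from a pre-trained task policy with
    those of an inferred user-intent model (hypothetical sketch).

    task_logits   -- per-action scores from the reward-optimal policy
    intent_logits -- per-action scores from the user-intent model,
                     e.g. inferred from trajectory-level feedback
    beta          -- fusion weight in [0, 1]; the paper adjusts this
                     dynamically so that neither objective dominates
    """
    fused = (1.0 - beta) * task_logits + beta * intent_logits
    # Softmax over the fused scores yields the action distribution.
    exp = np.exp(fused - fused.max())
    return exp / exp.sum()

# Toy example with three actions: as beta grows, the fused policy
# shifts from the task-preferred action towards the user-preferred one.
task_logits = np.array([2.0, 0.5, -1.0])
intent_logits = np.array([-0.5, 1.5, 0.0])
for beta in (0.0, 0.5, 1.0):
    probs = fuse_policies(task_logits, intent_logits, beta)
    print(f"beta={beta}: {np.round(probs, 3)}")
```

A static beta would let one objective dominate whenever the two policies disagree strongly; adjusting it on the fly, per the abstract, is what keeps task completion and personalisation in balance.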

Supplemental Material

Supplementary file (PDF)




    Published In

HAI '24: Proceedings of the 12th International Conference on Human-Agent Interaction
November 2024, 502 pages
ISBN: 9798400711787
DOI: 10.1145/3687272
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Dynamic Policy Fusion
    2. Personalisation
    3. Reinforcement Learning

    Qualifiers

    • Poster
    • Research
    • Refereed limited

    Conference

HAI '24: International Conference on Human-Agent Interaction
November 24-27, 2024
Swansea, United Kingdom

    Acceptance Rates

Overall acceptance rate: 121 of 404 submissions (30%)

