DOI: 10.1145/3687272.3690916
Poster

Personalisation via Dynamic Policy Fusion

Published: 24 November 2024

Abstract

Reward-optimal policies obtained by training deep reinforcement learning agents may not align with a user's personal preferences. Rectifying this by retraining the agent with a user-specific reward function is impractical: such functions are not readily available, and retraining incurs high costs. Instead, we propose to adapt the already-trained policy to the user's intent via policy fusion, inferring that intent from trajectory-level feedback. We design the fusion process to be dynamic, so that the resulting policy is dominated by neither the task goals nor the user's needs. We empirically demonstrate that our method consistently balances these objectives across various environments.
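Since only the abstract is available here, the following is a minimal, illustrative sketch (not the authors' actual method) of what a policy-fusion step of this kind could look like in Python: per-action scores from the pre-trained task policy are blended with those of an inferred user-intent model via a weight beta. The names task_logits, intent_logits, and the fixed-weight blend are all assumptions; in the paper, the fusion weight is adjusted dynamically.

```python
import numpy as np

def fuse_policies(task_logits, intent_logits, beta):
    """Blend action preferences from a pre-trained task policy with
    those of an inferred user-intent model (hypothetical sketch).

    task_logits   -- per-action scores from the reward-optimal policy
    intent_logits -- per-action scores from the user-intent model,
                     e.g. inferred from trajectory-level feedback
    beta          -- fusion weight in [0, 1]; the paper adjusts this
                     dynamically so that neither objective dominates
    """
    fused = (1.0 - beta) * task_logits + beta * intent_logits
    # Softmax over the fused scores yields the action distribution.
    exp = np.exp(fused - fused.max())
    return exp / exp.sum()

# Toy example with three actions: as beta grows, the fused policy
# shifts from the task-preferred action towards the user-preferred one.
task_logits = np.array([2.0, 0.5, -1.0])
intent_logits = np.array([-0.5, 1.5, 0.0])
for beta in (0.0, 0.5, 1.0):
    probs = fuse_policies(task_logits, intent_logits, beta)
    print(f"beta={beta}: {np.round(probs, 3)}")
```

A static beta would let one objective dominate whenever the two policies disagree strongly; adjusting it on the fly, per the abstract, is what keeps task completion and personalisation in balance.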

Supplemental Material

Supplementary file (PDF)




    Published In

HAI '24: Proceedings of the 12th International Conference on Human-Agent Interaction
November 2024, 502 pages
ISBN: 9798400711787
DOI: 10.1145/3687272
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Dynamic Policy Fusion
    2. Personalisation
    3. Reinforcement Learning

    Qualifiers

    • Poster
    • Research
    • Refereed limited

    Conference

HAI '24: International Conference on Human-Agent Interaction
November 24-27, 2024
Swansea, United Kingdom

    Acceptance Rates

Overall acceptance rate: 121 of 404 submissions (30%)

