Computer Science > Machine Learning

arXiv:2310.19007 (cs)

[Submitted on 29 Oct 2023 (v1), last revised 31 Oct 2023 (this version, v2)]

Title: Behavior Alignment via Reward Function Optimization

Authors: Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

Abstract: Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outcomes and promote behaviors that are not aligned with the designer's intended goal. Although potential-based reward shaping is often suggested as a remedy, we systematically investigate settings where deploying it often significantly impairs performance. To address these issues, we introduce a new framework that uses a bi-level objective to learn "behavior alignment reward functions". These functions integrate auxiliary rewards reflecting a designer's heuristics and domain knowledge with the environment's primary rewards. Our approach automatically determines the most effective way to blend these types of feedback, thereby enhancing robustness against heuristic reward misspecification. Remarkably, it can also adapt an agent's policy optimization process to mitigate suboptimalities resulting from limitations and biases inherent in the underlying RL algorithms. We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges. We investigate heuristic auxiliary rewards of varying quality, some of which are beneficial and others detrimental to the learning process. Our results show that our framework offers a robust and principled way to integrate designer-specified heuristics. It not only addresses key shortcomings of existing approaches but also consistently leads to high-performing solutions, even when given misaligned or poorly specified auxiliary reward functions.

Comments:	(Spotlight) Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2310.19007 [cs.LG]
	(or arXiv:2310.19007v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.19007

Submission history

From: Dhawal Gupta
[v1] Sun, 29 Oct 2023 13:45:07 UTC (39,587 KB)
[v2] Tue, 31 Oct 2023 04:58:20 UTC (39,544 KB)
