Computer Science > Machine Learning

arXiv:2210.06692 (cs)

[Submitted on 13 Oct 2022 (v1), last revised 29 Oct 2022 (this version, v2)]

Title: Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief

Authors: Kaiyang Guo, Yunfeng Shao, Yanhui Geng


Abstract: Model-based offline reinforcement learning (RL) aims to find a highly rewarding policy by leveraging a previously collected static dataset and a dynamics model. Although the dynamics model is learned by reusing the static dataset, its generalization ability can promote policy learning if properly utilized. To that end, several works propose to quantify the uncertainty of the predicted dynamics and explicitly apply it to penalize the reward. However, as the dynamics and the reward are intrinsically different factors in the context of an MDP, characterizing the impact of dynamics uncertainty through a reward penalty may incur an unexpected tradeoff between model utilization and risk avoidance. In this work, we instead maintain a belief distribution over dynamics, and evaluate/optimize the policy through biased sampling from the belief. The sampling procedure, biased towards pessimism, is derived from an alternating Markov game formulation of offline RL. We formally show that the biased sampling naturally induces an updated dynamics belief with a policy-dependent reweighting factor, termed Pessimism-Modulated Dynamics Belief. To improve the policy, we devise an iterative regularized policy optimization algorithm for the game, with a guarantee of monotonic improvement under certain conditions. To make this practical, we further devise an offline RL algorithm to approximately find the solution. Empirical results show that the proposed approach achieves state-of-the-art performance on a wide range of benchmark tasks.
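
The idea of sampling from a dynamics belief with a bias towards pessimism can be illustrated with a small toy sketch. The abstract does not state the actual sampling rule, so the rule used here (draw N candidate models from the belief and keep the one with the k-th lowest value estimate) is an assumption made purely for illustration, as are the names `sample_pessimistic`, `toy_belief`, and `toy_value`. In this toy setting a "dynamics model" is reduced to a single success probability.

```python
import random


def sample_pessimistic(belief_sampler, value_fn, n_candidates, k, rng):
    """Draw n_candidates models from the belief and return the one with the
    k-th lowest value estimate (k=1 is the most pessimistic choice).

    This selection rule is an illustrative assumption, not taken from the abstract.
    """
    if not 1 <= k <= n_candidates:
        raise ValueError("k must satisfy 1 <= k <= n_candidates")
    candidates = [belief_sampler(rng) for _ in range(n_candidates)]
    ranked = sorted(candidates, key=value_fn)  # ascending by estimated value
    return ranked[k - 1]


def toy_belief(rng):
    # Hypothetical belief: the model is a success probability p ~ Beta(2, 2).
    return rng.betavariate(2.0, 2.0)


def toy_value(p):
    # Hypothetical value estimate: expected reward of a policy that earns 1 on success.
    return p


rng = random.Random(0)
unbiased = [toy_belief(rng) for _ in range(2000)]
pessimistic = [
    sample_pessimistic(toy_belief, toy_value, n_candidates=5, k=1, rng=rng)
    for _ in range(2000)
]
mean_unbiased = sum(unbiased) / len(unbiased)
mean_pessimistic = sum(pessimistic) / len(pessimistic)
print(f"mean value, unbiased sampling:    {mean_unbiased:.3f}")
print(f"mean value, pessimistic sampling: {mean_pessimistic:.3f}")
```

In this sketch, smaller `k` or larger `n_candidates` makes the sampled models more pessimistic, while `k` close to `n_candidates` removes the pessimism; the reweighting of the belief described in the abstract is not reproduced here.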

Comments:	NeurIPS 2022 (Oral)
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2210.06692 [cs.LG]
	(or arXiv:2210.06692v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2210.06692

Submission history

From: Kaiyang Guo
[v1] Thu, 13 Oct 2022 03:14:36 UTC (1,095 KB)
[v2] Sat, 29 Oct 2022 02:04:16 UTC (1,095 KB)
