Computer Science > Machine Learning

arXiv:2310.07220 (cs)

[Submitted on 11 Oct 2023 (v1), last revised 30 Dec 2023 (this version, v2)]

Title: COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

Authors: Xiyao Wang, Ruijie Zheng, Yanchao Sun, Ruonan Jia, Wichayaporn Wongkamjan, Huazhe Xu, Furong Huang

Abstract: Dyna-style model-based reinforcement learning contains two phases: model rollouts to generate samples for policy learning, and real environment exploration using the current policy for dynamics model learning. However, because real-world environments are complex, the learned dynamics model is inevitably imperfect, and its prediction error can mislead policy learning and result in sub-optimal solutions. In this paper, we propose $\texttt{COPlanner}$, a planning-driven framework for model-based methods that addresses the problem of inaccurately learned dynamics models with conservative model rollouts and optimistic environment exploration. $\texttt{COPlanner}$ leverages an uncertainty-aware policy-guided model predictive control (UP-MPC) component to plan for multi-step uncertainty estimation. This estimated uncertainty then serves as a penalty during model rollouts and as a bonus during real environment exploration when choosing actions. Consequently, $\texttt{COPlanner}$ can avoid model-uncertain regions through conservative model rollouts, thereby alleviating the influence of model error. Simultaneously, it explores high-reward model-uncertain regions through optimistic real environment exploration to actively reduce model error. $\texttt{COPlanner}$ is a plug-and-play framework that can be applied to any Dyna-style model-based method. Experimental results on a series of proprioceptive and visual continuous control tasks demonstrate that both the sample efficiency and the asymptotic performance of strong model-based methods are significantly improved when combined with $\texttt{COPlanner}$.
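
The role of the uncertainty term described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`ensemble_disagreement`, `score_plan`, `select_action`), the coefficient `coef`, and the use of ensemble variance as the uncertainty measure are all assumptions made for illustration. The sketch only shows how one multi-step uncertainty estimate could be subtracted as a penalty when choosing actions for model rollouts and added as a bonus when choosing actions for real environment exploration.

```python
from statistics import pvariance


def ensemble_disagreement(predictions):
    """Hypothetical uncertainty measure: mean per-dimension variance of the
    next-state predictions made by an ensemble of dynamics models.

    predictions: list of per-model predictions, each a list of floats.
    """
    dims = list(zip(*predictions))  # group values by state dimension
    return sum(pvariance(d) for d in dims) / len(dims)


def score_plan(rewards, uncertainties, coef, optimistic):
    """Score one planned action sequence over its horizon.

    Uncertainty is added as a bonus when optimistic (exploration) and
    subtracted as a penalty otherwise (model rollouts).
    """
    sign = 1.0 if optimistic else -1.0
    return sum(r + sign * coef * u for r, u in zip(rewards, uncertainties))


def select_action(candidates, coef, optimistic):
    """Return the first action of the best-scoring candidate plan.

    candidates: list of (first_action, rewards, uncertainties) tuples, where
    rewards and uncertainties are per-step predictions along the plan.
    """
    best = max(candidates, key=lambda c: score_plan(c[1], c[2], coef, optimistic))
    return best[0]


# Two hypothetical candidate plans over a 2-step horizon:
# "safe" has lower reward and no model uncertainty,
# "risky" has higher reward but high model uncertainty.
candidates = [
    ("safe", [1.0, 1.0], [0.0, 0.0]),
    ("risky", [1.2, 1.2], [1.0, 1.0]),
]
rollout_action = select_action(candidates, coef=0.5, optimistic=False)
explore_action = select_action(candidates, coef=0.5, optimistic=True)
```

With these made-up numbers, the conservative setting prefers the "safe" plan and the optimistic setting prefers the "risky" one, which mirrors the penalty and bonus roles the abstract describes.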

Comments:	22 pages, 17 figures
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2310.07220 [cs.LG]
	(or arXiv:2310.07220v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2310.07220

Submission history

From: Xiyao Wang
[v1] Wed, 11 Oct 2023 06:10:07 UTC (21,480 KB)
[v2] Sat, 30 Dec 2023 04:16:38 UTC (28,108 KB)
