Adaptive behavior cloning regularization for stable offline-to-online reinforcement learning
Zhao et al., 2022
- Document ID
- 6985242959602302250
- Authors
- Zhao Y
- Boney R
- Ilin A
- Kannala J
- Pajarinen J
- Publication year
- 2022
- Publication venue
- arXiv preprint arXiv:2210.13846
Snippet
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment. However, depending on the quality of the offline dataset, such pre-trained agents may have limited performance and …
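The cited method concerns behavior-cloning-regularized actor updates whose regularization strength adapts during online fine-tuning. Below is a minimal, hypothetical Python sketch in the spirit of a TD3+BC-style objective with an adaptive BC weight; the function names, the adaptation rule, and the step sizes are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: TD3+BC-style actor loss with an adaptive
# behavior-cloning weight. All names and the adaptation schedule are
# assumptions for illustration, not the paper's exact code.
import torch
import torch.nn.functional as F

def bc_regularized_actor_loss(actor, critic, batch, bc_weight):
    """Actor objective: maximize Q while staying close to dataset actions.
    `bc_weight` trades off RL improvement against imitation of the data."""
    state, action = batch["state"], batch["action"]
    pi = actor(state)
    q = critic(state, pi)
    # Normalize the Q term so the BC weight has a consistent scale
    # across tasks (as in TD3+BC).
    lmbda = 1.0 / q.abs().mean().detach()
    return -(lmbda * q).mean() + bc_weight * F.mse_loss(pi, action)

def update_bc_weight(bc_weight, recent_return, best_return,
                     step=0.05, w_min=0.0, w_max=1.0):
    """Illustrative adaptation rule (an assumption): relax the BC
    constraint while online returns keep improving, and tighten it
    again when performance degrades to keep fine-tuning stable."""
    if recent_return >= best_return:
        bc_weight = max(w_min, bc_weight - step)  # trust the RL signal more
    else:
        bc_weight = min(w_max, bc_weight + step)  # fall back toward the data
    return bc_weight
```

In this sketch the agent is pre-trained offline with a large BC weight and the weight is then adjusted after each online evaluation, which matches the stability motivation described in the snippet above.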
Concepts (machine-extracted)
- behavior — title, abstract, description (40 occurrences)
Classifications
- G06N — Computer systems based on specific computational models
  - G06N99/005 — Learning machines, i.e. computers in which a programme is changed according to experience gained by the machine itself during a complete run
  - G06N3/04 — Neural network models: architectures, e.g. interconnection topology
  - G06N3/08 — Neural network models: learning methods
  - G06N3/126 — Genetic algorithms, i.e. information processing using digital simulations of the genetic system
  - G06N5/022 — Knowledge based models: knowledge engineering, knowledge acquisition
  - G06N5/04 — Knowledge based models: inference methods or devices
- G05B — Control or regulating systems in general
  - G05B13/027 — Adaptive control systems, electric, in which the criterion is a learning criterion, using neural networks only
  - G05B13/042 — Adaptive control systems, electric, involving the use of models or simulators, in which a parameter or coefficient is automatically adjusted to optimise the performance
Similar Documents
Publication | Title
---|---
Zhao et al. | Adaptive behavior cloning regularization for stable offline-to-online reinforcement learning
Lee et al. | Offline-to-online reinforcement learning via balanced replay and pessimistic Q-ensemble
Ajay et al. | Is conditional generative modeling all you need for decision-making?
Chen et al. | Delay-aware model-based reinforcement learning for continuous control
Vecerik et al. | Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards
Kurenkov et al. | AC-Teach: A Bayesian actor-critic method for policy learning with an ensemble of suboptimal teachers
Chen et al. | Latent-variable advantage-weighted policy optimization for offline RL
Shrestha et al. | DeepAveragers: Offline reinforcement learning by solving derived non-parametric MDPs
Ma et al. | Offline goal-conditioned reinforcement learning via f-advantage regression
Cang et al. | Behavioral priors and dynamics models: Improving performance and domain transfer in offline RL
Choi et al. | Variational empowerment as representation learning for goal-based reinforcement learning
Li et al. | ACDER: Augmented curiosity-driven experience replay
Hein et al. | Generating interpretable fuzzy controllers using particle swarm optimization and genetic programming
Zhang et al. | Efficient experience replay architecture for offline reinforcement learning
ElDahshan et al. | Deep reinforcement learning based video games: A review
Vuong et al. | Uncertainty-aware model-based policy optimization
Zhao et al. | Improving offline-to-online reinforcement learning with Q-ensembles
Zhao et al. | Ensemble-based offline-to-online reinforcement learning: From pessimistic learning to optimistic exploration
Coelho et al. | VQC-based reinforcement learning with data re-uploading: performance and trainability
Yang et al. | Continuous control for searching and planning with a learned model
Ma et al. | Learning to coordinate from offline datasets with uncoordinated behavior policies
Hepburn et al. | Model-based trajectory stitching for improved behavioural cloning and its applications
Lee et al. | Addressing distribution shift in online reinforcement learning with offline datasets
Li et al. | Offline reinforcement learning with uncertainty critic regularization based on density estimation
Liu et al. | Judgmentally adjusted Q-values based on Q-ensemble for offline reinforcement learning