Adaptive behavior cloning regularization for stable offline-to-online reinforcement learning
Zhao et al., 2022
- Document ID
- 6985242959602302250
- Authors
- Zhao Y
- Boney R
- Ilin A
- Kannala J
- Pajarinen J
- Publication year
- 2022
- Publication venue
- arXiv preprint arXiv:2210.13846
Snippet
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment. However, depending on the quality of the offline dataset, such pre-trained agents may have limited performance and …
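The cited method concerns behavior-cloning-regularized actor updates whose regularization strength adapts during online fine-tuning. Below is a minimal, hypothetical Python sketch in the spirit of a TD3+BC-style objective with an adaptive BC weight; the function names, the adaptation rule, and the step sizes are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: TD3+BC-style actor loss with an adaptive
# behavior-cloning weight. All names and the adaptation schedule are
# assumptions for illustration, not the paper's exact code.
import torch
import torch.nn.functional as F

def bc_regularized_actor_loss(actor, critic, batch, bc_weight):
    """Actor objective: maximize Q while staying close to dataset actions.
    `bc_weight` trades off RL improvement against imitation of the data."""
    state, action = batch["state"], batch["action"]
    pi = actor(state)
    q = critic(state, pi)
    # Normalize the Q term so the BC weight has a consistent scale
    # across tasks (as in TD3+BC).
    lmbda = 1.0 / q.abs().mean().detach()
    return -(lmbda * q).mean() + bc_weight * F.mse_loss(pi, action)

def update_bc_weight(bc_weight, recent_return, best_return,
                     step=0.05, w_min=0.0, w_max=1.0):
    """Illustrative adaptation rule (an assumption): relax the BC
    constraint while online returns keep improving, and tighten it
    again when performance degrades to keep fine-tuning stable."""
    if recent_return >= best_return:
        bc_weight = max(w_min, bc_weight - step)  # trust the RL signal more
    else:
        bc_weight = min(w_max, bc_weight + step)  # fall back toward the data
    return bc_weight
```

In this sketch the agent is pre-trained offline with a large BC weight and the weight is then adjusted after each online evaluation, which matches the stability motivation described in the snippet above.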
Concepts (machine-extracted)
- behavior — title, abstract, description (40 occurrences)
Classifications
- G06N — Computer systems based on specific computational models
  - G06N99/005 — Learning machines, i.e. computers in which a programme is changed according to experience gained by the machine itself during a complete run
  - G06N3/04 — Neural network models: architectures, e.g. interconnection topology
  - G06N3/08 — Neural network models: learning methods
  - G06N3/126 — Genetic algorithms, i.e. information processing using digital simulations of the genetic system
  - G06N5/022 — Knowledge based models: knowledge engineering, knowledge acquisition
  - G06N5/04 — Knowledge based models: inference methods or devices
- G05B — Control or regulating systems in general
  - G05B13/027 — Adaptive control systems, electric, in which the criterion is a learning criterion, using neural networks only
  - G05B13/042 — Adaptive control systems, electric, involving the use of models or simulators, in which a parameter or coefficient is automatically adjusted to optimise the performance
Similar Documents
Publication | Title
---|---
Zhao et al. | Adaptive behavior cloning regularization for stable offline-to-online reinforcement learning
Lee et al. | Offline-to-online reinforcement learning via balanced replay and pessimistic Q-ensemble
Ajay et al. | Is conditional generative modeling all you need for decision-making?
Chen et al. | Delay-aware model-based reinforcement learning for continuous control
Vecerik et al. | Leveraging demonstrations for deep reinforcement learning on robotics problems with sparse rewards
Kurenkov et al. | AC-Teach: A Bayesian actor-critic method for policy learning with an ensemble of suboptimal teachers
Chen et al. | Latent-variable advantage-weighted policy optimization for offline RL
Shrestha et al. | DeepAveragers: Offline reinforcement learning by solving derived non-parametric MDPs
Ma et al. | Offline goal-conditioned reinforcement learning via f-advantage regression
Cang et al. | Behavioral priors and dynamics models: Improving performance and domain transfer in offline RL
Choi et al. | Variational empowerment as representation learning for goal-based reinforcement learning
Li et al. | ACDER: Augmented curiosity-driven experience replay
Hein et al. | Generating interpretable fuzzy controllers using particle swarm optimization and genetic programming
Zhang et al. | Efficient experience replay architecture for offline reinforcement learning
ElDahshan et al. | Deep reinforcement learning based video games: A review
Vuong et al. | Uncertainty-aware model-based policy optimization
Zhao et al. | Improving offline-to-online reinforcement learning with Q-ensembles
Zhao et al. | Ensemble-based offline-to-online reinforcement learning: From pessimistic learning to optimistic exploration
Coelho et al. | VQC-based reinforcement learning with data re-uploading: performance and trainability
Yang et al. | Continuous control for searching and planning with a learned model
Ma et al. | Learning to coordinate from offline datasets with uncoordinated behavior policies
Hepburn et al. | Model-based trajectory stitching for improved behavioural cloning and its applications
Lee et al. | Addressing distribution shift in online reinforcement learning with offline datasets
Li et al. | Offline reinforcement learning with uncertainty critic regularization based on density estimation
Liu et al. | Judgmentally adjusted Q-values based on Q-ensemble for offline reinforcement learning