Vuong et al., 2019 - Google Patents
Uncertainty-aware model-based policy optimization
- Document ID
- 14419615351770162475
- Author
- Vuong T
- Tran K
- Publication year
- 2019
- Publication venue
- arXiv preprint arXiv:1906.10717
Snippet
Model-based reinforcement learning has the potential to be more sample efficient than model-free approaches. However, existing model-based methods are vulnerable to model bias, which leads to poor generalization and asymptotic performance compared to model …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computer systems based on specific mathematical models
- G06N7/005—Probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30533—Other types of queries
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
Similar Documents
Authors | Title |
---|---|
Wang et al. | Exploring model-based planning with policy networks |
Bakshy et al. | AE: A domain-agnostic platform for adaptive experimentation |
Xu et al. | Learning to explore via meta-policy gradient |
Boney et al. | Regularizing model-based planning with energy-based models |
Luis et al. | Inductive transfer for learning Bayesian networks |
Vuong et al. | Uncertainty-aware model-based policy optimization |
Zhao et al. | Adaptive behavior cloning regularization for stable offline-to-online reinforcement learning |
Gaier et al. | Data-efficient neuroevolution with kernel-based surrogate models |
Song et al. | Efficient evaluation methods for neural architecture search: A survey |
Li et al. | Hyper-parameter estimation method with particle swarm optimization |
Liu et al. | Type-2 hierarchical fuzzy system for high-dimensional data-based modeling with uncertainties |
Zhao et al. | ODE-based recurrent model-free reinforcement learning for POMDPs |
Xiao et al. | Nonparametric kernel smoother on topology learning neural networks for incremental and ensemble regression |
Jawed et al. | Multi-task learning curve forecasting across hyperparameter configurations and datasets |
Vuong et al. | Policy Optimization In the Face of Uncertainty |
Oxenstierna | Predicting house prices using ensemble learning with cluster aggregations |
Li et al. | Policy gradient methods with Gaussian process modelling acceleration |
Gupta et al. | Sequential knowledge transfer across problems |
Li et al. | Continuous probabilistic model building genetic network programming using reinforcement learning |
Nilsson et al. | Tree Ensembles for Contextual Bandits |
Yang et al. | BiES: adaptive policy optimization for model-based offline reinforcement learning |
Anitha et al. | Deep artificial neural network based multilayer gated recurrent model for effective prediction of software development effort |
Wulur et al. | Planning-integrated Policy for Efficient Reinforcement Learning in Sparse-reward Environments |
Li et al. | Bayesian optimization with particle swarm |
Faury et al. | Rover descent: Learning to optimize by learning to navigate on prototypical loss surfaces |