Cang et al., 2021 - Google Patents
Behavioral priors and dynamics models: Improving performance and domain transfer in offline RL
- Document ID
- 17015169259498962404
- Author
- Cang C
- Rajeswaran A
- Abbeel P
- Laskin M
- Publication year
- 2021
- Publication venue
- arXiv preprint arXiv:2106.09119
Snippet
Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect offline data without additional environment interactions. Extracting policies from diverse offline datasets has the potential to expand the range of applicability of RL by making the …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/04—Architectures, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/04—Inference methods or devices
- G06N5/046—Forward inferencing, production systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
- G06N5/02—Knowledge representation
- G06N5/022—Knowledge engineering, knowledge acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computer systems based on specific mathematical models
- G06N7/005—Probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/004—Artificial life, i.e. computers simulating life
- G06N3/006—Artificial life, i.e. computers simulating life based on simulated virtual individual or collective life forms, e.g. single "avatar", social simulations, virtual worlds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30587—Details of specialised database models
- G06F17/30595—Relational databases
- G06F17/30598—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computer systems based on specific mathematical models
- G06N7/02—Computer systems based on specific mathematical models using fuzzy logic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/18—Digital computers in general; Data processing equipment in general in which a programme is changed according to experience gained by the computer itself during a complete run; Learning machines
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Cang et al. | Behavioral priors and dynamics models: Improving performance and domain transfer in offline rl | |
Brown et al. | Better-than-demonstrator imitation learning via automatically-ranked demonstrations | |
Song et al. | V-mpo: On-policy maximum a posteriori policy optimization for discrete and continuous control | |
Amin et al. | A survey of exploration methods in reinforcement learning | |
Alet et al. | Meta-learning curiosity algorithms | |
Polson et al. | Deep learning: A Bayesian perspective | |
Abdolmaleki et al. | Relative entropy regularized policy iteration | |
Plaat et al. | Deep model-based reinforcement learning for high-dimensional problems, a survey | |
Gomez et al. | Accelerated Neural Evolution through Cooperatively Coevolved Synapses. | |
Heylighen et al. | Cybernetics and second-order cybernetics | |
Campos et al. | Beyond fine-tuning: Transferring behavior in reinforcement learning | |
Ghosh et al. | Learning to reach goals without reinforcement learning | |
Kaushik et al. | Multi-objective model-based policy search for data-efficient learning with sparse rewards | |
Lyu et al. | Double check your state before trusting it: Confidence-aware bidirectional offline model-based imagination | |
Cobo et al. | Abstraction from demonstration for efficient reinforcement learning in high-dimensional domains | |
Botteghi et al. | Unsupervised representation learning in deep reinforcement learning: A review | |
Zhang et al. | Efficient experience replay architecture for offline reinforcement learning | |
Kim et al. | Accelerating reinforcement learning with value-conditional state entropy exploration | |
Jhunjhunwala | Policy extraction via online q-value distillation | |
Cerezo et al. | Fractal AI: A fragile theory of intelligence | |
Freire et al. | Sequential episodic control | |
Sun et al. | Constrained mdps can be solved by eearly-termination with recurrent models | |
Guzman et al. | Adaptive model predictive control by learning classifiers | |
Kuric et al. | Meta reinforcement learning for fast adaptation of hierarchical policies | |
KrisshnaKumar et al. | Towards Physically Talented Aerial Robots with Tactically Smart Swarm Behavior thereof: An Efficient Co-design Approach |