Weighted double Q-learning. Zhang et al., 2017
- Document ID
- 7472296724396287444
- Author
- Zhang Z
- Pan Z
- Kochenderfer M
- Publication year
- 2017
- Publication venue
- IJCAI
Snippet
Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments due to overestimating action values. Overestimation is due to the use of a single estimator that uses the maximum action value as an approximation for the …
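The overestimation mentioned in the snippet can be reproduced with a small simulation. The sketch below is an illustration only, not the algorithm from the paper: it assumes ten actions whose true values are all 0, estimates each value from noisy Gaussian samples, and compares the maximum of the sample means (a single estimator) against a double estimator that selects the action on one sample set and evaluates it on another. All function names and parameters (`single_estimate`, `double_estimate`, `simulate`, `num_actions`, and so on) are hypothetical.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def single_estimate(samples):
    """Single estimator: the same sample means pick and score the best action."""
    return max(mean(s) for s in samples)


def double_estimate(samples_a, samples_b):
    """Double estimator: set A picks the action, set B scores it."""
    means_a = [mean(s) for s in samples_a]
    best = max(range(len(means_a)), key=lambda i: means_a[i])
    return mean(samples_b[best])


def simulate(num_actions=10, samples_per_set=5, trials=2000, seed=0):
    """Average both estimates over many trials.

    Assumption: every action has true value 0 and unit-variance Gaussian
    noise, so the true maximum action value is exactly 0.
    """
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        set_a = [[rng.gauss(0.0, 1.0) for _ in range(samples_per_set)]
                 for _ in range(num_actions)]
        set_b = [[rng.gauss(0.0, 1.0) for _ in range(samples_per_set)]
                 for _ in range(num_actions)]
        # The single estimator gets all the data; the double estimator splits it.
        pooled = [a + b for a, b in zip(set_a, set_b)]
        single_total += single_estimate(pooled)
        double_total += double_estimate(set_a, set_b)
    return single_total / trials, double_total / trials


if __name__ == "__main__":
    single, double = simulate()
    print(f"single estimator: {single:.3f}")
    print(f"double estimator: {double:.3f}")
```

Under these assumptions the single estimator averages clearly above the true value of 0, while the double estimator averages close to 0. The double estimator is unbiased here only because all true values are equal; when they differ it can underestimate the maximum.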
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/04—Architectures, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/0275—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using fuzzy logic only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Zhang et al. | Weighted double Q-learning. | |
Derman et al. | Soft-robust actor-critic policy-gradient | |
Ouyang et al. | Learning-based control of unknown linear systems with thompson sampling | |
CN107479380A (en) | Multi-Agent coordination control method based on evolutionary game theory | |
Khamassi et al. | Active exploration and parameterized reinforcement learning applied to a simulated human-robot interaction task | |
Singh et al. | Anti-jamming in cognitive radio networks using reinforcement learning algorithms | |
Zhang et al. | Deep reinforcement learning based cooperative partial task offloading and resource allocation for IIoT applications | |
Oentaryo et al. | Online probabilistic learning for fuzzy inference system | |
Mohamed et al. | Multi-objective states of matter search algorithm for TCSC-based smart controller design | |
CN113919217B (en) | Adaptive parameter setting method and device for active disturbance rejection controller | |
Jiang et al. | Action candidate based clipped double q-learning for discrete and continuous action tasks | |
Jiang et al. | Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks | |
Behmanesh et al. | Chaotic time series prediction using improved ANFIS with imperialist competitive learning algorithm | |
Qu et al. | Kernel least mean kurtosis based online chaotic time series prediction | |
Dasgupta et al. | Adaptive computational chemotaxis in bacterial foraging algorithm | |
Li et al. | Soac: The soft option actor-critic architecture | |
Ikemoto et al. | Continuous deep Q-learning with a simulator for stabilization of uncertain discrete-time systems | |
Lenin et al. | Ant colony search algorithm for optimal reactive power optimization | |
Jun et al. | An enhanced online sequential extreme learning machine algorithm | |
Yu et al. | Learning correlated stackelberg equilibrium in general-sum multi-leader-single-follower games | |
Shi et al. | A hybrid immigrants strategy for dynamic multi-objective optimization | |
Maggipinto et al. | Proximal deterministic policy gradient | |
van Hasselt et al. | Convergence of model-based temporal difference learning for control | |
Masadeh et al. | Selector-actor-critic and tuner-actor-critic algorithms for reinforcement learning | |
Jacob et al. | Self-reorganizing TSK fuzzy inference system with BCM theory of meta-plasticity |