Zhang et al., 2017 - Google Patents

Weighted double Q-learning.

Document ID: 7472296724396287444
Authors: Zhang Z; Pan Z; Kochenderfer M
Publication year: 2017
Publication venue: IJCAI

Snippet

Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments due to overestimating action values. Overestimation is due to the use of a single estimator that uses the maximum action value as an approximation for the …
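The overestimation described in the snippet can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a hypothetical single-state problem in which every action has a true value of 0 and rewards are standard normal noise, and the function name `max_bias_demo` and all its parameters are made up for this illustration. It compares taking the maximum of one set of noisy estimates (the single estimator) with selecting the action using one set of estimates and evaluating it with an independent second set (a double estimator, shown only as a point of comparison).

```python
import random


def max_bias_demo(n_actions=10, n_samples=20, n_trials=2000, seed=0):
    """Average single- and double-estimator values when all true action values are 0.

    Illustrative assumptions (not from the source): rewards are N(0, 1) noise
    and each action is sampled n_samples times per estimate.
    """
    rng = random.Random(seed)

    def sample_means():
        # One noisy sample-mean estimate per action; the true mean is 0.
        return [
            sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
            for _ in range(n_actions)
        ]

    single_total = 0.0
    double_total = 0.0
    for _ in range(n_trials):
        est_a = sample_means()
        est_b = sample_means()  # independent second set of estimates

        # Single estimator: the maximum of the noisy estimates stands in
        # for the maximum true value, which biases the result upward.
        single_total += max(est_a)

        # Double estimator: choose the action with est_a, evaluate it with est_b.
        best = max(range(n_actions), key=lambda a: est_a[a])
        double_total += est_b[best]

    return single_total / n_trials, double_total / n_trials


if __name__ == "__main__":
    single, double = max_bias_demo()
    print(f"true max value:   0.000")
    print(f"single estimator: {single:.3f}")
    print(f"double estimator: {double:.3f}")
```

Under these assumptions the single-estimator average comes out clearly above the true maximum of 0, while the double-estimator average stays near 0. With unequal true action values a double estimator can instead underestimate, so this sketch only shows the direction of the single-estimator bias.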

Full text: www.ijcai.org (PDF)

Classifications

- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/04—Architectures, e.g. interconnection topology
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/0275—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using fuzzy logic only
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/12—Computer systems based on biological models using genetic models
- G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system

Similar Documents

Publication	Publication Date	Title
Zhang et al.	2017	Weighted double Q-learning.
Derman et al.	2018	Soft-robust actor-critic policy-gradient
Ouyang et al.	2017	Learning-based control of unknown linear systems with thompson sampling
CN107479380A (en)	2017-12-15	Multi-Agent coordination control method based on evolutionary game theory
Khamassi et al.	2017	Active exploration and parameterized reinforcement learning applied to a simulated human-robot interaction task
Singh et al.	2012	Anti-jamming in cognitive radio networks using reinforcement learning algorithms
Zhang et al.	2022	Deep reinforcement learning based cooperative partial task offloading and resource allocation for IIoT applications
Oentaryo et al.	2014	Online probabilistic learning for fuzzy inference system
Mohamed et al.	2016	Multi-objective states of matter search algorithm for TCSC-based smart controller design
CN113919217B (en)	2024-05-17	Adaptive parameter setting method and device for active disturbance rejection controller
Jiang et al.	2021	Action candidate based clipped double q-learning for discrete and continuous action tasks
Jiang et al.	2022	Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks
Behmanesh et al.	2014	Chaotic time series prediction using improved ANFIS with imperialist competitive learning algorithm
Qu et al.	2013	Kernel least mean kurtosis based online chaotic time series prediction
Dasgupta et al.	2008	Adaptive computational chemotaxis in bacterial foraging algorithm
Li et al.	2020	Soac: The soft option actor-critic architecture
Ikemoto et al.	2021	Continuous deep Q-learning with a simulator for stabilization of uncertain discrete-time systems
Lenin et al.	2006	Ant colony search algorithm for optimal reactive power optimization
Jun et al.	2008	An enhanced online sequential extreme learning machine algorithm
Yu et al.	2022	Learning correlated stackelberg equilibrium in general-sum multi-leader-single-follower games
Shi et al.	2018	A hybrid immigrants strategy for dynamic multi-objective optimization
Maggipinto et al.	2020	Proximal deterministic policy gradient
van Hasselt et al.	2007	Convergence of model-based temporal difference learning for control
Masadeh et al.	2019	Selector-actor-critic and tuner-actor-critic algorithms for reinforcement learning
Jacob et al.	2012	Self-reorganizing TSK fuzzy inference system with BCM theory of meta-plasticity