Zhang et al., 2017 - Google Patents

Weighted double Q-learning.

Document ID
7472296724396287444
Author
Zhang Z
Pan Z
Kochenderfer M
Publication year
2017
Publication venue
IJCAI

Snippet

Q-learning is a popular reinforcement learning algorithm, but it can perform poorly in stochastic environments due to overestimating action values. Overestimation is due to the use of a single estimator that uses the maximum action value as an approximation for the …
Continue reading at www.ijcai.org (PDF) (other versions)
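
The overestimation described in the snippet can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method: it assumes every action has a true value of 0 and that rewards are unit Gaussian noise, and it compares the single estimator (maximum of the sample means) with the standard double estimator from double Q-learning, shown here only for contrast. The function names `single_estimator`, `double_estimator`, and `average_estimates` are hypothetical.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def single_estimator(samples_per_action):
    """Maximum of the per-action sample means (one estimator selects and evaluates)."""
    return max(mean(s) for s in samples_per_action)


def double_estimator(samples_per_action):
    """Select the best action on one half of the samples, evaluate it on the other half."""
    half_a = [s[: len(s) // 2] for s in samples_per_action]
    half_b = [s[len(s) // 2:] for s in samples_per_action]
    means_a = [mean(s) for s in half_a]
    best = max(range(len(means_a)), key=means_a.__getitem__)
    return mean(half_b[best])


def average_estimates(num_actions=10, samples=20, trials=2000, seed=0):
    """Average both estimators over many trials (assumed setup: all true values are 0)."""
    rng = random.Random(seed)
    single_total = 0.0
    double_total = 0.0
    for _ in range(trials):
        # Assumption for illustration: zero-mean, unit-variance Gaussian rewards.
        draws = [
            [rng.gauss(0.0, 1.0) for _ in range(samples)]
            for _ in range(num_actions)
        ]
        single_total += single_estimator(draws)
        double_total += double_estimator(draws)
    return single_total / trials, double_total / trials


if __name__ == "__main__":
    single, double = average_estimates()
    print(f"single estimator average: {single:.3f}")
    print(f"double estimator average: {double:.3f}")
```

Under these assumptions the true maximum action value is 0, so any positive average from the single estimator is pure overestimation, while the double estimator's average should stay close to 0.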

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/027 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/04 Architectures, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/02 Computer systems based on biological models using neural network models
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass
    • G06N99/005 Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/0275 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using fuzzy logic only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computer systems based on biological models
    • G06N3/12 Computer systems based on biological models using genetic models
    • G06N3/126 Genetic algorithms, i.e. information processing using digital simulations of the genetic system

Similar Documents

Publication    Title
Zhang et al. Weighted double Q-learning.
Derman et al. Soft-robust actor-critic policy-gradient
Ouyang et al. Learning-based control of unknown linear systems with Thompson sampling
CN107479380A (en) Multi-Agent coordination control method based on evolutionary game theory
Khamassi et al. Active exploration and parameterized reinforcement learning applied to a simulated human-robot interaction task
Singh et al. Anti-jamming in cognitive radio networks using reinforcement learning algorithms
Zhang et al. Deep reinforcement learning based cooperative partial task offloading and resource allocation for IIoT applications
Oentaryo et al. Online probabilistic learning for fuzzy inference system
Mohamed et al. Multi-objective states of matter search algorithm for TCSC-based smart controller design
CN113919217B (en) Adaptive parameter setting method and device for active disturbance rejection controller
Jiang et al. Action candidate based clipped double Q-learning for discrete and continuous action tasks
Jiang et al. Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks
Behmanesh et al. Chaotic time series prediction using improved ANFIS with imperialist competitive learning algorithm
Qu et al. Kernel least mean kurtosis based online chaotic time series prediction
Dasgupta et al. Adaptive computational chemotaxis in bacterial foraging algorithm
Li et al. SOAC: The soft option actor-critic architecture
Ikemoto et al. Continuous deep Q-learning with a simulator for stabilization of uncertain discrete-time systems
Lenin et al. Ant colony search algorithm for optimal reactive power optimization
Jun et al. An enhanced online sequential extreme learning machine algorithm
Yu et al. Learning correlated Stackelberg equilibrium in general-sum multi-leader-single-follower games
Shi et al. A hybrid immigrants strategy for dynamic multi-objective optimization
Maggipinto et al. Proximal deterministic policy gradient
van Hasselt et al. Convergence of model-based temporal difference learning for control
Masadeh et al. Selector-actor-critic and tuner-actor-critic algorithms for reinforcement learning
Jacob et al. Self-reorganizing TSK fuzzy inference system with BCM theory of meta-plasticity