Liu et al., 2020 - Google Patents
Overview of reinforcement learning based on value and policy
- Document ID
- 6085800706183336402
- Author
- Liu Y
- Yang J
- Chen L
- Guo T
- Jiang Y
- Publication year
- 2020
- Publication venue
- 2020 Chinese Control And Decision Conference (CCDC)
Snippet
Reinforcement learning methods are mainly divided into two categories: value-based and policy-based. This article systematically introduces and summarizes reinforcement learning methods from these two categories. First, it summarizes the reinforcement learning …
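The snippet only names the two families. As a rough illustration (not drawn from the cited paper), the sketch below contrasts a value-based update (a tabular, Q-learning-style estimate of action values) with a policy-based update (a REINFORCE-style gradient on softmax action preferences) on a hypothetical two-armed bandit; the environment, constants, and variable names are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])           # hypothetical reward means per arm


def pull(arm):
    """Sample a noisy reward for the chosen arm (toy environment)."""
    return true_means[arm] + 0.1 * rng.standard_normal()


# Value-based: learn action-value estimates and act greedily on them.
Q = np.zeros(2)                             # action-value estimates
alpha = 0.1                                 # learning rate
for _ in range(500):
    a = int(rng.integers(2)) if rng.random() < 0.1 else int(np.argmax(Q))
    r = pull(a)
    Q[a] += alpha * (r - Q[a])              # move the estimate toward the sample

# Policy-based: adjust policy parameters directly along the policy gradient.
theta = np.zeros(2)                         # softmax action preferences
beta = 0.1                                  # policy learning rate
baseline = 0.0                              # running-average reward baseline
for _ in range(500):
    probs = np.exp(theta) / np.exp(theta).sum()
    a = int(rng.choice(2, p=probs))
    r = pull(a)
    baseline += 0.01 * (r - baseline)
    grad = -probs
    grad[a] += 1.0                          # d log pi(a) / d theta for a softmax policy
    theta += beta * (r - baseline) * grad   # REINFORCE-style ascent step

print("Q estimates:", Q)
print("policy probabilities:", np.exp(theta) / np.exp(theta).sum())
```

Both loops should end up favouring the better arm, but the first does so by comparing learned value estimates, while the second shifts probability mass directly onto the preferred action.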
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/08—Learning methods
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/04—Architectures, e.g. interconnection topology
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
            - G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/12—Computer systems based on biological models using genetic models
          - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/04—Inference methods or devices
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/46—Multiprogramming arrangements
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/02—Computer systems based on specific mathematical models using fuzzy logic
          - G06N7/023—Learning or tuning the parameters of a fuzzy system
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
            - G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/50—Computer-aided design
          - G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/02—Knowledge representation
          - G06N5/022—Knowledge engineering, knowledge acquisition
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/005—Probabilistic networks
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B17/00—Systems involving the use of models or simulators of said systems
        - G05B17/02—Systems involving the use of models or simulators of said systems electric
Similar Documents
Publication | Publication Date | Title
---|---|---
Li et al. | | A reinforcement learning based RMOEA/D for bi-objective fuzzy flexible job shop scheduling
Liu et al. | | Overview of reinforcement learning based on value and policy
Russell et al. | | Q-decomposition for reinforcement learning agents
Dash et al. | | Efficient stock price prediction using a self evolving recurrent neuro-fuzzy inference system optimized through a modified differential harmony search technique
Touati et al. | | Randomized value functions via multiplicative normalizing flows
Gasic et al. | | Gaussian processes for fast policy optimisation of pomdp-based dialogue managers
Rosenbloom | | The Sigma cognitive architecture and system
Juang et al. | | A locally recurrent fuzzy neural network with support vector regression for dynamic-system modeling
Zhao et al. | | Asynchronous reinforcement learning algorithms for solving discrete space path planning problems
CN111309880A (en) | | Multi-agent action strategy learning method, device, medium and computing equipment
Ergen et al. | | Energy-efficient LSTM networks for online learning
Barto | | Reinforcement learning and dynamic programming
Liu et al. | | Prioritized experience replay based on multi-armed bandit
Dong et al. | | A hybrid algorithm for workflow scheduling in cloud environment
Hung | | A fuzzy GARCH model applied to stock market scenario using a genetic algorithm
Byeon | | Advances in Value-based, Policy-based, and Deep Learning-based Reinforcement Learning
Zhao et al. | | Ensemble-based offline-to-online reinforcement learning: From pessimistic learning to optimistic exploration
Chen et al. | | Boosting the performance of computing systems through adaptive configuration tuning
Ghazanfari et al. | | Enhancing nash q-learning and team q-learning mechanisms by using bottlenecks
Schmitt et al. | | Exploration via epistemic value estimation
Chen et al. | | Averaged-A3C for asynchronous deep reinforcement learning
Aydin et al. | | Adaptive operator selection utilising generalised experience
Tsuchiya et al. | | Explainable Reinforcement Learning Based on Q-Value Decomposition by Expected State Transitions
Sun et al. | | Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay
Tang et al. | | Hierarchical reinforcement learning based on multi-agent cooperation game theory