Liu et al., 2020 - Google Patents
Overview of reinforcement learning based on value and policy
- Document ID
- 6085800706183336402
- Author
- Liu Y
- Yang J
- Chen L
- Guo T
- Jiang Y
- Publication year
- 2020
- Publication venue
- 2020 Chinese Control And Decision Conference (CCDC)
Snippet
Reinforcement learning methods are mainly divided into two categories: value-based and policy-based. This article systematically introduces and summarizes reinforcement learning methods from these two categories. First, it summarizes the reinforcement learning …
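The snippet only names the two families. As a rough illustration (not drawn from the cited paper), the sketch below contrasts a value-based update (a tabular, Q-learning-style estimate of action values) with a policy-based update (a REINFORCE-style gradient on softmax action preferences) on a hypothetical two-armed bandit; the environment, constants, and variable names are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])           # hypothetical reward means per arm


def pull(arm):
    """Sample a noisy reward for the chosen arm (toy environment)."""
    return true_means[arm] + 0.1 * rng.standard_normal()


# Value-based: learn action-value estimates and act greedily on them.
Q = np.zeros(2)                             # action-value estimates
alpha = 0.1                                 # learning rate
for _ in range(500):
    a = int(rng.integers(2)) if rng.random() < 0.1 else int(np.argmax(Q))
    r = pull(a)
    Q[a] += alpha * (r - Q[a])              # move the estimate toward the sample

# Policy-based: adjust policy parameters directly along the policy gradient.
theta = np.zeros(2)                         # softmax action preferences
beta = 0.1                                  # policy learning rate
baseline = 0.0                              # running-average reward baseline
for _ in range(500):
    probs = np.exp(theta) / np.exp(theta).sum()
    a = int(rng.choice(2, p=probs))
    r = pull(a)
    baseline += 0.01 * (r - baseline)
    grad = -probs
    grad[a] += 1.0                          # d log pi(a) / d theta for a softmax policy
    theta += beta * (r - baseline) * grad   # REINFORCE-style ascent step

print("Q estimates:", Q)
print("policy probabilities:", np.exp(theta) / np.exp(theta).sum())
```

Both loops should end up favouring the better arm, but the first does so by comparing learned value estimates, while the second shifts probability mass directly onto the preferred action.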
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/08—Learning methods
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
          - G06N3/04—Architectures, e.g. interconnection topology
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N99/00—Subject matter not provided for in other groups of this subclass
        - G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
            - G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/12—Computer systems based on biological models using genetic models
          - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/04—Inference methods or devices
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/46—Multiprogramming arrangements
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/02—Computer systems based on specific mathematical models using fuzzy logic
          - G06N7/023—Learning or tuning the parameters of a fuzzy system
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
        - G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
          - G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
            - G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/50—Computer-aided design
          - G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N5/00—Computer systems utilising knowledge based models
        - G06N5/02—Knowledge representation
          - G06N5/022—Knowledge engineering, knowledge acquisition
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N7/00—Computer systems based on specific mathematical models
        - G06N7/005—Probabilistic networks
- G—PHYSICS
  - G05—CONTROLLING; REGULATING
    - G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
      - G05B17/00—Systems involving the use of models or simulators of said systems
        - G05B17/02—Systems involving the use of models or simulators of said systems electric
Similar Documents
Publication | Publication Date | Title
---|---|---
Li et al. | | A reinforcement learning based RMOEA/D for bi-objective fuzzy flexible job shop scheduling
Liu et al. | | Overview of reinforcement learning based on value and policy
Russell et al. | | Q-decomposition for reinforcement learning agents
Dash et al. | | Efficient stock price prediction using a self evolving recurrent neuro-fuzzy inference system optimized through a modified differential harmony search technique
Touati et al. | | Randomized value functions via multiplicative normalizing flows
Gasic et al. | | Gaussian processes for fast policy optimisation of pomdp-based dialogue managers
Rosenbloom | | The Sigma cognitive architecture and system
Juang et al. | | A locally recurrent fuzzy neural network with support vector regression for dynamic-system modeling
Zhao et al. | | Asynchronous reinforcement learning algorithms for solving discrete space path planning problems
CN111309880A (en) | | Multi-agent action strategy learning method, device, medium and computing equipment
Ergen et al. | | Energy-efficient LSTM networks for online learning
Barto | | Reinforcement learning and dynamic programming
Liu et al. | | Prioritized experience replay based on multi-armed bandit
Dong et al. | | A hybrid algorithm for workflow scheduling in cloud environment
Hung | | A fuzzy GARCH model applied to stock market scenario using a genetic algorithm
Byeon | | Advances in Value-based, Policy-based, and Deep Learning-based Reinforcement Learning
Zhao et al. | | Ensemble-based offline-to-online reinforcement learning: From pessimistic learning to optimistic exploration
Chen et al. | | Boosting the performance of computing systems through adaptive configuration tuning
Ghazanfari et al. | | Enhancing nash q-learning and team q-learning mechanisms by using bottlenecks
Schmitt et al. | | Exploration via epistemic value estimation
Chen et al. | | Averaged-A3C for asynchronous deep reinforcement learning
Aydin et al. | | Adaptive operator selection utilising generalised experience
Tsuchiya et al. | | Explainable Reinforcement Learning Based on Q-Value Decomposition by Expected State Transitions
Sun et al. | | Multi-Agent Deep Deterministic Policy Gradient Algorithm Based on Classification Experience Replay
Tang et al. | | Hierarchical reinforcement learning based on multi-agent cooperation game theory