Abstract
Optimization underpins many disciplines and is therefore useful across a large number of research fields, particularly artificial-intelligence-based advanced control design. Because optimal control problems for general nonlinear systems are difficult to solve, novel learning strategies with intelligent components are needed. In addition, the rapid development of computer and networking techniques has promoted research on optimal control in the discrete-time domain. This paper presents the foundations, the derivation, and recent progress of critic intelligence for discrete-time advanced optimal control design, with an emphasis on the iterative framework. In particular, it highlights the so-called critic intelligence methodology, which integrates learning approximators with the reinforcement formulation.
This work was supported in part by the Beijing Natural Science Foundation under Grant JQ19013, in part by the National Natural Science Foundation of China under Grant 61773373, Grant 61890930-5, and Grant 62021003, and in part by the National Key Research and Development Project under Grant 2021ZD0112300-2 and Grant 2018YFC1900800-5. No conflict of interest exists in this manuscript, and it has been approved by all authors for publication.
Cite this article
Wang, D., Ha, M. & Zhao, M. The intelligent critic framework for advanced optimal control. Artif Intell Rev 55, 1–22 (2022). https://doi.org/10.1007/s10462-021-10118-9
DOI: https://doi.org/10.1007/s10462-021-10118-9