The intelligent critic framework for advanced optimal control


Abstract

The idea of optimization underlies many disciplines and is therefore useful across a large number of research fields, particularly artificial-intelligence-based advanced control design. Because optimal control problems for general nonlinear systems are difficult to solve, novel learning strategies with intelligent components need to be established. Moreover, the rapid development of computing and networking techniques has promoted research on optimal control in the discrete-time domain. In this paper, the foundations, the derivation, and recent progress of critic intelligence for discrete-time advanced optimal control design are presented, with an emphasis on the iterative framework. In particular, the so-called critic intelligence methodology, which integrates learning approximators with the reinforcement learning formulation, is highlighted.
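To make the iterative framework concrete, the sketch below illustrates the basic value-iteration recursion V_{i+1}(x) = min_u [ U(x, u) + V_i(F(x, u)) ] on which critic designs for discrete-time optimal control are built. This is a minimal illustration under assumed ingredients: the scalar plant F, the quadratic utility U, and the grid-based (tabular) critic are hypothetical stand-ins, not the authors' formulation, which employs neural-network approximators.

```python
# Minimal value-iteration sketch for discrete-time optimal control.
# The plant F, utility U, and tabular critic below are illustrative
# assumptions; the surveyed methods replace the table with neural networks.
import numpy as np

def F(x, u):
    """Hypothetical discrete-time nonlinear plant: x_{k+1} = F(x_k, u_k)."""
    return 0.9 * x + 0.1 * np.sin(x) + u

def U(x, u, q=1.0, r=1.0):
    """Quadratic stage cost (utility): U(x, u) = q*x^2 + r*u^2."""
    return q * x ** 2 + r * u ** 2

# Discretized state and control sets stand in for function approximators.
xs = np.linspace(-2.0, 2.0, 81)
us = np.linspace(-1.0, 1.0, 41)

def value_iteration(num_iters=200):
    V = np.zeros_like(xs)                  # V_0(x) = 0 (zero initial cost)
    for _ in range(num_iters):
        V_new = np.empty_like(V)
        for i, x in enumerate(xs):
            # Bellman recursion: V_{i+1}(x) = min_u [ U(x, u) + V_i(F(x, u)) ]
            next_x = np.clip(F(x, us), xs[0], xs[-1])
            V_new[i] = np.min(U(x, us) + np.interp(next_x, xs, V))
        V = V_new
    return V

V_star = value_iteration()

def greedy_control(x):
    """Near-optimal control extracted from the converged critic."""
    next_x = np.clip(F(x, us), xs[0], xs[-1])
    return us[np.argmin(U(x, us) + np.interp(next_x, xs, V_star))]
```

In the neural-network-based designs this survey covers, the tabular value function above is replaced by a critic approximator trained on sampled states, and an actor network can be trained to reproduce the greedy control, but the underlying recursion is the same.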



Author information

Corresponding author

Correspondence to Ding Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by Beijing Natural Science Foundation under Grant JQ19013, in part by the National Natural Science Foundation of China under Grant 61773373, Grant 61890930-5, and Grant 62021003, and in part by the National Key Research and Development Project under Grant 2021ZD0112300-2 and Grant 2018YFC1900800-5. No conflict of interest exists in this manuscript, and it has been approved by all authors for publication.


Cite this article

Wang, D., Ha, M. & Zhao, M. The intelligent critic framework for advanced optimal control. Artif Intell Rev 55, 1–22 (2022). https://doi.org/10.1007/s10462-021-10118-9
