Performance-constrained fault-tolerant DSC based on reinforcement learning for nonlinear systems with uncertain parameters

https://doi.org/10.1016/j.amc.2022.127759

Highlights

  • In the proposed algorithm, a time-varying scaling function makes the new prescribed function infinite at the initial time, which weakens the initial error constraint and significantly reduces the tracking error. Introducing RL into the backstepping method to optimize the error and the input not only further reduces the tracking error but also suppresses the overshoot and jitter of the input, improving performance. Simulations verify that the proposed algorithm outperforms the traditional methods.

  • RL can significantly optimize system performance; however, introducing RL algorithms into FTC is difficult. By constructing an intermediate controller, the controller derived from the RL algorithm is isolated from the fault-tolerant controller, which reduces the difficulty of designing an RL fault-tolerant controller. Since the bounds of the fault parameters are estimated rather than the parameters themselves, the constructed actual controller can compensate for an infinite number of actuator faults.

  • Unlike traditional RL-based PPC methods, the prescribed function used here is initially unbounded, so the initial value of the tracking error need not lie within a specified range. In addition, the faults considered are more complex, and the proposed algorithm can handle an infinite number of actuator faults. Finally, with the RL-based backstepping method, the obtained weight adaptation laws are simpler than those of traditional RL and the persistence-of-excitation conditions are relaxed.

Abstract

In this paper, a performance-constrained fault-tolerant dynamic surface control (DSC) algorithm based on reinforcement learning (RL) is proposed for nonlinear systems with unknown parameters and actuator failures. To address multiple actuator failures, the bound on the sum of the failure parameters is estimated rather than the parameters themselves, so an infinite number of actuator failures can be handled. To improve system performance, RL based on actor-critic neural networks (NNs) and optimized backstepping control (OBC) is introduced to optimize the tracking errors and inputs. By introducing an intermediate controller, the controller derived from the RL algorithm is isolated from the fault-tolerant controller, which reduces the difficulty of using RL in fault-tolerant control (FTC). In addition, a boundary function that is initially unbounded is used, so the initial value of the error need not lie within a prescribed range; the tracking error can be reduced to the prescribed accuracy, and all closed-loop signals are bounded. Finally, the effectiveness and advantages of the proposed algorithm are verified by two examples.

Introduction

In recent years, optimal control based on Bellman’s principle, which seeks to minimize a performance cost, has been widely studied. The optimal control strategy can be obtained by solving the Hamilton–Jacobi–Bellman (HJB) equation [1]. However, an analytical solution of the HJB equation is generally difficult to obtain, which is a significant limitation. RL based on the HJB equation can be trained online to obtain near-optimal solutions and has therefore attracted much attention [2]. Recently, some classic methods have also been reported [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. Model-free optimal controllers for discrete-time nonlinear systems are developed in Luo et al. [3], [4], and data-driven RL methods for continuous-time systems with unknown dynamics are provided in studies such as Wei et al. [6], Mazouchi et al. [8] and Zhang et al. [13]. Wen et al. [9], [10], [12] and Li et al. [11] studied RL-based OBC for solving the optimal control problem of strict-feedback systems. These methods, however, do not take actuator failures into account.
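To make the actor-critic training idea concrete, the following is a minimal sketch, not the algorithm of any cited work: a scalar linear system and quadratic cost are invented so that a critic V(x) ≈ w·x² can be fitted by gradient descent on the Bellman residual, while the actor gain k is improved greedily in closed form.

```python
import numpy as np

# Toy scalar system x_{t+1} = a*x + b*u with quadratic cost (illustrative
# stand-in; not the nonlinear systems of the cited works).
a, b, gamma = 0.9, 1.0, 0.95

def cost(x, u):
    return x**2 + u**2

# Critic: V(x) ~ w * x^2 (linear in the feature x^2).  Actor: u = -k * x.
w, k = 0.0, 0.0
lr = 0.05
xs = np.linspace(-2.0, 2.0, 41)  # states used to evaluate the residual

for _ in range(2000):
    # Critic step: gradient descent on the mean squared Bellman residual.
    u = -k * xs
    x_next = a * xs + b * u
    target = cost(xs, u) + gamma * w * x_next**2
    w -= lr * np.mean((w * xs**2 - target) * xs**2)
    # Actor step: greedy gain for V(x) = w*x^2, available in closed form.
    k = gamma * w * a * b / (1.0 + gamma * w * b**2)

print(w, k)  # converges to the discounted LQR solution
```

For this linear-quadratic toy problem the fixed point coincides with the discounted Riccati solution, which is what makes the sketch easy to check; the cited works replace the quadratic features with NN approximators.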

Adaptive tracking control has been developed over decades, has become very mature, and is widely used in practice [14], [15]. In practice, however, actuators and sensors are prone to failure while the system is operating, which can lead to instability or even collapse of the closed-loop system [16]. This makes it important to develop new fault-tolerant control methods that keep the system stable [17], [18], [19], [20], [21]. These approaches, although innovative and effective, do not take performance optimization into account. Some recent fault-tolerant control techniques have been combined with other approaches such as performance cost reduction and online learning [22], [23], [24], [25]. These methods not only keep faulty closed-loop systems stable, but also allow the controller to be trained and optimized online, with significant performance improvements in tracking errors and inputs. The introduction of optimization into FTC is still worth investigating, but introducing RL into FTC methods is difficult: many complex scenarios arise when loss of control effectiveness (LOCE) and bias faults (BF) occur, and it is hard to achieve fault-tolerant control by RL alone unless all possible scenarios have been learned.
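LOCE and bias faults are commonly written in the standard form u_k = a_k·v_k + b_k, where a_k is the actuator effectiveness and b_k a bias. The sketch below uses this generic model with invented numbers; the paper's exact fault parameterization may differ.

```python
import numpy as np

def actuator_output(v, a, b):
    """Standard LOCE + bias fault model: u_k = a_k * v_k + b_k.

    v : commanded inputs, a : effectiveness in [0, 1]
    (a_k = 1 healthy, 0 < a_k < 1 partial LOCE, a_k = 0 total loss),
    b : bias faults (b_k = 0 when no bias fault is present).
    """
    a = np.clip(a, 0.0, 1.0)
    return a * v + b

v = np.array([2.0, -1.0, 0.5])   # commanded inputs
a = np.array([1.0, 0.6, 0.0])    # healthy, 40% LOCE, total loss
b = np.array([0.0, 0.0, 0.3])    # stuck actuator delivers only its bias
print(actuator_output(v, a, b))  # [ 2.  -0.6  0.3]
```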

In addition, prescribed performance control (PPC) can be used to constrain the tracking error to a specific range. The PPC method proposed in Bechlioulis and Rovithakis [26] uses a prescribed exponential function and a transformation of the tracking error or stabilizing variable; if the transformed variable is bounded, then the tracking error is also bounded. Based on this framework, many advanced PPC results have been reported [27], [28]. Wang et al. [29] created a data-driven performance-prescribed RL algorithm that simultaneously pursues optimality of the control method and a prescribed tracking error. Recently, the introduction of RL algorithms into PPC techniques has started to receive attention, since it allows tracking errors and control inputs to be significantly reduced and performance to be improved; novel results were reported in [29], [30], [31]. The problem of prescribed performance FTC for a class of nonlinear multi-input multi-output (MIMO) systems is studied using RL algorithms in [32]. In [33], the FTC problem is solved by combining PPC and incremental ADP. However, in the above works, the initial values of the variables are required to lie within a small range, which depends on the chosen prescribed boundary function.
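A minimal sketch of the classic exponential prescribed performance function from the Bechlioulis–Rovithakis framework [26]; the constants and the test error trajectory are illustrative only. Note that the check passes only because the initial error already lies inside the initial envelope, which is exactly the restriction later works try to remove.

```python
import numpy as np

# Classic exponential prescribed performance function:
# rho(t) = (rho0 - rho_inf) * exp(-decay * t) + rho_inf.
# Constants chosen for illustration, not taken from this paper.
rho0, rho_inf, decay = 2.0, 0.1, 1.5

def rho(t):
    return (rho0 - rho_inf) * np.exp(-decay * t) + rho_inf

def within_bounds(e, t, delta=1.0):
    # Constraint -delta*rho(t) < e(t) < rho(t), with delta in (0, 1].
    return -delta * rho(t) < e < rho(t)

# A decaying tracking error satisfies the envelope, but only because
# its initial value 1.5 already lies inside (-rho(0), rho(0)) = (-2, 2).
t = np.linspace(0.0, 5.0, 200)
e = 1.5 * np.exp(-2.0 * t)
print(all(within_bounds(ei, ti) for ei, ti in zip(e, t)))  # True
```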

Over the past few years, many scholars have tried to remove this limitation. Berger et al. [34] eliminate the initial value restriction using a funnel control method, which employs a time-varying gain that takes large values as the boundary is approached. The delayed PPC strategy proposed in Song and Zhou [35] and Li et al. [36] uses a shift function that equals zero at the initial moment; multiplying by such a function makes the initial value zero, so no initial constraint is needed. However, these two methods are conservative or complex: funnel control imposes restrictive requirements on the relative degree to achieve bounded stability, and delayed PPC requires proving the differentiability of the shift function [37], [38]. A new scheme is proposed in Zhao et al. [38], which constructs a time-varying scaling function such that the prescribed function is unbounded at the initial moment, so the initial values of the variables need not be restricted.
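The scaling idea can be sketched as follows, with illustrative functions that are not those of Zhao et al. [38]: dividing a standard prescribed function by a scaling function s(t) with s(0) = 0 makes the effective boundary infinite at t = 0, so the initial error is unconstrained.

```python
import numpy as np

# rho(t): a standard exponentially decaying prescribed function.
# s(t):   an illustrative scaling with s(0) = 0 and s(t) = 1 for t >= T,
#         so the effective boundary rho(t)/s(t) is infinite at t = 0.
rho0, rho_inf, decay, T = 2.0, 0.1, 1.5, 1.0

def rho(t):
    return (rho0 - rho_inf) * np.exp(-decay * t) + rho_inf

def s(t):
    return np.minimum(t / T, 1.0)

t = np.array([0.0, 0.1, 1.0, 5.0])
with np.errstate(divide="ignore"):
    bound = rho(t) / s(t)
print(bound)  # bound[0] is inf: no restriction on e(0)
```

After t ≥ T the effective boundary coincides with rho(t), so the steady-state accuracy of the original prescribed function is preserved.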

Although the work of Zhao et al. [38] is novel, it does not consider other complications such as the “curse of dimensionality”, actuator failures, and performance optimization. Inspired by the above studies, this paper designs an optimal fault-tolerant control algorithm that not only achieves performance constraints under fault tolerance, but also optimizes the inputs and errors. The proposed algorithm also removes the restriction that the initial tracking error must lie within a small range. Reports addressing this problem are still scarce; see, e.g., [32].

In this paper, a performance-constrained fault-tolerant DSC algorithm based on RL is proposed for nonlinear systems with unknown parameters and actuator failures. The main contributions and differences are as follows:

  • 1.

    In this paper, a time-varying scaling function makes the new prescribed function infinite at the initial time, which weakens the initial error constraint and significantly reduces the tracking error. Introducing RL into the backstepping method to optimize the error and the input not only further reduces the tracking error but also suppresses the overshoot and jitter of the input, improving performance. Simulations verify that the proposed algorithm outperforms the traditional methods.

  • 2.

    RL can significantly optimize system performance; however, introducing RL algorithms into FTC is difficult. By constructing an intermediate controller, the controller derived from the RL algorithm is isolated from the fault-tolerant controller, which reduces the difficulty of designing an RL fault-tolerant controller. Since the bounds of the fault parameters are estimated rather than the parameters themselves, the constructed actual controller can compensate for an infinite number of actuator faults.

  • 3.

    Unlike traditional RL-based PPC methods, the prescribed function used here is initially unbounded, so the initial value of the tracking error need not lie within a specified range. In addition, the faults considered are more complex, and the proposed algorithm can handle an infinite number of actuator faults. Finally, with the RL-based backstepping method, the obtained weight adaptation laws are simpler than traditional ones and the persistence-of-excitation conditions are relaxed.

Notations: R denotes the Euclidean space and ‖·‖ the norm of a vector. “inf” is the infimum function and “sup” is the supremum function. λmax and λmin are the maximum and minimum eigenvalues of a matrix, respectively.

Section snippets

System description

Consider the following nonlinear system with uncertain parameters and actuator failures:
$$
\begin{cases}
\dot{x}_i(t) = x_{i+1}(t) + \vartheta_i^{T}\varphi_i(\underline{x}_i(t)), & i = 1,\ldots,n-1,\\
\dot{x}_n(t) = g^{T}U(t) + \vartheta_n^{T}\varphi_n(\underline{x}_n(t)),\\
y(t) = x_1(t),
\end{cases}
$$
where $\underline{x}_i=[x_1,x_2,\ldots,x_i]^{T}\in\mathbb{R}^{i}$, $i=1,2,\ldots,n$, are the state vectors, $g=[g_1,\ldots,g_m]^{T}\in\mathbb{R}^{m}$ is the input coefficient vector, $\vartheta_i\in\mathbb{R}^{l}$ is the uncertain parameter vector of the system, $\varphi_i\in\mathbb{R}^{l}$ is a known nonlinear function vector, and $U(t)\in\mathbb{R}^{m}$ is the output of the actuators.
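As a concrete (hypothetical) instance of this system class, the sketch below fixes n = 3 and m = 2 and invents the parameter vectors and regressors φ_i; it only evaluates the right-hand side of the dynamics.

```python
import numpy as np

# Strict-feedback dynamics x_i' = x_{i+1} + theta_i^T phi_i,
# x_n' = g^T U + theta_n^T phi_n.  Parameters and regressors are
# invented for illustration (n = 3 states, m = 2 actuators).
theta = [np.array([0.5]), np.array([-1.0, 0.3]), np.array([0.2, 0.1])]
g = np.array([1.0, 1.0])

def phi(i, x):
    # Known nonlinear regressors phi_i(x_1, ..., x_i); illustrative choices.
    if i == 0:
        return np.array([np.sin(x[0])])
    if i == 1:
        return np.array([x[0] * x[1], np.cos(x[1])])
    return np.array([x[1], x[2]])

def f(x, U):
    dx = np.empty(3)
    dx[0] = x[1] + theta[0] @ phi(0, x)
    dx[1] = x[2] + theta[1] @ phi(1, x)
    dx[2] = g @ U + theta[2] @ phi(2, x)
    return dx

x = np.array([0.1, 0.0, 0.0])
print(f(x, np.array([1.0, -0.5])))
```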

Consider the following actuator faults, including loss of control effectiveness (LOCE) and bias faults (BF).

RL-based fault-tolerant controller design

In this section, we will use the backstepping technique to design the controller and introduce RL in the process to optimize the tracking errors and inputs.

First, the derivative of $\chi(t)$ defined in (13) is
$$
\dot{\chi} = \frac{1+\lambda^{2}}{(1-\lambda^{2})^{2}}\,\dot{\lambda} = \delta_1\dot{e}_1 + \delta_2,
$$
with
$$
\delta_1 = \rho\,\frac{1+\lambda^{2}}{(1-\lambda^{2})^{2}}\,\frac{2}{\pi(1+e_1^{2})},\qquad
\delta_2 = \frac{2}{\pi}\,\dot{\rho}\,\arctan(e_1)\,\frac{1+\lambda^{2}}{(1-\lambda^{2})^{2}}.
$$
It is not difficult to find that $\delta_1$ in (18) is a bounded and positive definite time-varying function. The transformed tracking errors are defined as
$$
z_1 = \chi,\qquad z_i = e_i,\quad i = 2,\ldots,n.
$$
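The decomposition can be checked numerically. The sketch below assumes λ = (2/π)ρ·arctan(e₁) and χ = λ/(1−λ²), forms inferred from the displayed derivative rather than quoted from the paper, and compares a finite-difference derivative of χ with δ₁ė₁ + δ₂.

```python
import numpy as np

# Assumed (inferred) definitions: lam = (2/pi)*rho*arctan(e1),
# chi = lam / (1 - lam^2).  Chain rule then gives
# chi_dot = delta1 * e1_dot + delta2 with the deltas shown above.
def lam(rho, e1):
    return (2.0 / np.pi) * rho * np.arctan(e1)

def chi(rho, e1):
    L = lam(rho, e1)
    return L / (1.0 - L**2)

def deltas(rho, rho_dot, e1):
    L = lam(rho, e1)
    c = (1.0 + L**2) / (1.0 - L**2) ** 2
    delta1 = rho * c * 2.0 / (np.pi * (1.0 + e1**2))
    delta2 = (2.0 / np.pi) * rho_dot * np.arctan(e1) * c
    return delta1, delta2

# Smooth test trajectories rho(t), e1(t) with known derivatives.
t, h = 0.7, 1e-6
rho_f = lambda s: 1.0 + 0.5 * np.exp(-s)
e1_f = lambda s: 0.3 * np.sin(s)
rho, rho_dot = rho_f(t), -0.5 * np.exp(-t)
e1, e1_dot = e1_f(t), 0.3 * np.cos(t)

chi_dot_fd = (chi(rho_f(t + h), e1_f(t + h))
              - chi(rho_f(t - h), e1_f(t - h))) / (2 * h)
d1, d2 = deltas(rho, rho_dot, e1)
print(abs(chi_dot_fd - (d1 * e1_dot + d2)) < 1e-6)  # True
```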

Step 1: According to (2), (4), (17) and (19), we

Stability analysis

Theorem 1

Consider the nonlinear system (2) with unknown actuator faults, the virtual control laws (26) and (44), the intermediate controller (57), the actual control law (61), and the update laws (32), (46), (59) and (60). When complex failures occur in multiple actuators, all tracking errors and estimation error signals for the closed-loop systems are SGUUB and the output tracking error is always within the prescribed range.

Proof

From (61), one has
$$
z_n\sum_{k=1}^{m} g_k a_{k,h} u_k
= -\sum_{k=1}^{m} g_k a_{k,h}\,
\frac{z_n^{2}\hat{\theta}^{2}\bar{u}_o^{2}}{\sqrt{z_n^{2}\hat{\theta}^{2}\bar{u}_o^{2}+\bar{v}_3^{2}}}
\le -\,\frac{r z_n^{2}\hat{\theta}^{2}\bar{u}_o^{2}}{\sqrt{z_n^{2}\hat{\theta}^{2}\bar{u}_o^{2}+\bar{v}_3^{2}}}
$$
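The truncated bound above uses a smooth compensation term of the generic form x²/√(x² + v²), which under-approximates |x| by at most v. The check below verifies this standard inequality numerically; it is not specific to this paper's symbols.

```python
import numpy as np

# Standard smoothing inequality used in robust/fault-tolerant control:
#   0 <= |x| - x^2 / sqrt(x^2 + v^2) <= v   for all x and v > 0.
# (The actual maximum of the gap is about 0.30*v, attained near |x| ~ 0.78*v.)
x = np.linspace(-50.0, 50.0, 100001)
v = 0.3
gap = np.abs(x) - x**2 / np.sqrt(x**2 + v**2)
print(gap.min() >= 0.0 and gap.max() <= v)  # True
```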

Simulation

Example 1

Consider the following single-link manipulator system with motor dynamics:
$$
\begin{cases}
D\ddot{p} + L\dot{p} + N\sin(p) = \nu,\\
M\dot{\nu} + P\nu = \sum_{i=1}^{2} u_i - K\dot{p},\\
y(t) = p,
\end{cases}
$$

where $p$, $\dot{p}$ and $\ddot{p}$ represent position, velocity and acceleration, respectively, $\nu$ is the motor shaft angle, and $u_i$ is the input of the $i$-th motor. The other parameters are $D=1$, $L=1$, $M=0.05$, $P=0.05$, $N=10$ and $K=10$.

Let $x_1=p$, $x_2=\dot{p}$ and $x_3=\ddot{p}$. The system can be written in state-space form as
$$
\begin{cases}
\dot{x}_1 = x_2,\\
\dot{x}_2 = x_3 - \vartheta_2^{T}[\sin(x_1);\,x_2],\\
\dot{x}_3 = \sum_{i=1}^{2} g_i u_i + \vartheta_3^{T}[x_2;\,x_3],\\
y(t) = x_1,
\end{cases}
$$
where $g_1=g_2=1/(MD)$, $\vartheta_2=[N/D,L/D]^{T}$ and ϑ3=[K/(
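The physical model can be exercised with a quick open-loop Euler simulation; the inputs, step size, and initial condition below are chosen only to illustrate the dynamics, not taken from the paper's closed-loop experiment.

```python
import numpy as np

# Open-loop Euler simulation of the single-link manipulator with motor
# dynamics of Example 1 (constant test inputs, not the paper's controller).
D, L, M, P, N, K = 1.0, 1.0, 0.05, 0.05, 10.0, 10.0

def deriv(state, u1, u2):
    p, pdot, nu = state
    pddot = (nu - L * pdot - N * np.sin(p)) / D      # link dynamics
    nudot = (u1 + u2 - K * pdot - P * nu) / M        # motor dynamics
    return np.array([pdot, pddot, nudot])

state = np.array([0.1, 0.0, 0.0])  # p(0), p_dot(0), nu(0)
h = 1e-4
for _ in range(10000):             # integrate 1 second
    state = state + h * deriv(state, 0.5, 0.5)
print(np.round(state, 3))
```

The small step size is needed because the motor subsystem (M = 0.05) is much faster than the link, which makes the model moderately stiff.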

Conclusion

In this paper, an RL-based prescribed-performance fault-tolerant DSC algorithm is proposed. Based on OBC, RL is used to optimize the tracking errors and inputs. To address multiple actuator failures, an intermediate controller is introduced so that the controller derived from the RL algorithm is isolated from the fault-tolerant controller, reducing the difficulty of using RL in fault-tolerant control (FTC). Since the bound on the sum of the failure parameters is estimated rather than the

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 62273079 and Grant 61420106016, the Fundamental Research Funds for the Central Universities in China under Grant N2004002, Grant N2104005 and Grant N182608004 and the Research Fund of State Key Laboratory of Synthetical Automation for Process Industries in China under Grant 2013ZCX01.

References (43)

  • J. Lan et al., Time-varying optimal formation control for second-order multiagent systems based on neural network observer and reinforcement learning, IEEE Trans. Neural Netw. Learn. Syst. (2022).

  • B. Luo et al., Policy gradient adaptive dynamic programming for data-based optimal control, IEEE Trans. Cybern. (2017).

  • B. Luo et al., Model-free optimal tracking control via critic-only Q-learning, IEEE Trans. Neural Netw. Learn. Syst. (2016).

  • Q. Wei et al., Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP, IEEE Trans. Neural Netw. Learn. Syst. (2016).

  • D. Li et al., Robust control for a class of nonlinear systems with input constraints based on actor-critic learning, Int. J. Robust Nonlinear Control (2022).

  • M. Mazouchi et al., Data-driven dynamic multiobjective optimal control: an aspiration-satisfying reinforcement learning approach, IEEE Trans. Neural Netw. Learn. Syst. (2021).

  • G. Wen et al., Optimized backstepping control using reinforcement learning of observer-critic-actor architecture based on fuzzy system for a class of nonlinear strict-feedback systems, IEEE Trans. Fuzzy Syst. (2022).

  • G. Wen et al., Optimized backstepping tracking control using reinforcement learning for a class of stochastic nonlinear strict-feedback systems, IEEE Trans. Neural Netw. Learn. Syst. (2021).

  • G. Wen et al., Simplified optimized backstepping control for a class of nonlinear strict-feedback systems with unknown dynamic functions, IEEE Trans. Cybern. (2021).

  • H. Zhang et al., Robust optimal control scheme for unknown constrained-input nonlinear systems via a plug-n-play event-sampled critic-only algorithm, IEEE Trans. Syst. Man Cybern. (2020).

  • N. Wang et al., Autonomous pilot of unmanned surface vehicles: bridging path planning and tracking, IEEE Trans. Veh. Technol. (2021).