Performance-constrained fault-tolerant DSC based on reinforcement learning for nonlinear systems with uncertain parameters

https://doi.org/10.1016/j.amc.2022.127759

Highlights

  • In the proposed algorithm, a time-varying scaling function makes the new prescribed function infinite at the initial time, which weakens the initial error constraint and significantly reduces the tracking error. Introducing RL into the backstepping method to optimize the error and the input not only further reduces the tracking error but also suppresses the overshoot and jitter of the input, improving performance. Simulations verify that the proposed algorithm outperforms the traditional methods.

  • RL can significantly optimize system performance; however, introducing RL algorithms into FTC is difficult. By constructing an intermediate controller, the controller derived from the RL algorithm is isolated from the fault-tolerant controller, which reduces the difficulty of designing an RL fault-tolerant controller. Since the bounds of the fault parameters are estimated rather than the parameters themselves, the constructed actual controller can compensate for an infinite number of actuator faults.

  • Unlike traditional RL-based PPC methods, the prescribed function used here is initially unbounded, so the initial value of the tracking error need not lie within a specified range. In addition, the faults considered are more complex, and the proposed algorithm can handle an infinite number of actuator faults. Finally, with the RL-based backstepping method, the obtained weight adaptation laws are simpler than those of traditional RL and the persistence-of-excitation conditions are relaxed.

Abstract

In this paper, a performance-constrained fault-tolerant dynamic surface control (DSC) algorithm based on reinforcement learning (RL) is proposed for nonlinear systems with unknown parameters and actuator failures. To address multiple actuator failures, the bound on the sum of the failure parameters is estimated rather than the parameters themselves, so an infinite number of actuator failures can be handled. To improve system performance, RL based on actor-critic neural networks (NNs) and optimized backstepping control (OBC) is introduced to optimize the tracking errors and inputs. By introducing an intermediate controller, the controller derived from the RL algorithm is isolated from the fault-tolerant controller, which reduces the difficulty of using RL in fault-tolerant control (FTC). In addition, a boundary function that is initially unbounded is used, so the initial value of the error need not lie within a prescribed range; the tracking error can be reduced to the prescribed accuracy, and all closed-loop signals are bounded. Finally, the effectiveness and advantages of the proposed algorithm are verified by two examples.

Introduction

In recent years, optimal control based on Bellman’s principle, which seeks to minimize a performance cost, has been widely studied. The optimal control strategy can be obtained by solving the Hamilton–Jacobi–Bellman (HJB) equation [1]. However, an analytical solution of the HJB equation is generally difficult to obtain, which is a significant limitation. RL based on the HJB equation can be trained online to obtain near-optimal solutions and has therefore attracted much attention [2]. Recently, some classic methods have also been reported [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. Model-free optimal controllers for discrete-time nonlinear systems are developed in Luo et al. [3], [4], and data-driven RL methods for continuous-time systems with unknown dynamics are provided in studies such as Wei et al. [6], Mazouchi et al. [8] and Zhang et al. [13]. Wen et al. [9], [10], [12] and Li et al. [11] studied RL-based OBC for solving the optimal control problem of strict-feedback systems. These methods, however, do not take actuator failures into account.
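To make the actor-critic training idea concrete, the following is a minimal sketch, not the algorithm of any cited work: a scalar linear system and quadratic cost are invented so that a critic V(x) ≈ w·x² can be fitted by gradient descent on the Bellman residual, while the actor gain k is improved greedily in closed form.

```python
import numpy as np

# Toy scalar system x_{t+1} = a*x + b*u with quadratic cost (illustrative
# stand-in; not the nonlinear systems of the cited works).
a, b, gamma = 0.9, 1.0, 0.95

def cost(x, u):
    return x**2 + u**2

# Critic: V(x) ~ w * x^2 (linear in the feature x^2).  Actor: u = -k * x.
w, k = 0.0, 0.0
lr = 0.05
xs = np.linspace(-2.0, 2.0, 41)  # states used to evaluate the residual

for _ in range(2000):
    # Critic step: gradient descent on the mean squared Bellman residual.
    u = -k * xs
    x_next = a * xs + b * u
    target = cost(xs, u) + gamma * w * x_next**2
    w -= lr * np.mean((w * xs**2 - target) * xs**2)
    # Actor step: greedy gain for V(x) = w*x^2, available in closed form.
    k = gamma * w * a * b / (1.0 + gamma * w * b**2)

print(w, k)  # converges to the discounted LQR solution
```

For this linear-quadratic toy problem the fixed point coincides with the discounted Riccati solution, which is what makes the sketch easy to check; the cited works replace the quadratic features with NN approximators.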

Adaptive tracking control has been developed over decades, has become very mature, and is widely used in practice [14], [15]. In practice, however, actuators and sensors are prone to failure while the system is operating, which can lead to instability or even collapse of the closed-loop system [16]. This makes it important to develop new fault-tolerant control methods that keep the system stable [17], [18], [19], [20], [21]. These approaches, although innovative and effective, do not take performance optimization into account. Some recent fault-tolerant control techniques have been combined with other approaches such as performance cost reduction and online learning [22], [23], [24], [25]. These methods not only keep faulty closed-loop systems stable, but also allow the controller to be trained and optimized online, with significant performance improvements in tracking errors and inputs. The introduction of optimization into FTC is still worth investigating, but introducing RL into FTC methods is difficult: many complex scenarios arise when loss of control effectiveness (LOCE) and bias faults (BF) occur, and it is hard to achieve fault-tolerant control by RL alone unless all possible scenarios have been learned.
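LOCE and bias faults are commonly written in the standard form u_k = a_k·v_k + b_k, where a_k is the actuator effectiveness and b_k a bias. The sketch below uses this generic model with invented numbers; the paper's exact fault parameterization may differ.

```python
import numpy as np

def actuator_output(v, a, b):
    """Standard LOCE + bias fault model: u_k = a_k * v_k + b_k.

    v : commanded inputs, a : effectiveness in [0, 1]
    (a_k = 1 healthy, 0 < a_k < 1 partial LOCE, a_k = 0 total loss),
    b : bias faults (b_k = 0 when no bias fault is present).
    """
    a = np.clip(a, 0.0, 1.0)
    return a * v + b

v = np.array([2.0, -1.0, 0.5])   # commanded inputs
a = np.array([1.0, 0.6, 0.0])    # healthy, 40% LOCE, total loss
b = np.array([0.0, 0.0, 0.3])    # stuck actuator delivers only its bias
print(actuator_output(v, a, b))  # [ 2.  -0.6  0.3]
```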

In addition, prescribed performance control (PPC) can be used to constrain the tracking error to a specific range. The PPC method proposed in Bechlioulis and Rovithakis [26] uses a prescribed exponential function and a transformation of the tracking error or stabilizing variable; if the transformed variable is bounded, then the tracking error is also bounded. Based on this framework, many advanced PPC results have been reported [27], [28]. Wang et al. [29] created a data-driven performance-prescribed RL algorithm that simultaneously pursues optimality of the control method and a prescribed tracking error. Recently, the introduction of RL algorithms into PPC techniques has started to receive attention, since it allows tracking errors and control inputs to be significantly reduced and performance to be improved; novel results were reported in [29], [30], [31]. The problem of prescribed performance FTC for a class of nonlinear multi-input multi-output (MIMO) systems is studied using RL algorithms in [32]. In [33], the FTC problem is solved by combining PPC and incremental ADP. However, in the above works, the initial values of the variables are required to lie within a small range, which depends on the chosen prescribed boundary function.
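A minimal sketch of the classic exponential prescribed performance function from the Bechlioulis–Rovithakis framework [26]; the constants and the test error trajectory are illustrative only. Note that the check passes only because the initial error already lies inside the initial envelope, which is exactly the restriction later works try to remove.

```python
import numpy as np

# Classic exponential prescribed performance function:
# rho(t) = (rho0 - rho_inf) * exp(-decay * t) + rho_inf.
# Constants chosen for illustration, not taken from this paper.
rho0, rho_inf, decay = 2.0, 0.1, 1.5

def rho(t):
    return (rho0 - rho_inf) * np.exp(-decay * t) + rho_inf

def within_bounds(e, t, delta=1.0):
    # Constraint -delta*rho(t) < e(t) < rho(t), with delta in (0, 1].
    return -delta * rho(t) < e < rho(t)

# A decaying tracking error satisfies the envelope, but only because
# its initial value 1.5 already lies inside (-rho(0), rho(0)) = (-2, 2).
t = np.linspace(0.0, 5.0, 200)
e = 1.5 * np.exp(-2.0 * t)
print(all(within_bounds(ei, ti) for ei, ti in zip(e, t)))  # True
```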

Over the past few years, many scholars have tried to remove this limitation. Berger et al. [34] eliminate the initial value restriction using a funnel control method, which employs a time-varying gain that takes large values as the boundary is approached. The delayed PPC strategy proposed in Song and Zhou [35] and Li et al. [36] uses a shift function that equals zero at the initial moment; multiplying by such a function makes the initial value zero, so no initial constraint is needed. However, these two methods are conservative or complex: funnel control imposes restrictive requirements on the relative degree to achieve bounded stability, and delayed PPC requires proving the differentiability of the shift function [37], [38]. A new scheme is proposed in Zhao et al. [38], which constructs a time-varying scaling function such that the prescribed function is unbounded at the initial moment, so the initial values of the variables need not be restricted.
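The scaling idea can be sketched as follows, with illustrative functions that are not those of Zhao et al. [38]: dividing a standard prescribed function by a scaling function s(t) with s(0) = 0 makes the effective boundary infinite at t = 0, so the initial error is unconstrained.

```python
import numpy as np

# rho(t): a standard exponentially decaying prescribed function.
# s(t):   an illustrative scaling with s(0) = 0 and s(t) = 1 for t >= T,
#         so the effective boundary rho(t)/s(t) is infinite at t = 0.
rho0, rho_inf, decay, T = 2.0, 0.1, 1.5, 1.0

def rho(t):
    return (rho0 - rho_inf) * np.exp(-decay * t) + rho_inf

def s(t):
    return np.minimum(t / T, 1.0)

t = np.array([0.0, 0.1, 1.0, 5.0])
with np.errstate(divide="ignore"):
    bound = rho(t) / s(t)
print(bound)  # bound[0] is inf: no restriction on e(0)
```

After t ≥ T the effective boundary coincides with rho(t), so the steady-state accuracy of the original prescribed function is preserved.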

Although the work of Zhao et al. [38] is novel, it does not consider other complications such as the “curse of dimensionality”, actuator failures, and performance optimization. Inspired by the above studies, this paper designs an optimal fault-tolerant control algorithm that not only achieves performance constraints under fault tolerance, but also optimizes the inputs and errors. The proposed algorithm also removes the restriction that the initial tracking error must lie within a small range. Reports addressing this problem are still scarce; see, e.g., [32].

In this paper, a performance-constrained fault-tolerant DSC algorithm based on RL is proposed for nonlinear systems with unknown parameters and actuator failures. The main contributions and differences are as follows:

  • 1.

    In this paper, a time-varying scaling function makes the new prescribed function infinite at the initial time, which weakens the initial error constraint and significantly reduces the tracking error. Introducing RL into the backstepping method to optimize the error and the input not only further reduces the tracking error but also suppresses the overshoot and jitter of the input, improving performance. Simulations verify that the proposed algorithm outperforms the traditional methods.

  • 2.

    RL can significantly optimize system performance; however, introducing RL algorithms into FTC is difficult. By constructing an intermediate controller, the controller derived from the RL algorithm is isolated from the fault-tolerant controller, which reduces the difficulty of designing an RL fault-tolerant controller. Since the bounds of the fault parameters are estimated rather than the parameters themselves, the constructed actual controller can compensate for an infinite number of actuator faults.

  • 3.

    Unlike traditional RL-based PPC methods, the prescribed function used here is initially unbounded, so the initial value of the tracking error need not lie within a specified range. In addition, the faults considered are more complex, and the proposed algorithm can handle an infinite number of actuator faults. Finally, with the RL-based backstepping method, the obtained weight adaptation laws are simpler than traditional ones and the persistence-of-excitation conditions are relaxed.

Notations: R denotes the Euclidean space and ‖·‖ the norm of a vector. “inf” is the infimum function and “sup” is the supremum function. λmax and λmin are the maximum and minimum eigenvalues of a matrix, respectively.

Section snippets

System description

Consider the following nonlinear system with uncertain parameters and actuator failures:
$$
\begin{cases}
\dot{x}_i(t) = x_{i+1}(t) + \vartheta_i^{T}\varphi_i(\underline{x}_i(t)), & i = 1,\ldots,n-1,\\
\dot{x}_n(t) = g^{T}U(t) + \vartheta_n^{T}\varphi_n(\underline{x}_n(t)),\\
y(t) = x_1(t),
\end{cases}
$$
where $\underline{x}_i=[x_1,x_2,\ldots,x_i]^{T}\in\mathbb{R}^{i}$, $i=1,2,\ldots,n$, are the state vectors, $g=[g_1,\ldots,g_m]^{T}\in\mathbb{R}^{m}$ is the input coefficient vector, $\vartheta_i\in\mathbb{R}^{l}$ is the uncertain parameter vector of the system, $\varphi_i\in\mathbb{R}^{l}$ is a known nonlinear function vector, and $U(t)\in\mathbb{R}^{m}$ is the output of the actuators.
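As a concrete (hypothetical) instance of this system class, the sketch below fixes n = 3 and m = 2 and invents the parameter vectors and regressors φ_i; it only evaluates the right-hand side of the dynamics.

```python
import numpy as np

# Strict-feedback dynamics x_i' = x_{i+1} + theta_i^T phi_i,
# x_n' = g^T U + theta_n^T phi_n.  Parameters and regressors are
# invented for illustration (n = 3 states, m = 2 actuators).
theta = [np.array([0.5]), np.array([-1.0, 0.3]), np.array([0.2, 0.1])]
g = np.array([1.0, 1.0])

def phi(i, x):
    # Known nonlinear regressors phi_i(x_1, ..., x_i); illustrative choices.
    if i == 0:
        return np.array([np.sin(x[0])])
    if i == 1:
        return np.array([x[0] * x[1], np.cos(x[1])])
    return np.array([x[1], x[2]])

def f(x, U):
    dx = np.empty(3)
    dx[0] = x[1] + theta[0] @ phi(0, x)
    dx[1] = x[2] + theta[1] @ phi(1, x)
    dx[2] = g @ U + theta[2] @ phi(2, x)
    return dx

x = np.array([0.1, 0.0, 0.0])
print(f(x, np.array([1.0, -0.5])))
```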

Consider the following actuator faults, including loss of control effectiveness (LOCE) and bias faults (BF).

RL-based fault-tolerant controller design

In this section, we will use the backstepping technique to design the controller and introduce RL in the process to optimize the tracking errors and inputs.

First, the derivative of $\chi(t)$ defined in (13) is
$$
\dot{\chi} = \frac{1+\lambda^{2}}{(1-\lambda^{2})^{2}}\,\dot{\lambda} = \delta_1\dot{e}_1 + \delta_2,
$$
with
$$
\delta_1 = \rho\,\frac{1+\lambda^{2}}{(1-\lambda^{2})^{2}}\,\frac{2}{\pi(1+e_1^{2})},\qquad
\delta_2 = \frac{2}{\pi}\,\dot{\rho}\,\arctan(e_1)\,\frac{1+\lambda^{2}}{(1-\lambda^{2})^{2}}.
$$
It is not difficult to find that $\delta_1$ in (18) is a bounded and positive definite time-varying function. The transformed tracking errors are defined as
$$
z_1 = \chi,\qquad z_i = e_i,\quad i = 2,\ldots,n.
$$
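The decomposition can be checked numerically. The sketch below assumes λ = (2/π)ρ·arctan(e₁) and χ = λ/(1−λ²), forms inferred from the displayed derivative rather than quoted from the paper, and compares a finite-difference derivative of χ with δ₁ė₁ + δ₂.

```python
import numpy as np

# Assumed (inferred) definitions: lam = (2/pi)*rho*arctan(e1),
# chi = lam / (1 - lam^2).  Chain rule then gives
# chi_dot = delta1 * e1_dot + delta2 with the deltas shown above.
def lam(rho, e1):
    return (2.0 / np.pi) * rho * np.arctan(e1)

def chi(rho, e1):
    L = lam(rho, e1)
    return L / (1.0 - L**2)

def deltas(rho, rho_dot, e1):
    L = lam(rho, e1)
    c = (1.0 + L**2) / (1.0 - L**2) ** 2
    delta1 = rho * c * 2.0 / (np.pi * (1.0 + e1**2))
    delta2 = (2.0 / np.pi) * rho_dot * np.arctan(e1) * c
    return delta1, delta2

# Smooth test trajectories rho(t), e1(t) with known derivatives.
t, h = 0.7, 1e-6
rho_f = lambda s: 1.0 + 0.5 * np.exp(-s)
e1_f = lambda s: 0.3 * np.sin(s)
rho, rho_dot = rho_f(t), -0.5 * np.exp(-t)
e1, e1_dot = e1_f(t), 0.3 * np.cos(t)

chi_dot_fd = (chi(rho_f(t + h), e1_f(t + h))
              - chi(rho_f(t - h), e1_f(t - h))) / (2 * h)
d1, d2 = deltas(rho, rho_dot, e1)
print(abs(chi_dot_fd - (d1 * e1_dot + d2)) < 1e-6)  # True
```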

Step 1: According to (2), (4), (17) and (19), we

Stability analysis

Theorem 1

Consider the nonlinear system (2) with unknown actuator faults, the virtual control laws (26) and (44), the intermediate controller (57), the actual control law (61), and the update laws (32), (46), (59) and (60). When complex failures occur in multiple actuators, all tracking errors and estimation error signals for the closed-loop systems are SGUUB and the output tracking error is always within the prescribed range.

Proof

From (61), one has
$$
z_n\sum_{k=1}^{m} g_k a_{k,h} u_k
= -\sum_{k=1}^{m} g_k a_{k,h}\,
\frac{z_n^{2}\hat{\theta}^{2}\bar{u}_o^{2}}{\sqrt{z_n^{2}\hat{\theta}^{2}\bar{u}_o^{2}+\bar{v}_3^{2}}}
\le -\,\frac{r z_n^{2}\hat{\theta}^{2}\bar{u}_o^{2}}{\sqrt{z_n^{2}\hat{\theta}^{2}\bar{u}_o^{2}+\bar{v}_3^{2}}}
$$
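The truncated bound above uses a smooth compensation term of the generic form x²/√(x² + v²), which under-approximates |x| by at most v. The check below verifies this standard inequality numerically; it is not specific to this paper's symbols.

```python
import numpy as np

# Standard smoothing inequality used in robust/fault-tolerant control:
#   0 <= |x| - x^2 / sqrt(x^2 + v^2) <= v   for all x and v > 0.
# (The actual maximum of the gap is about 0.30*v, attained near |x| ~ 0.78*v.)
x = np.linspace(-50.0, 50.0, 100001)
v = 0.3
gap = np.abs(x) - x**2 / np.sqrt(x**2 + v**2)
print(gap.min() >= 0.0 and gap.max() <= v)  # True
```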

Simulation

Example 1

Consider the following single-link manipulator system with motor dynamics:
$$
\begin{cases}
D\ddot{p} + L\dot{p} + N\sin(p) = \nu,\\
M\dot{\nu} + P\nu = \sum_{i=1}^{2} u_i - K\dot{p},\\
y(t) = p,
\end{cases}
$$

where $p$, $\dot{p}$ and $\ddot{p}$ represent position, velocity and acceleration, respectively, $\nu$ is the motor shaft angle, and $u_i$ is the input of the $i$-th motor. The other parameters are $D=1$, $L=1$, $M=0.05$, $P=0.05$, $N=10$ and $K=10$.

Let $x_1=p$, $x_2=\dot{p}$ and $x_3=\ddot{p}$. The system can be written in state-space form as
$$
\begin{cases}
\dot{x}_1 = x_2,\\
\dot{x}_2 = x_3 - \vartheta_2^{T}[\sin(x_1);\,x_2],\\
\dot{x}_3 = \sum_{i=1}^{2} g_i u_i + \vartheta_3^{T}[x_2;\,x_3],\\
y(t) = x_1,
\end{cases}
$$
where $g_1=g_2=1/(MD)$, $\vartheta_2=[N/D,L/D]^{T}$ and ϑ3=[K/(
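The physical model can be exercised with a quick open-loop Euler simulation; the inputs, step size, and initial condition below are chosen only to illustrate the dynamics, not taken from the paper's closed-loop experiment.

```python
import numpy as np

# Open-loop Euler simulation of the single-link manipulator with motor
# dynamics of Example 1 (constant test inputs, not the paper's controller).
D, L, M, P, N, K = 1.0, 1.0, 0.05, 0.05, 10.0, 10.0

def deriv(state, u1, u2):
    p, pdot, nu = state
    pddot = (nu - L * pdot - N * np.sin(p)) / D      # link dynamics
    nudot = (u1 + u2 - K * pdot - P * nu) / M        # motor dynamics
    return np.array([pdot, pddot, nudot])

state = np.array([0.1, 0.0, 0.0])  # p(0), p_dot(0), nu(0)
h = 1e-4
for _ in range(10000):             # integrate 1 second
    state = state + h * deriv(state, 0.5, 0.5)
print(np.round(state, 3))
```

The small step size is needed because the motor subsystem (M = 0.05) is much faster than the link, which makes the model moderately stiff.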

Conclusion

In this paper, an RL-based prescribed-performance fault-tolerant DSC algorithm is proposed. Based on OBC, RL is used to optimize the tracking errors and inputs. To address multiple actuator failures, an intermediate controller is introduced so that the controller derived from the RL algorithm is isolated from the fault-tolerant controller, reducing the difficulty of using RL in fault-tolerant control (FTC). Since the bound on the sum of the failure parameters is estimated rather than the

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 62273079 and Grant 61420106016, the Fundamental Research Funds for the Central Universities in China under Grant N2004002, Grant N2104005 and Grant N182608004 and the Research Fund of State Key Laboratory of Synthetical Automation for Process Industries in China under Grant 2013ZCX01.

References (43)

  • J. Lan et al., Time-varying optimal formation control for second-order multiagent systems based on neural network observer and reinforcement learning, IEEE Trans. Neural Netw. Learn. Syst. (2022).

  • B. Luo et al., Policy gradient adaptive dynamic programming for data-based optimal control, IEEE Trans. Cybern. (2017).

  • B. Luo et al., Model-free optimal tracking control via critic-only Q-learning, IEEE Trans. Neural Netw. Learn. Syst. (2016).

  • Q. Wei et al., Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP, IEEE Trans. Neural Netw. Learn. Syst. (2016).

  • D. Li et al., Robust control for a class of nonlinear systems with input constraints based on actor-critic learning, Int. J. Robust Nonlinear Control (2022).

  • M. Mazouchi et al., Data-driven dynamic multiobjective optimal control: an aspiration-satisfying reinforcement learning approach, IEEE Trans. Neural Netw. Learn. Syst. (2021).

  • G. Wen et al., Optimized backstepping control using reinforcement learning of observer-critic-actor architecture based on fuzzy system for a class of nonlinear strict-feedback systems, IEEE Trans. Fuzzy Syst. (2022).

  • G. Wen et al., Optimized backstepping tracking control using reinforcement learning for a class of stochastic nonlinear strict-feedback systems, IEEE Trans. Neural Netw. Learn. Syst. (2021).

  • G. Wen et al., Simplified optimized backstepping control for a class of nonlinear strict-feedback systems with unknown dynamic functions, IEEE Trans. Cybern. (2021).

  • H. Zhang et al., Robust optimal control scheme for unknown constrained-input nonlinear systems via a plug-n-play event-sampled critic-only algorithm, IEEE Trans. Syst. Man Cybern. (2020).

  • N. Wang et al., Autonomous pilot of unmanned surface vehicles: bridging path planning and tracking, IEEE Trans. Veh. Technol. (2021).