Abstract

When applied to the data modeling and optimal control problems of complex systems, the dual heuristic dynamic programming (DHP) technique based on the BP neural network algorithm (BP-DHP) suffers from poor prediction accuracy, slow convergence, and poor stability. In this paper, a DHP technique based on the Extreme Learning Machine (ELM) algorithm (ELM-DHP) is proposed. By constructing three kinds of network structures, the paper gives the detailed realization process of the DHP technique with ELM. A controller designed with the ELM-DHP algorithm is used to control a molecular distillation system with complex features such as multivariability, strong coupling, and nonlinearity. Finally, the effectiveness of the algorithm is verified by a simulation that compares the DHP and HDP algorithms based on ELM and on the BP neural network. The algorithm can also be applied to the data modeling and optimal control problems of similar complex systems.

1. Introduction

As controlled objects become more dynamically complex and more widely applied, the dynamic models of some of them can still be obtained by mathematical methods. Others, such as robot systems and chemical production processes, are too complex for an accurate mathematical model to be established. Even when a mathematical model of a complex system has been established, it is usually a high-order, nonlinear, time-varying differential equation that cannot describe the system accurately and is difficult to analyze, so the unknown model must be learned from observational data [1]. Neural networks, with their strong fault tolerance, self-adaptation, self-organization, and learning and memory abilities, provide a new method for modeling complex systems [2–5]. However, BP networks, RBF networks, and SVM have defects such as slow convergence, which leave a large gap between their approximation ability and the actual demands of complex systems [6, 7]. To solve this problem, Huang et al. proposed a training method for single-hidden-layer feedforward networks (SLFNs), the Extreme Learning Machine (ELM), in 2006 [8]. The ELM assigns the input weights and hidden layer thresholds randomly and then calculates the output weights by the regularization principle, and it can still approximate any continuous system [6, 7]. It has been proved that choosing the hidden layer node parameters of the SLFN randomly does not affect the convergence ability, and it can make ELM learn thousands of times faster than the traditional BP network and SVM.

Dynamic programming has long been used to solve optimal control problems, but it suffers from the “curse of dimensionality,” so an optimal solution often cannot be obtained in practice [9]. In 1977, Werbos proposed the concepts of heuristic dynamic programming (HDP) and dual heuristic programming (DHP) and proposed approximate dynamic programming (adaptive/approximate dynamic programming, ADP) as a way to overcome the “curse of dimensionality” [10]. Werbos defines “intelligence” as the brain’s ability to learn to maximize a utility function in a complex, unknown, nonlinear environment [11]. ADP is a general scheme for learning approximately optimal action strategies, so it can be regarded as a key method for designing brain-like intelligent systems. Based on the basic principle, realization structure, and current development of the ADP method, Lewis and others gave a summary and outlook of the research and pointed out that ADP is an effective data-driven method [12–15]. ADP can realize the optimal control of a nonlinear system by using neural networks, driven by online data and control information, to approximate the performance index function and the optimal control law, without a mathematical model of the nonlinear control system [16]. To develop neural dynamic programming results and relax the requirements on the system dynamics, Zhong et al. proposed a new goal representation ADP online optimization control structure for nonlinear systems. However, with the HDP implementation structure, the network cannot directly output the derivative of the cost function, and the control effect of the HDP structure needs improvement [17]. In fact, some studies show that, among the ADP structures, DHP and GDHP can achieve better control than HDP to some extent [18, 19].

In general, research on the optimal control of nonlinear systems by ADP based on traditional neural networks has made great progress, but ADP still suffers from slow response and poor stability. In this paper, the ELM algorithm assigns random input weights and thresholds to improve the response speed of the DHP algorithm, and it calculates the output weights by the regularization principle to improve the stability of the DHP algorithm. To verify the validity of the algorithm, an ELM-DHP controller was designed to control a molecular distillation system characterized by multivariability, nonlinearity, strong coupling, and large delay.

2. Algorithm Principle

2.1. DHP Algorithm Principle

The discrete-time nonlinear dynamic system is described as follows:

$$x(k+1) = F[x(k), u(k), k], \quad k = 0, 1, 2, \ldots \tag{1}$$

In formula (1), $x(k)$ represents the state vector of the system, $u(k)$ represents the control variables, and $F$ represents the system function.

The performance index function (also called the cost function) corresponding to the system is

$$J[x(i), i] = \sum_{k=i}^{\infty} \gamma^{k-i}\, U[x(k), u(k), k], \tag{2}$$

where $U$ is the utility function, $\gamma$ is the discount factor ($0 < \gamma \le 1$), $J[x(i), i]$ is the cost function of state $x(i)$, and $J$ depends on the initial time $i$ and the initial state $x(i)$. For DHP, the purpose of dynamic programming is to select a control sequence $u(k)$, $k = i, i+1, \ldots$, which minimizes the function $J$.
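As an illustration of the discounted cost described above, the following minimal sketch evaluates a finite-horizon version of the sum and checks the one-step recursion that dynamic programming relies on. The quadratic utility, the toy trajectory, and the function names are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def discounted_cost(utilities, gamma):
    """Finite-horizon cost: sum over k of gamma**(k - i) * U(k), starting at step i."""
    utilities = np.asarray(utilities, dtype=float)
    discounts = gamma ** np.arange(len(utilities))
    return float(np.sum(discounts * utilities))

def quadratic_utility(x, u):
    # Hypothetical utility U = x^T x + u^T u (an assumption for illustration).
    return float(np.dot(x, x) + np.dot(u, u))

gamma = 0.9
# Toy trajectory: a scalar state that halves at every step under zero control.
states = [np.array([0.5 ** k]) for k in range(4)]
controls = [np.zeros(1) for _ in range(4)]
U = [quadratic_utility(x, u) for x, u in zip(states, controls)]

J0 = discounted_cost(U, gamma)        # cost from the first step
J1 = discounted_cost(U[1:], gamma)    # cost from the second step
# One-step recursion: J(k) = U(k) + gamma * J(k + 1)
recursion_gap = abs(J0 - (U[0] + gamma * J1))
```

The recursion checked in the last line is the relation that the critic network is trained to satisfy, in derivative form for DHP.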

The DHP structure is shown in Figure 1. It contains three neural networks: the model network, the critic network, and the action network. Because neural networks are universal approximators, the model network can be used to model an unknown or complex nonlinear system, which makes the DHP method widely applicable. The input of the critic network is the state variable. Its output is the derivative of the approximate performance index function J with respect to the state x, which is also known as the costate. The action network, also known as the “Actor,” represents the mapping between the system state variables and the control variables [20–22].

is based on the iteration of the derivative for performance index function and utility function to state.

In (3), is a feedback control variable, and costates and are the outputs of the critic network. If the weight of the critic network is set to , the right-hand side of formula (1) is set to

At the same time, the left-hand side of formula (1) can be written as . By adjusting the weights of the critic network, the least-mean-square-error function is as follows:

According to the principle of optimality, the optimal control should satisfy the first-order differential necessary condition.

So, the optimal quantity is obtained:

In formula (7), is the optimal costate, satisfying formula (5).

From (1) to (7), we can conclude that the optimal control quantity of the DHP method can be obtained directly from the costate. Compared with the HDP method, which obtains the optimal control from the relationship between the weights of the critic network and its input–output behavior, the DHP method requires more computation but achieves a better control effect [9].
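To make the link between the costate and the optimal control concrete, the sketch below works through a special case that is not taken from the paper: a linear plant x(k+1) = A x(k) + B u(k), a quadratic utility U = x^T Q x + u^T R u, and an assumed quadratic cost-to-go J(x) = x^T P x. All matrices are hypothetical values chosen for illustration. Under these assumptions the first-order condition reads 2 R u + gamma B^T lambda(k+1) = 0, so the control follows directly from the costate at the next step.

```python
import numpy as np

gamma = 0.95
A = np.array([[1.0, 0.1], [0.0, 0.9]])   # hypothetical plant matrices
B = np.array([[0.0], [0.1]])
R = np.array([[0.5]])                    # control weight in the assumed utility
P = np.array([[2.0, 0.3], [0.3, 1.5]])   # assumed cost-to-go J(x) = x^T P x

def costate(x):
    # lambda(x) = dJ/dx for the assumed quadratic J.
    return 2.0 * P @ x

def control_from_costate(lam_next):
    # Stationarity of U + gamma * J(x(k+1)) in u:
    # 2 R u + gamma * B^T lambda(k+1) = 0.
    return -0.5 * gamma * np.linalg.solve(R, B.T @ lam_next)

x = np.array([1.0, -0.5])
# Closed-form minimiser for this linear-quadratic case, used to obtain x(k+1).
u_star = -gamma * np.linalg.solve(R + gamma * B.T @ P @ B, B.T @ P @ A @ x)
x_next = A @ x + B @ u_star
u_from_lambda = control_from_costate(costate(x_next))
```

Here `u_from_lambda` reproduces `u_star`, which shows that knowing the costate at x(k+1) is enough to recover the minimizing control in this special case. The state weight Q does not appear because it does not depend on u.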

2.2. ELM Algorithm Principle

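For a standard SLFN with $L$ hidden layer neurons learning $N$ arbitrary distinct samples $(x_j, t_j)$, $j = 1, \ldots, N$, and activation function $g(\cdot)$, the network is mathematically modeled as [23]

$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, \ldots, N, \tag{8}$$

where $o_j$ is the model output of the network, $w_i$ is the input weight between the input layer neurons and the $i$th hidden layer neuron, $\beta_i$ is the output weight between the $i$th hidden layer neuron and the output layer neurons, $b_i$ is the threshold of the $i$th neuron in the hidden layer, and $w_i \cdot x_j$ is the inner product of $w_i$ and $x_j$.

The learning objective of the SLFN is to minimize the output error. Error can be expressed as

$$\sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0. \tag{9}$$

The presence of $\beta_i$, $w_i$, and $b_i$ makes

$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N. \tag{10}$$

So, (10) can be written as $H\beta = T$, where

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1^{T} \\ \vdots \\ \beta_L^{T} \end{bmatrix}, \quad T = \begin{bmatrix} t_1^{T} \\ \vdots \\ t_N^{T} \end{bmatrix}. \tag{11}$$

$H$ is the hidden layer output matrix of ELM. So, the training of ELM is equivalent to finding the least-squares solution $\hat{\beta}$ of the linear system $H\beta = T$:

$$\lVert H\hat{\beta} - T \rVert = \min_{\beta} \lVert H\beta - T \rVert. \tag{12}$$

Formula (12) is equivalent to minimizing the loss function

$$E = \sum_{j=1}^{N} \Bigl\lVert \sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) - t_j \Bigr\rVert^{2}. \tag{13}$$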

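Huang et al. [23] proved that the minimum-norm least-squares solution $\hat{\beta} = H^{\dagger}T$ of the linear system $H\beta = T$ satisfies the following.

(1) Minimum Training Error. The special solution $\hat{\beta} = H^{\dagger}T$ is one of the least-squares solutions of the general linear system $H\beta = T$, where $H^{\dagger}$ is the Moore–Penrose generalized inverse of the hidden layer output matrix $H$.

(2) Smallest Norm of Weights and Best Generalization Capability. Further, the special solution $\hat{\beta} = H^{\dagger}T$ has the smallest norm among all of the least-squares solutions of $H\beta = T$; that is, $\lVert \hat{\beta} \rVert = \lVert H^{\dagger}T \rVert \le \lVert \beta \rVert$ for every least-squares solution $\beta$. The generalization ability of an SLFN with minimum weights is independent of the number of parameters [24]. The smaller the weights, the stronger the generalization ability of the SLFN.

(3) Special Solution. The minimum-norm least-squares solution of $H\beta = T$ is unique, namely $\hat{\beta} = H^{\dagger}T$.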
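The ELM training procedure summarized above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `tanh` stands in for the activation function, the uniform range [-1, 1] for the random parameters is an assumption, and the sine-fitting task and the names `elm_fit` and `elm_predict` are hypothetical.

```python
import numpy as np

def elm_fit(X, T, n_hidden, rng):
    """Train an SLFN the ELM way. X: (N, d) inputs, T: (N, m) targets."""
    d = X.shape[1]
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))  # random input weights, never trained
    b = rng.uniform(-1.0, 1.0, size=n_hidden)       # random hidden thresholds
    H = np.tanh(X @ W + b)                          # hidden layer output matrix, (N, n_hidden)
    beta = np.linalg.pinv(H) @ T                    # minimum-norm least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
T = np.sin(np.pi * X)                               # toy regression target
W, b, beta = elm_fit(X, T, n_hidden=20, rng=rng)
rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```

Only `beta` is computed from data. `W` and `b` stay at their random values, which is why training reduces to a single pseudoinverse.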

2.3. Proof of the Stability of ELM-DHP

The stability of the ELM-DHP algorithm is proved here (i.e., the output error of the system is 0). The discrete nonlinear system is controlled by the ELM-DHP algorithm, and the three networks of the ELM-DHP algorithm are all based on the fixed ELM implementation. Therefore, it only needs to be proved that ELM can approximate the discrete nonlinear system with zero error.

The ELM learning algorithm is chosen as a SLFN with hidden layer neurons. The arbitrary distinct samples , where , , and of the nonlinear discrete system and nonlinear activation function , are mathematically modeled as formula (8).

ELM learns a large number of samples generally, and the number of neurons in the hidden layer is far less than the number of samples, . So, we only need to prove that the learning error of ELM was 0 when . Huang et al. [7, 23, 25] proved in detail that the SLFN with neurons can approximate any arbitrary sample at any small error; that is,

The result above shows that the learning error of ELM can be made arbitrarily small, which is the sense in which the stability of the ELM-DHP algorithm is established.
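The approximation property cited above can be checked numerically on a small example. The sketch below is an illustration under assumed settings (random data, `tanh` activation, normally distributed hidden parameters), none of which come from the paper. It compares the training residual of an ELM with fewer hidden neurons than samples against one with as many hidden neurons as samples.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 10, 3                                   # 10 samples, 3 inputs (arbitrary choice)
X = rng.uniform(-1.0, 1.0, size=(N, d))
T = rng.uniform(-1.0, 1.0, size=(N, 1))

def training_residual(n_hidden):
    # Random hidden layer followed by pseudoinverse output weights.
    W = rng.normal(0.0, 2.0, size=(d, n_hidden))
    b = rng.normal(0.0, 1.0, size=n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ T
    return float(np.linalg.norm(H @ beta - T))

res_few = training_residual(3)     # fewer hidden neurons than samples
res_equal = training_residual(N)   # as many hidden neurons as samples
```

With as many hidden neurons as samples, the square matrix H is invertible with probability one, so the residual drops to numerical precision. With fewer neurons the residual is nonzero and depends on the data and on the random draw.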

3. Implementation of the ELM-DHP Algorithm

The ELM-DHP algorithm includes three networks: the model network, the critic network, and the action network. The hidden layers of the three networks use a bipolar sigmoid activation function, and the output layers use a linear (purelin) function. The realization process of the ELM-DHP algorithm is studied by using the discrete-time nonlinear dynamic programming of -dimensional state vector and -dimensional control vector as the research object.

3.1. Network Model

The model network adopts structure. The inputs are the components of the state vector in the moments and the components of the predicted output of the action network to state in the system of moments. The output is the components of the prediction vector to the state vector in the system of moments. The model network has hidden layer neurons. The structure of the model network is shown in Figure 2.

The model network is trained offline, and the calculation process is as follows.

The input layer to the hidden layer weight matrix and the hidden layer threshold matrix are randomly generated. Define the input vector and the expected output vector of the model network in moments:

Calculate the output matrix of the hidden layer in the model network:

where is the input of the node in the model network hidden layer, is the output of the node in the model network hidden layer, and .

Calculate the weights from the hidden layer to the output layer:

According to the idea of the ELM, the error is minimized as

In equality (18), is the expected output of the output layer neurons of the model network.

is equivalent to solving the least-squares solution of the linear system :

The special solution of the weight matrix between the hidden layer and the output layer in the model network is as follows:

where is a generalized inverse matrix of in moments.
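The offline training of the model network described above can be sketched as follows. The example assumes a 6-20-2 layout (two state components and four control components in, two predicted state components out), which matches the structure reported in the simulation section. The plant `toy_plant`, the sampling ranges, and the 600/150 split of randomly generated data are hypothetical stand-ins, not the molecular distillation data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_ctrl, n_hidden = 2, 4, 20               # assumed 6-20-2 layout

def toy_plant(x, u):
    # Hypothetical smooth plant, NOT the molecular distillation process.
    G = np.array([[0.3, -0.2, 0.1, 0.0],
                  [0.0, 0.1, -0.3, 0.2]])
    return 0.8 * x + 0.2 * np.tanh(u @ G.T)

# Assemble samples: input [x(k), u(k)], expected output x(k+1).
X = rng.uniform(-1.0, 1.0, size=(750, n_state))
U = rng.uniform(-1.0, 1.0, size=(750, n_ctrl))
Z = np.hstack([X, U])
Y = toy_plant(X, U)
Z_tr, Y_tr, Z_te, Y_te = Z[:600], Y[:600], Z[600:], Y[600:]

# Random hidden layer, then output weights by generalized inverse.
W = rng.uniform(-1.0, 1.0, size=(n_state + n_ctrl, n_hidden))
b = rng.uniform(-1.0, 1.0, size=n_hidden)
H = np.tanh(Z_tr @ W + b)
beta = np.linalg.pinv(H) @ Y_tr

# Evaluate one-step prediction on held-out samples.
pred = np.tanh(Z_te @ W + b) @ beta
test_rmse = float(np.sqrt(np.mean((pred - Y_te) ** 2)))
```

After this step `W`, `b`, and `beta` are frozen, and the model network is used only for prediction while the critic and action networks are trained.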

3.2. Critic Network

The critic network is composed of . The inputs are the components of the state vector , and the output is the estimation of the state , . is the number of hidden layer neurons in the critic network. In the critic network, the weight matrix from the input layer to the hidden layer, the weight matrix from the hidden layer to the output layer, and the hidden layer threshold matrix of time are, respectively, defined as , , . Figure 3 shows the structure of the critic network.

The critic network uses the least-squares method of ELM, whose forward calculation process is

where is the input of the node in the critic network hidden layer, is the output of the node in the critic network hidden layer, , and is the output of the critic network output layer. The inputs of the critic network come from the output of the model network, and the outputs of the critic network are the costate function in DHP. denotes the expected output of the critic network, which can be written as

The training error of the DHP critic network is minimized based on the idea of ELM:

where is the error of the critic network in moments and is the error of all the time points in the critic network.

According to the DHP structure and the definition of the expected outputs of the critic network, we can obtain

In formula (24), and represent the notion that and take the derivative of composite function .

Based on (23) and (24), we can acquire

According to , the weight from the hidden layer to the output layer is equal to the least-squares solution of the linear system , and hence we get :

In formula (26), is a generalized inverse matrix of . Based on the DHP structure and the chain rule [9], we can obtain

In (27), denotes the first rows of the weight matrix , and denotes rows to of the weight matrix .
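The chain-rule step above needs the derivative of the model network output with respect to its inputs, which for an ELM has a closed form. The sketch below is an illustration under assumptions: `tanh` replaces the bipolar sigmoid, the weights are random stand-ins for a trained model, the utility is taken as U = x^T x, and the control is held fixed when forming the critic target. The rows of `W` that multiply the state play the role of the "first rows" mentioned in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_state, n_ctrl, n_hidden = 2, 4, 20
W = rng.uniform(-1, 1, size=(n_state + n_ctrl, n_hidden))  # rows 0..n_state-1 act on x(k)
b = rng.uniform(-1, 1, size=n_hidden)
beta = rng.uniform(-1, 1, size=(n_hidden, n_state))        # stand-in for trained output weights

def model(x, u):
    # One-step prediction x(k+1) of the ELM model network.
    return np.tanh(np.concatenate([x, u]) @ W + b) @ beta

def model_jacobians(x, u):
    """Return dx(k+1)/dx(k) and dx(k+1)/du(k) of the ELM model."""
    h = np.tanh(np.concatenate([x, u]) @ W + b)
    dh = 1.0 - h ** 2                      # derivative of tanh at each hidden node
    J = (W * dh) @ beta                    # J[i, j] = d out_j / d in_i
    return J[:n_state].T, J[n_state:].T    # shapes (n_state, n_state), (n_state, n_ctrl)

def costate_target(x, u, lam_next, gamma=0.9):
    # Assumes U = x^T x (dU/dx = 2x) and holds u(k) fixed for brevity.
    dxdx, _ = model_jacobians(x, u)
    return 2.0 * x + gamma * dxdx.T @ lam_next

x0 = np.array([0.2, -0.1])
u0 = np.array([0.1, 0.0, -0.3, 0.5])
dxdx, dxdu = model_jacobians(x0, u0)
target = costate_target(x0, u0, lam_next=np.ones(n_state))
```

A finite-difference comparison against `model` is a convenient way to validate such an analytic Jacobian before using it in the critic update.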

3.3. Action Network

The action network uses the structure of . inputs are the components of the state vector of the system at moments. outputs are the components of the control vector corresponding to the input state vector . represents the number of neurons in the action network hidden layer. and are, respectively, the weight matrix from the input layer to the hidden layer and the weight matrix from the hidden layer to the output layer in the action network. is the hidden layer threshold matrix of the action network. Figure 4 is the structure of the action network.

The calculation process of the action network is as follows:

where is the input of the node and is the output of the th node in the action network hidden layer and . According to the idea of weight adjustment of ELM, the weight matrix from the hidden layer to the output layer is obtained:

In (30), is a generalized inverse matrix of and is the expected output of the action network. The weights of the network can be corrected once is obtained. The inverse sigmoidal function is defined as . The calculation process of is as follows:

In (33), is the first rows of matrix .

Define ; we have

So, can be obtained:

3.4. Training Strategy

In this paper, the model network of the DHP algorithm is trained by an offline method at first to obtain the weight matrix of the model network. Then, the action network and the critic network are trained simultaneously. Training strategies are as follows:

(1) First, the model is trained by an offline method and the weight matrix of the model network is obtained.

(2) Taking into the action network, can be obtained.

(3) Taking and into the model network, will be obtained.

(4) Taking into the critic network, can be obtained.

(5) Calculate the expected output value of the critic network .

(6) Next, calculate the value of .

(7) Next, calculate and update the weights of the critic network.

(8) Last, make and go back to the second step until .
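The training steps listed above can be sketched as a loop on a deliberately simplified problem. To keep the example short and checkable, the three ELM networks are replaced by stand-ins, which is a substitution and not the paper's method: the model network is a known linear plant, the critic is linear in the state (lambda(x) = M x) and refitted by pseudoinverse in each pass, and the action is computed from the stationarity condition instead of a trained network. The matrices, the quadratic utility, and the sampled states are assumptions. The loop index counts training passes, not time steps.

```python
import numpy as np

gamma = 0.95
A = np.array([[1.0, 0.1], [0.0, 0.9]])    # stand-in for the trained model network
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.array([[0.5]])       # assumed utility U = x^T Q x + u^T R u

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))  # sampled states x(k)
M = np.zeros((2, 2))                      # critic lambda(x) = M x, initialised at zero

for it in range(1000):
    # Step 2: action from 2 R u + gamma * B^T lambda(x(k+1)) = 0, solved for u = K x.
    K = -gamma * np.linalg.solve(2.0 * R + gamma * B.T @ M @ B, B.T @ M @ A)
    Uc = X @ K.T
    # Step 3: model prediction x(k+1).
    X_next = X @ A.T + Uc @ B.T
    # Step 4: critic output at x(k+1).
    lam_next = X_next @ M.T
    # Step 5: expected critic output dU/dx + gamma * (dx(k+1)/dx)^T lambda(k+1);
    # the terms through du/dx cancel because of the stationarity condition.
    target = 2.0 * X @ Q.T + gamma * lam_next @ A
    # Steps 6-7: least-squares critic update by pseudoinverse.
    M_new = (np.linalg.pinv(X) @ target).T
    converged = np.max(np.abs(M_new - M)) < 1e-10
    M = M_new
    # Step 8: repeat until the critic stops changing.
    if converged:
        break

P = 0.5 * M   # for this linear-quadratic case lambda(x) = 2 P x
```

For this linear-quadratic case the fixed point can be verified independently, since `P` should satisfy the discounted Riccati equation. With ELM networks in place of the linear stand-ins no such closed-form check is available, and convergence would have to be assessed empirically.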

4. Simulation Analysis

4.1. Simulation Example Analysis

Molecular distillation is also called short-path distillation. When the molecules obtain enough energy to escape from the liquid surface, the mean free path of the light molecules differs from that of the heavy molecules, which makes a nonequilibrium liquid–liquid separation possible under high vacuum conditions [26]. Molecular distillation has the advantages of low distillation temperature, short heating time, and high separation efficiency, and it is well suited to separating materials with a high boiling point, heat sensitivity, or high viscosity. This technology is widely used in the food, medicine, oil processing, and petrochemical industries [27–29]. Molecular distillation equipment can be divided into four types: stationary, falling film, wiped film, and centrifugal [30]. At present, wiped film molecular distillation is the most widely used in scientific research and industrial production. The evaporation effect of a molecular distillation system is related not only to equipment factors, such as the size and shape of the evaporator, the distance between the evaporation and condensation surfaces, and the manufacturing process, but also to process parameters, such as the pressure, feed flow rate, evaporator temperature, and wiper motor speed [31]. To enhance the purification effect of molecular distillation, Wang et al. studied the head wave and found that it affects the separation efficiency [32]. Micov et al. studied the separation factors of the wiped film molecular distillation process and established a one-dimensional mathematical model [30]. Cvengros and Tkac used the DSMC method to establish a one-dimensional mathematical equation for the movement velocity of micro units in distillation equipment and summarized the effects of evaporation temperature, distance, vacuum degree, and other related factors on the separation results [33]. Wu simulated the effects of temperature, pressure, and reflux ratio on yield and purity by using the central response surface method combined with thin film evaporation and rectification coupling technology [34]. Although much research has been done, many problems remain in controlling molecular distillation systems, which exhibit multivariability, nonlinearity, strong coupling, and large delay. Therefore, the effectiveness of the ELM-DHP algorithm was verified by controlling a wiped film molecular distillation system.

The current state variables of the molecular distillation system are determined by the state variables and the control variables at the previous step. So, the distillation temperature, evaporation pressure, wiper motor speed, feeding speed, and the Schisandra yield and purity at the previous step were used as the inputs of the ELM-DHP controller, and the current Schisandra yield and purity were used as its outputs.

4.2. Simulation Comparison

The structures of the model network, critic network, and action network were set as 6-20-2, 2-14-2, and 2-5-4, respectively, through experiment. In the process of system identification, the weight values of the three networks between the input layer and the hidden layer were selected in the range . 600 groups of data were collected for training, and 150 groups of data were used as the test set. First, the model network was trained offline, and the least-squares solution was calculated as the weight matrix between the hidden layer and the output layer. After training, the weights of the model network were kept unchanged. The outputs of the model network over 50 time steps are shown in Figures 5 and 6.

Figures 5 and 6 show that the predicted values of the BP network and the ELM algorithm are in good agreement with the expected values. Figures 7 and 8 show that the maximum error of the BP network in the prediction of the state is 0.4, but the maximum error of ELM for the state prediction is about 0.06. Thus, it can be concluded that ELM has higher prediction accuracy and better generalization ability.

Parameter setting will affect the convergence speed of the algorithm to a certain extent. After the experiment, the discount factor was chosen as . Next, the weights of the critic network and the action network from the hidden layer to the output layer are calculated. Then, the training of the critic network and action network is set to 150 steps with 100 training epochs for each step.

In addition, for comparison with the HDP and DHP techniques based on the BP neural network, controllers based on BP-HDP, BP-DHP, ELM-HDP, and ELM-DHP were designed. The four controllers were used to control the wiped film molecular distillation system, and the simulation results over 50 time steps are shown in Figures 9–14. Figures 9–12 show that the control quantities of the BP-HDP, BP-DHP, ELM-HDP, and ELM-DHP controllers reach stable control in 45, 35, 18, and 7 steps, respectively. Thus, it can be concluded that the HDP and DHP algorithms based on ELM achieve a faster response. The controlled variables of the HDP controllers fluctuate more before they settle, so it can be concluded that the DHP algorithm has higher stability. The results of Figures 13 and 14 are summarized in Table 1. The purification effect improves as yield and purity increase; the ideal value of 100% cannot be achieved in practice. Table 1 shows that the optimal state quantities obtained by ELM-HDP and ELM-DHP were 5% higher than those obtained by BP-HDP and BP-DHP, and the optimal state of ELM-DHP is slightly higher than that of ELM-HDP. This analysis demonstrates the superiority and effectiveness of the ELM algorithm.

5. Summary

To address the poor prediction accuracy, slow convergence, and poor stability of the BP-DHP algorithm, this paper studied the ELM-DHP algorithm, taking as an example the data modeling and optimal control problem of a wiped film molecular distillation system with complex features such as multivariability, strong coupling, nonlinearity, and large time delay. An ELM-DHP controller was designed to control the molecular distillation system, and a simulation verification was carried out. In the comparison among the ELM-DHP, ELM-HDP, BP-HDP, and BP-DHP algorithms, the prediction accuracy of ELM is higher than that of the BP neural network, and the response speed and stability of the ELM-HDP and ELM-DHP algorithms are higher than those achieved with the BP network, which shows the superiority of ELM. The response speed of ELM-DHP is more than twice that of the other algorithms, and the optimal state achieved by ELM-DHP is closer to the ideal result. Thus, the ELM-DHP algorithm performs better than the BP-HDP, BP-DHP, and ELM-HDP algorithms. The ELM-DHP algorithm does not depend on a specific mechanism model and relies only on the relevant experimental data, so it can also be applied to the optimal control problems of similar complex systems with features such as multiple variables, strong coupling, nonlinearity, and large time delay.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61374138 (“Research on Fault Prediction and Optimal Maintenance of Complex Electromechanical System Based on Virtual Reality Technology”) by Changchun University of Technology.