Numerical bounds to assure initial local stability of NARX multilayer perceptrons and radial basis functions
Introduction
One of the key steps in the use of a neural network is the learning process. With the exception of some very particular neural architectures, this process is usually an iterative modification of the network parameters (weights), starting from a set of values called the initial weights. There exist numerous works [9], [10], [11], [14], [15], [18], [19], [22], [23], [24] demonstrating that an adequate selection of these initial weights is essential if good convergence of the learning process is to be achieved.
This is even more crucial in the case of recurrent neural networks. These networks constitute dynamical systems themselves, so they are subject to stability issues. If the weights are not properly initialized, unstable behaviour can result at the initial stages, thus complicating the subsequent learning process. However, the initial selection of weights is commonly performed randomly, with the confidence that if the weights are sufficiently small, the output of the network will behave adequately (see, for example, [2]). One of the goals of this work is to mathematically justify this rule of thumb, and to analytically determine the bounds for those sufficiently small weights that assure an initially good behaviour of the network. This in turn helps to avoid poor training performance due to excessively small initial weights.
In [1], sufficient conditions on the initial values of the weight matrices of a general recurrent neural network are presented that guarantee the existence of a unique stable equilibrium point in the state space of the network dynamics. However, the dynamics of the networks considered there are only first-order and in continuous time, since only first-order derivatives of the network output are taken into account in its dynamical evolution. In this work, the more general NARX architectures have been studied, and higher-order discrete-time dynamics have been taken into account.
To do this, the open-loop local stability of recurrent NARX multilayer perceptron and radial basis function neural networks is studied in relation to the value of their weights. It will be shown that, by imposing some non-restrictive conditions on the moduli of the weights, local stability (stability of the linear system obtained by linearizing the network at a working point) can be assured at the beginning of the learning phase for all possible working points. Local stability does not necessarily imply global stability, but it guarantees that the outputs of the neurons will not quickly saturate as the network evolves in time. This is especially important when working with learning algorithms that take this evolution into account, as saturation in a neuron means that all the weights below it will not learn, because the error will not be backpropagated through it.
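The local-stability criterion described above can be illustrated numerically. In the following sketch (our illustration, not the paper's code), a NARX recursion y(k) = f(y(k-1), ..., y(k-n), u(k)) is linearized at a working point, and the linearized recursion is locally stable when the companion matrix built from the partial derivatives with respect to the delayed outputs has spectral radius below 1:

```python
import numpy as np

# Illustrative check (a sketch of the linearization idea, not the paper's
# analytic bounds): a delayed-output recursion is locally stable when the
# companion matrix of the partials a_i = df/dy(k-i) has spectral radius < 1.
def locally_stable(partials):
    n = len(partials)
    A = np.zeros((n, n))
    A[0, :] = partials          # first row: Jacobian of the recursion
    A[1:, :-1] = np.eye(n - 1)  # remaining rows: shift the delay line
    return np.max(np.abs(np.linalg.eigvals(A))) < 1.0

print(locally_stable([0.5, 0.3]))   # True  (stable)
print(locally_stable([1.2, 0.3]))   # False (unstable)
```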
These conditions on the weights make it easy to adapt some of the advantageous initialization criteria, such as the Nguyen–Widrow method [18], that are commonly used with non-recurrent backpropagation networks.
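As a sketch of how such an adaptation could look (an assumed combination, not the authors' exact procedure), the standard Nguyen–Widrow initialization can be followed by a rescaling step so that every weight modulus respects a stability bound `w_max` of the kind derived in the paper:

```python
import numpy as np

# Sketch (assumed adaptation): Nguyen-Widrow initialization followed by a
# rescaling that enforces a bound w_max on the moduli of the weights.
def nguyen_widrow_bounded(n_in, n_hidden, w_max, seed=0):
    rng = np.random.default_rng(seed)
    beta = 0.7 * n_hidden ** (1.0 / n_in)        # Nguyen-Widrow scale factor
    W = rng.uniform(-1.0, 1.0, (n_hidden, n_in))
    W *= beta / np.linalg.norm(W, axis=1, keepdims=True)
    # Enforce the stability bound on every weight modulus.
    scale = np.max(np.abs(W))
    if scale > w_max:
        W *= w_max / scale
    return W

W = nguyen_widrow_bounded(n_in=3, n_hidden=5, w_max=0.25)
assert np.all(np.abs(W) <= 0.25)
```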
To statistically show the accuracy of the proposed bounds and the advantages of their use, experiments employing several network structures have been carried out; their results are presented throughout the paper.
Stability study
For the sake of simplicity, only SISO (Single Input–Single Output) network structures are considered. The extension to MIMO (Multiple Input–Multiple Output) systems is straightforward.
Let us consider a general nonlinear autoregressive with exogenous inputs (NARX) network [3], [4], [5], [6], [7], [8], [16], [17], [21]. In this structure (Fig. 1), the output of the network is delayed by Time Delay Lines (TDL) and fed back into the network's input. As seen in Fig. 1, the inputs to the neural network are the external input and delayed values of both this input and the network output.
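A minimal closed-loop simulation of this structure can be sketched as follows (our illustration, assuming a SISO model with `nu` delayed inputs and `ny` delayed, fed-back outputs; the toy map `f_lin` stands in for a trained network):

```python
import numpy as np

# Sketch of the SISO NARX recursion: delayed outputs are fed back through
# tapped delay lines (TDL) together with delayed external inputs.
def simulate_narx(f, u, nu=2, ny=2):
    """Closed-loop simulation of y(k) = f(u(k-nu..k-1), y(k-ny..k-1))."""
    y = np.zeros(len(u))
    for k in range(max(nu, ny), len(u)):
        x = np.concatenate([u[k - nu:k], y[k - ny:k]])  # network input vector
        y[k] = f(x)
    return y

# A toy 'network': a stable linear map used only to exercise the loop.
f_lin = lambda x: 0.1 * x.sum()
y = simulate_narx(f_lin, np.ones(20))
```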
Multilayer perceptrons with sigmoid-like neurons
Let us consider a neuron whose output is given by
$$a = f\left(\sum_{n=1}^{N+1} w_n x_n\right),$$
where $X = (x_1, \ldots, x_N, x_{N+1})$ is the input vector to the neuron, with $x_{N+1} = 1$ acting as the bias input.

Then, if small variations are considered on the input $X$, the corresponding variations in $a$ are given by
$$\Delta a = f'\left(\sum_{n=1}^{N+1} w_n x_n\right) \sum_{n=1}^{N} w_n \, \Delta x_n,$$
as $\Delta x_{N+1} = 0$.

In the case of monotonically increasing activation functions (such as the logistic sigmoid or the hyperbolic tangent), the derivative $f'$ satisfies
$$0 < f'(x) \le \beta \quad \text{for all } x,$$
where, for example, $\beta = 0.25$ in the case of the logistic sigmoid, while $\beta = 1$ for the hyperbolic tangent.
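The derivative bounds quoted above are easy to verify numerically; the following sketch evaluates both derivatives on a dense grid and confirms that their maxima (attained at x = 0) are 0.25 and 1, respectively:

```python
import numpy as np

# Numerical illustration of the activation-derivative bounds:
# logistic sigmoid: f'(x) = f(x)(1 - f(x)) <= 0.25
# hyperbolic tangent: f'(x) = 1 - tanh(x)^2 <= 1
x = np.linspace(-10.0, 10.0, 100001)          # grid containing x = 0
logistic = 1.0 / (1.0 + np.exp(-x))
d_logistic = logistic * (1.0 - logistic)
d_tanh = 1.0 - np.tanh(x) ** 2

print(d_logistic.max())   # 0.25, attained at x = 0
print(d_tanh.max())       # 1.0, attained at x = 0
```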
Radial basis functions
Let us consider that $y$ is the output of the network, and that $s_i$ is the output of the $i$th neuron in the Gaussian layer. Then
$$y = \sum_i w_i^{L} s_i + b^{L},$$
where the $L$ superscript denotes that the weights and bias belong to the linear layer. If the output of the nonlinear neurons is linearized around a working point $X_0$:
$$\Delta y = \sum_i w_i^{L} \, \Delta s_i.$$
According to (4) and (31), stability can be assured if the resulting bound on the moduli of the weights is satisfied for all $j$.
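The same stability test can be carried out numerically for an RBF network (our illustration; the paper derives analytic bounds instead): linearize the Gaussian layer at a working point by finite differences with respect to the fed-back delayed outputs, and check the spectral radius of the companion matrix of the linearized recursion. All function and variable names here are our own.

```python
import numpy as np

# Sketch: numerical local-stability test for a SISO NARX-RBF network.
def rbf_output(x, centers, sigma, w_lin, b_lin):
    """Gaussian layer followed by a linear layer (weights w_lin, bias b_lin)."""
    s = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * sigma ** 2))
    return w_lin @ s + b_lin

def locally_stable_at(x0, centers, sigma, w_lin, b_lin, eps=1e-6):
    n = len(x0)                 # x0 holds the delayed outputs y(k-1..k-n)
    partials = np.empty(n)
    for j in range(n):          # central finite-difference Jacobian, tap by tap
        xp, xm = x0.copy(), x0.copy()
        xp[j] += eps
        xm[j] -= eps
        partials[j] = (rbf_output(xp, centers, sigma, w_lin, b_lin)
                       - rbf_output(xm, centers, sigma, w_lin, b_lin)) / (2 * eps)
    A = np.zeros((n, n))        # companion matrix of the linearized recursion
    A[0, :] = partials
    A[1:, :-1] = np.eye(n - 1)
    return np.max(np.abs(np.linalg.eigvals(A))) < 1.0

centers = np.array([[0.0, 0.0], [1.0, 1.0]])
stable = locally_stable_at(np.zeros(2), centers, sigma=1.0,
                           w_lin=np.array([0.2, 0.2]), b_lin=0.0)
```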
Experimental results
In order to test the proposed boundaries and extract statistically significant conclusions on the convenience of their use, several experiments have been carried out with NARX MLP and RBF networks. In these tests, the two main uses of recurrent neural networks (nonlinear controllers and nonlinear identifiers) have been considered.
Conclusions and future work
In this work, it is demonstrated that a quantitative limit on the moduli of the initial weights of NARX MLP and RBF neural networks is sufficient to assure local stability at the beginning of the learning stage. These limits have been quantitatively deduced for general network structures. The use of the proposed limits makes it possible to avoid saturations and even overflows in the critical first steps of training, which would otherwise substantially degrade its performance. Experimental evidence of the benefits of using these limits has also been presented.
Acknowledgements
Miguel Pinzolas wishes to thank the Spanish Ministerio de Ciencia y Tecnología for its financial support under project SUYCON: DPI2000-04150-P403.
Eloy Irigoyen wishes to thank the University of the Basque Country for its financial support under project 1/UPV 00146.345.T-15282/2003.
References (24)
- Initialization of neural networks by means of decision trees, Knowl. Based Syst. (1995)
- A weight initialization method for improving training speed in feedforward neural network, Neurocomputing (2000)
- Learning on a general network, Neural Inf. Process. Syst. (1988)
- New results on recurrent network training: unifying the algorithms and accelerating convergence, IEEE Trans. Neural Netw. (2000)
- Radial basis functions for signal prediction and system modelling, J. Appl. Sci. Comput. (1994)
- Nonlinear time series modeling and prediction using Gaussian RBF networks with enhanced clustering and RLS learning, Electron. Lett. (1995)
- Representations of non-linear systems: the NARMAX model, Int. J. Control (1989)
- Neural networks for nonlinear dynamic system modelling and identification, Int. J. Control (1992)
- Non-linear system identification using neural networks, Int. J. Control (1990)
- Gradient radial basis function networks for nonlinear and nonstationary time series prediction, IEEE Trans. Neural Netw. (1996)
- Initializing backpropagation networks with prototypes, Neural Netw.
Eloy Irigoyen Gordo, MEng, PhD, has been an associate lecturer of system engineering and automation at the University of the Basque Country since 2001. He earned his MEng degree in electrical engineering from the University of the Basque Country (1992) and his PhD in industrial engineering from the Public University of Navarre (2003). He has been an assistant lecturer at the Public University of Navarre and at the University of Deusto, and a visiting researcher at the University of Reading and at the Polytechnic University of Madrid. His main research interests are related to neural network learning and applications, intelligent control, and computer vision.
Miguel Pinzolas-Prado, MSc, PhD, has been a senior lecturer of system engineering and automation at the Technical University of Cartagena since 1999. He earned his MSc degree in physics from the University of Saragossa (1992) and his PhD in industrial engineering from the Public University of Navarre (1997). He has been an assistant lecturer at the Public University of Navarre and a visiting researcher at the University of Reading. His main research interests are related to neural network learning and applications, and to computer vision.