Characterising complexity by the degrees of freedom in a radial basis function network
Introduction
Complexity and intrinsic degrees of freedom of a neural network are related concepts which are difficult to define and assess. It has been common for researchers to use the number of hidden units, or the total number of adjustable weights in a network, as a measure of network complexity. This is clearly naïve. Is a network with ten hidden nodes, but where each link is only allowed 4-bit weights, more or less complex than a neural network with 4 hidden units but using 32-bit weights? Does the dynamic range of the hidden units matter, or whether Gaussian or spline basis functions are used in a radial basis function network? Is a multilayer perceptron with 4 hidden units and a total of 17 adjustable weights across input and output layers less complex than a radial basis function network with 16 fixed basis functions? Without a method to assess complexity, comparative performance experiments between different network models have limited significance. Unfortunately, the Bayesian view of marginalisation over unknown parameters is of little practical help in real-world examples, and we need techniques to estimate the complexity of finite data sets and attempt to match this complexity with networks of appropriate degrees of freedom.
The difficulty of the task is apparent when we wish to compare the performances of, say, multilayer perceptrons and radial basis function networks that each have the same number of effective parameters. It is not clear whether this concept of the ‘same number of effective parameters’ has any true meaning. The first layer of a radial basis function network tends not to be adaptive, and hence the outputs of the hidden layer tend to be highly correlated, whereas adapting the parameters of a multilayer perceptron tends to decorrelate the hidden layer outputs. The consequence is that a radial basis function network uses more hidden nodes than a multilayer perceptron to ‘solve’ a given finite data problem. For this reason, some researchers have attempted to orthogonalise the training and basis functions used by a radial basis function network in order to construct minimal networks (e.g. Ref. [2]).
The computational power of a neural network derives from the feature space defined by the hidden layer as a whole. For example, in classification problems it has been shown that the optimum feature space extracted by the hidden layer performs a specific type of discriminant analysis [8]. In this paper we are concerned primarily with radial basis function networks; we concentrate upon regression issues and introduce a measure of ‘structure characterisation’ which extends the notion of the number of degrees of freedom of a simple linear model. The approach we adopt is to exploit the interpretation of the radial basis function network as a linear smoother [4,7].
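The linear-smoother interpretation can be made concrete: with fixed basis functions, the least-squares fit maps the targets to the fitted values through a smoother (hat) matrix, and the trace of that matrix generalises the parameter count of a linear model. The following sketch is illustrative only (all names, centre placements and widths are choices made for the example, not taken from the paper):

```python
import numpy as np

def rbf_design(x, centres, width):
    """Gaussian RBF design matrix: Phi[i, j] = exp(-(x_i - c_j)^2 / (2 w^2))."""
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * width ** 2))

def smoother_matrix(x, centres, width):
    """S maps targets to fitted values, y_hat = S @ y, for the least-squares fit.

    S = Phi (Phi^T Phi)^{-1} Phi^T, computed here via the pseudoinverse.
    """
    Phi = rbf_design(x, centres, width)
    return Phi @ np.linalg.pinv(Phi)

x = np.linspace(0.0, 1.0, 50)
centres = np.linspace(0.0, 1.0, 10)
S = smoother_matrix(x, centres, width=0.1)

# For an unpenalised fit, trace(S) recovers the rank of the design matrix,
# i.e. the number of independent basis functions (here 10).
dof = np.trace(S)
```

For this unregularised projection the effective degrees of freedom simply equal the number of independent basis functions; the quantity becomes informative once smoothing or regularisation makes it non-integer.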
Kernel smoothers and dual basis functions
A ‘smoother’ is a nonparametric method employed to derive a summary of the trend of a response measurement driven by predictor measurements [4]. In this sense a smoother may be considered as a regression fit to an observational set of data. There are different classes of smoothers. The radial basis function network has a particularly strong relationship to that class of statistical regression models known as kernel smoothers.
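As a concrete member of the kernel-smoother class, a Nadaraya–Watson estimator forms each fitted value as a locally weighted average of the responses. The sketch below is a minimal illustration with a Gaussian kernel; the function name, bandwidth and synthetic data are all invented for the example:

```python
import numpy as np

def nadaraya_watson(x_query, x_data, y_data, bandwidth):
    """Kernel smoother: each fitted value is a kernel-weighted average of y_data."""
    # Gaussian kernel weights between every query point and every predictor
    w = np.exp(-(x_query[:, None] - x_data[None, :]) ** 2
               / (2.0 * bandwidth ** 2))
    return (w @ y_data) / w.sum(axis=1)

# Synthetic regression problem: noisy samples of a smooth trend
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=100)

y_hat = nadaraya_watson(x, x, y, bandwidth=0.05)
```

Because the weights depend only on the predictors, the fitted values are a fixed linear map of the responses, which is exactly the property the paper exploits to relate radial basis function networks to this class of regression models.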
Conclusions
In summary, we have characterised the complexity of a radial basis function model by exploiting relationships to statistical methods in regression, specifically kernel smoothers. Since most kernel smoothers are equivalent to the solution of a penalised least-squares problem with an appropriately defined regulariser, the explicit action of regression smoothing by penalising overfitting is apparent. From the radial basis function perspective, the smoothing is governed by the properties of the…
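The penalised least-squares equivalence noted above has a simple numerical consequence: under a ridge-type regulariser the effective degrees of freedom, trace of the smoother matrix S = Phi (Phi^T Phi + lambda I)^{-1} Phi^T, shrink continuously as the penalty grows. A minimal sketch, with all values chosen purely for illustration:

```python
import numpy as np

def effective_dof(Phi, lam):
    """Effective degrees of freedom of the penalised least-squares fit:
    trace of S = Phi (Phi^T Phi + lam I)^{-1} Phi^T."""
    m = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(m)
    return np.trace(Phi @ np.linalg.solve(A, Phi.T))

# Gaussian RBF design matrix on an illustrative 1-D problem
x = np.linspace(0.0, 1.0, 40)
centres = np.linspace(0.0, 1.0, 15)
Phi = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * 0.1 ** 2))

# Degrees of freedom fall monotonically as the penalty strength increases
dofs = [effective_dof(Phi, lam) for lam in (1e-6, 1.0, 100.0)]
```

In terms of the singular values sigma_i of Phi, trace(S) = sum_i sigma_i^2 / (sigma_i^2 + lambda), which interpolates between the number of independent basis functions (lambda → 0) and zero (lambda → ∞); this is the sense in which a regularised network has fewer degrees of freedom than its raw parameter count suggests.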
Acknowledgements
This work was supported under EPSRC contract K51792. The author would like to thank the referees for their assistance in pointing out how some of the confusing aspects of the paper could be clarified.
David Lowe has held a Chair in Neural Computing at Aston University, UK since 1993. He has a Ph.D. in quantum transport theory, and previously worked for the UK Defence Research Agency in areas such as automatic speech recognition and the principles of statistical pattern analysis. Although known for introducing the Radial Basis Function network into the neural network domain in 1988, his main research activity in life is the study of the learning process in his six children.
References
- D.S. Broomhead, G.P. King, Extracting qualitative dynamics from experimental data, Physica D (1986).
- A.R. Webb, D. Lowe, The optimised internal representation of multilayer classifier networks performs nonlinear discriminant analysis, Neural Networks (1990).
- S. Chen et al., Orthogonal least squares algorithm for training multioutput radial basis function networks, IEE Proc. F (1992).
- F. Girosi, M. Jones, T. Poggio, Regularization theory and neural networks architectures, Neural Computation (1995).