Characterising complexity by the degrees of freedom in a radial basis function network
Introduction
Complexity and intrinsic degrees of freedom of a neural network are related concepts which are difficult to define and assess. It has been common for researchers to use the number of hidden units, or the total number of adjustable weights in a network, as a measure of network complexity. This is clearly naïve. Is a network with ten hidden nodes, but where each link is only allowed 4-bit weights, more or less complex than a neural network with 4 hidden units but using 32-bit weights? Does the dynamic range of the hidden units matter, or whether Gaussian or spline basis functions are used in a radial basis function network? Is a multilayer perceptron with 4 hidden units and a total of 17 adjustable weights across input and output layers less complex than a radial basis function network with 16 fixed basis functions? Without a method to assess complexity, comparative performance experiments between different network models have limited significance. Unfortunately, the Bayesian view of marginalisation over unknown parameters is of little practical help in real-world examples, and we need techniques to estimate the complexity of finite data sets and attempt to match this complexity with networks of appropriate degrees of freedom.
The difficulty of the task is apparent when we wish to compare the performances of, say, multilayer perceptrons and radial basis function networks that each have the same number of effective parameters. It is not clear whether this concept of the ‘same number of effective parameters’ has any true meaning. The first layer of a radial basis function network tends not to be adaptive, and hence the outputs of the hidden layer tend to be highly correlated, whereas adapting the parameters of a multilayer perceptron tends to decorrelate the hidden layer outputs. The consequence is that a radial basis function network uses more hidden nodes than a multilayer perceptron to ‘solve’ a given finite data problem. For this reason, some researchers have attempted to orthogonalise the training and basis functions used by a radial basis function network in order to construct minimal networks (e.g. Ref. [2]).
The computational power of a neural network derives from the feature space defined by the hidden layer as a whole. For example, in classification problems it has been shown that the optimum feature space extracted by the hidden layer performs a specific type of discriminant analysis [8]. In this paper we are concerned primarily with radial basis function networks; we concentrate upon regression issues and introduce a measure of ‘structure characterisation’ which extends the notion of the number of degrees of freedom of a simple linear model. The approach we adopt is to exploit the interpretation of the radial basis function network as a linear smoother [4,7].
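The linear-smoother interpretation can be made concrete: with fixed basis functions, the least-squares fit maps the targets to the fitted values through a smoother (hat) matrix, and the trace of that matrix generalises the parameter count of a linear model. The following sketch is illustrative only (all names, centre placements and widths are choices made for the example, not taken from the paper):

```python
import numpy as np

def rbf_design(x, centres, width):
    """Gaussian RBF design matrix: Phi[i, j] = exp(-(x_i - c_j)^2 / (2 w^2))."""
    return np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * width ** 2))

def smoother_matrix(x, centres, width):
    """S maps targets to fitted values, y_hat = S @ y, for the least-squares fit.

    S = Phi (Phi^T Phi)^{-1} Phi^T, computed here via the pseudoinverse.
    """
    Phi = rbf_design(x, centres, width)
    return Phi @ np.linalg.pinv(Phi)

x = np.linspace(0.0, 1.0, 50)
centres = np.linspace(0.0, 1.0, 10)
S = smoother_matrix(x, centres, width=0.1)

# For an unpenalised fit, trace(S) recovers the rank of the design matrix,
# i.e. the number of independent basis functions (here 10).
dof = np.trace(S)
```

For this unregularised projection the effective degrees of freedom simply equal the number of independent basis functions; the quantity becomes informative once smoothing or regularisation makes it non-integer.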
Kernel smoothers and dual basis functions
A ‘smoother’ is a nonparametric method employed to derive a summary of the trend of a response measurement driven by predictor measurements [4]. In this sense a smoother may be considered as a regression fit to an observational set of data. There are different classes of smoothers. The radial basis function network has a particularly strong relationship to that class of statistical regression models known as kernel smoothers.
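As a concrete member of the kernel-smoother class, a Nadaraya–Watson estimator forms each fitted value as a locally weighted average of the responses. The sketch below is a minimal illustration with a Gaussian kernel; the function name, bandwidth and synthetic data are all invented for the example:

```python
import numpy as np

def nadaraya_watson(x_query, x_data, y_data, bandwidth):
    """Kernel smoother: each fitted value is a kernel-weighted average of y_data."""
    # Gaussian kernel weights between every query point and every predictor
    w = np.exp(-(x_query[:, None] - x_data[None, :]) ** 2
               / (2.0 * bandwidth ** 2))
    return (w @ y_data) / w.sum(axis=1)

# Synthetic regression problem: noisy samples of a smooth trend
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=100)

y_hat = nadaraya_watson(x, x, y, bandwidth=0.05)
```

Because the weights depend only on the predictors, the fitted values are a fixed linear map of the responses, which is exactly the property the paper exploits to relate radial basis function networks to this class of regression models.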
Conclusions
In summary, we have characterised the complexity of a radial basis function model by exploiting relationships to statistical methods in regression, specifically kernel smoothers. Since most kernel smoothers are equivalent to the solution of a penalised least-squares problem with an appropriately defined regulariser, the explicit action of regression smoothing by penalising overfitting is apparent. From the radial basis function perspective, the smoothing is governed by the properties of the…
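The penalised least-squares equivalence noted above has a simple numerical consequence: under a ridge-type regulariser the effective degrees of freedom, trace of the smoother matrix S = Phi (Phi^T Phi + lambda I)^{-1} Phi^T, shrink continuously as the penalty grows. A minimal sketch, with all values chosen purely for illustration:

```python
import numpy as np

def effective_dof(Phi, lam):
    """Effective degrees of freedom of the penalised least-squares fit:
    trace of S = Phi (Phi^T Phi + lam I)^{-1} Phi^T."""
    m = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(m)
    return np.trace(Phi @ np.linalg.solve(A, Phi.T))

# Gaussian RBF design matrix on an illustrative 1-D problem
x = np.linspace(0.0, 1.0, 40)
centres = np.linspace(0.0, 1.0, 15)
Phi = np.exp(-(x[:, None] - centres[None, :]) ** 2 / (2.0 * 0.1 ** 2))

# Degrees of freedom fall monotonically as the penalty strength increases
dofs = [effective_dof(Phi, lam) for lam in (1e-6, 1.0, 100.0)]
```

In terms of the singular values sigma_i of Phi, trace(S) = sum_i sigma_i^2 / (sigma_i^2 + lambda), which interpolates between the number of independent basis functions (lambda → 0) and zero (lambda → ∞); this is the sense in which a regularised network has fewer degrees of freedom than its raw parameter count suggests.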
Acknowledgements
This work was supported under EPSRC contract K51792. The author would like to thank the referees for their assistance in pointing out how some of the confusing aspects of the paper could be clarified.
David Lowe has held a Chair in Neural Computing at Aston University, UK since 1993. He has a Ph.D. in quantum transport theory, and previously worked for the UK Defence Research Agency in areas such as automatic speech recognition and the principles of statistical pattern analysis. Although known for introducing the Radial Basis Function network into the neural network domain in 1988, his main research activity in life is the study of the learning process in his six children.
References
- D.S. Broomhead, G.P. King, Extracting qualitative dynamics from experimental data, Physica D (1986).
- A.R. Webb, D. Lowe, The optimised internal representation of multilayer classifier networks performs nonlinear discriminant analysis, Neural Networks (1990).
- S. Chen et al., Orthogonal least squares algorithm for training multioutput radial basis function networks, IEE Proc. F (1992).
- F. Girosi, M. Jones, T. Poggio, Regularization theory and neural networks architectures, Neural Computation (1995).