Neurocomputing

Volume 19, Issues 1–3, 21 April 1998, Pages 199–209

Characterising complexity by the degrees of freedom in a radial basis function network

https://doi.org/10.1016/S0925-2312(97)00065-9

Abstract

In this paper we discuss an approach for characterising the complexity of a radial basis function network by estimating its effective degrees of freedom. We introduce a simple method for determining the degrees of freedom by exploiting a relationship to the theory of linear smoothers. Specifically, the complexity of the model is demonstrated theoretically and empirically to be determined by a spectral analysis of the space spanned by the outputs of the hidden layer.

Introduction

Complexity and intrinsic degrees of freedom of a neural network are related concepts which are difficult to define and assess. It has been common for researchers to use the number of hidden units, or the total number of adjustable weights in a network, as a measure of network complexity. This is clearly naïve. Is a network with ten hidden nodes, but where each link is only allowed 4-bit weights, more or less complex than a neural network with 4 hidden units using 32-bit weights? Does the dynamic range of the hidden units matter, or whether Gaussian or spline basis functions are used in a radial basis function network? Is a multilayer perceptron with 4 hidden units and a total of 17 adjustable weights across input and output layers less complex than a radial basis function network with 16 fixed basis functions? Without a method to assess complexity, comparative performance experiments between different network models have limited significance. Unfortunately, the Bayesian view of marginalisation over unknown parameters is of little practical help in real-world examples, and we need techniques to estimate the complexity of finite data sets and to match this complexity with networks of appropriate degrees of freedom.

The difficulty of the task is apparent when we wish to compare the performances of, say, multilayer perceptrons and radial basis function networks which each have the same number of effective parameters. It is not clear whether this notion of the ‘same number of effective parameters’ has any true meaning. The first layer of a radial basis function network tends not to be adaptive, and hence the outputs of the hidden layer tend to be highly correlated, whereas adapting the parameters of a multilayer perceptron tends to decorrelate the hidden layer outputs. The consequence is that the radial basis function network uses more hidden nodes than the multilayer perceptron to ‘solve’ a given finite data problem. For this reason, some researchers have attempted to orthogonalise the training and basis functions used by a radial basis function network in order to construct minimal networks (e.g. Ref. [2]).

The computational power of a neural network derives from the feature space defined by the hidden layer as a whole. For example, in classification problems it has been shown that the optimum feature space extracted by the hidden layer performs a specific type of discriminant analysis [8]. In this paper we are concerned primarily with radial basis function networks; we concentrate upon regression issues and introduce a measure of ‘structure characterisation’ which is an extension of the number of degrees of freedom of a simple linear model. The approach we adopt is to exploit the interpretation of the radial basis function network as a linear smoother [4], [7].
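To make the linear-smoother reading concrete, the following is a minimal sketch only; the symbols Φ, S, λ and σ_i are our notation, not necessarily the paper's. For a radial basis function network with fixed centres and a ridge-type regulariser, the fitted values are a linear map of the training targets, and the effective degrees of freedom extend the parameter count of an ordinary linear model:

    \hat{y} = S\,y, \qquad S = \Phi\,(\Phi^{\top}\Phi + \lambda I)^{-1}\Phi^{\top}, \qquad \mathrm{df} = \operatorname{tr}(S) = \sum_{i} \frac{\sigma_{i}^{2}}{\sigma_{i}^{2} + \lambda},

where Φ denotes the matrix of hidden-layer outputs on the training inputs and σ_i its singular values. With λ = 0 and Φ of full rank the trace reduces to the number of basis functions, which is consistent with the spectral view of the hidden-layer output space described in the abstract.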

Section snippets

Kernel smoothers and dual basis functions

A ‘smoother’ is a nonparametric method employed to derive a summary of the trend of a response measurement driven by predictor measurements [4]. In this sense a smoother may be considered as a regression fit to an observational set of data. There are different classes of smoothers. The radial basis function network has a particularly strong relationship to that class of statistical regression models known as kernel smoothers.
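As an illustration of this smoother view, the sketch below computes the effective degrees of freedom of an RBF regression with fixed centres as the trace of its smoother matrix. This is a hedged example under our own assumptions (a Gaussian basis, a small ridge term, and hypothetical function names), not an implementation taken from the paper.

    import numpy as np

    def gaussian_design(x, centres, width):
        # Hidden-layer output matrix Phi: one Gaussian basis function per fixed centre.
        d2 = (x[:, None] - centres[None, :]) ** 2
        return np.exp(-d2 / (2.0 * width ** 2))

    def effective_dof(x, centres, width, ridge=1e-8):
        # Smoother (hat) matrix of an RBF regression with a fixed first layer:
        # y_hat = S y,  S = Phi (Phi^T Phi + ridge I)^{-1} Phi^T.
        phi = gaussian_design(x, centres, width)
        gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
        S = phi @ np.linalg.solve(gram, phi.T)
        # Effective degrees of freedom: trace of the smoother matrix, i.e. a sum
        # over the squared singular values of Phi shrunk by the ridge term.
        return np.trace(S)

    # Usage: 30 scalar inputs, 10 equally spaced Gaussian centres.
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 1.0, 30))
    centres = np.linspace(0.0, 1.0, 10)
    print(effective_dof(x, centres, width=0.1))

With a small ridge term and well-separated centres the trace is close to the number of basis functions; as the basis functions overlap and their outputs become correlated, the trace falls below that count, which is the sense in which the effective degrees of freedom characterise complexity.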

Conclusions

In summary we have characterised the complexity of a radial basis function model by exploiting relationships to statistical methods in regression; specifically kernel smoothers. Since most kernel smoothers are equivalent to the solution of a penalised least-squares problem with an appropriately defined regulariser, the explicit action of regression smoothing by penalising overfitting is apparent. From the radial basis function perspective, the smoothing is governed by the properties of the

Acknowledgements

This work was supported under EPSRC contract K51792. The author would like to thank the referees for their assistance in pointing out how some of the confusing aspects of the paper could be clarified.

David Lowe has held a Chair in Neural Computing at Aston University, UK since 1993. He has a Ph.D. in quantum transport theory, and previously worked for the UK Defence Research Agency in areas such as automatic speech recognition and the principles of statistical pattern analysis. Although known for introducing the Radial Basis Function network into the neural network domain in 1988, his main research activity in life is the study of the learning process in his six children.



