
Neural Networks

Volume 17, Issue 1, January 2004, Pages 113-126

Practical selection of SVM parameters and noise estimation for SVM regression

https://doi.org/10.1016/S0893-6080(03)00169-2

Abstract

We investigate practical selection of hyper-parameters for support vector machine (SVM) regression (that is, the ε-insensitive zone and the regularization parameter C). The proposed methodology advocates analytic parameter selection directly from the training data, rather than the re-sampling approaches commonly used in SVM applications. In particular, we describe a new analytical prescription for setting the value of the insensitive zone ε as a function of the training sample size. Good generalization performance of the proposed parameter selection is demonstrated empirically using several low- and high-dimensional regression problems. Further, we point out the importance of Vapnik's ε-insensitive loss for regression problems with finite samples. To this end, we compare the generalization performance of SVM regression (using the proposed selection of ε-values) with regression using the ‘least-modulus’ loss (ε=0) and the standard squared loss. These comparisons indicate superior generalization performance of SVM regression under sparse sample settings, for various types of additive noise.

Introduction

This study is motivated by the growing popularity of support vector machines (SVM) for regression problems (Cherkassky and Mulier, 1998, Drucker et al., 1997, Kwok, 2001, Mattera and Haykin, 1999, Muller et al., 1999, Schölkopf et al., 1998, Schölkopf et al., 1999, Schölkopf and Smola, 2002, Smola et al., 1998, Smola and Schölkopf, 1998, Vapnik, 1998, Vapnik, 1999). Their practical success can be attributed to solid theoretical foundations based on VC-theory (Vapnik, 1998, Vapnik, 1999): SVM generalization performance does not depend directly on the dimensionality of the input space. However, many SVM regression application studies are performed by ‘expert’ users. Since the quality of SVM models depends on a proper setting of SVM hyper-parameters, the main issue for practitioners trying to apply SVM regression is how to set these parameter values (to ensure good generalization performance) for a given data set. Whereas existing sources on SVM regression (Cherkassky and Mulier, 1998, Kwok, 2001, Mattera and Haykin, 1999, Muller et al., 1999, Schölkopf et al., 1998, Schölkopf et al., 1999, Smola et al., 1998, Smola and Schölkopf, 1998, Vapnik, 1998, Vapnik, 1999) give some recommendations on appropriate settings of SVM parameters, there is no general consensus, and the recommendations are often contradictory. Hence, re-sampling remains the method of choice for many applications. Unfortunately, using re-sampling to tune several SVM regression parameters simultaneously is very expensive in terms of computational cost and data requirements.

This paper describes a simple yet practical analytical approach to setting SVM regression parameters directly from the training data. The proposed approach is based on a well-known theoretical understanding of SVM regression, which provides the basic analytical form of the proposed prescriptions for parameter selection. Further, we tune these analytical dependencies empirically using synthetic data sets. The practical validity of the proposed approach is demonstrated using several low- and high-dimensional regression problems.

Recently, several researchers (Smola and Schölkopf, 1998, Vapnik, 1998, Vapnik, 1999) noted the similarity between Vapnik's ε-insensitive loss function and Huber's loss in robust statistics (Huber, 1964). In particular, with ε=0 Vapnik's loss function coincides with a special form of Huber's loss known as the least-modulus (LM) loss. From the viewpoint of traditional robust statistics, there is a well-known correspondence between the noise model and the optimal loss function (Schölkopf and Smola, 2002, Smola and Schölkopf, 1998). However, this connection between the noise model and the loss function is based on (asymptotic) maximum-likelihood arguments (Smola & Schölkopf, 1998). It can be argued that for finite-sample regression problems Vapnik's ε-insensitive loss (with a properly chosen ε-value) may yield better generalization than other loss functions known to be asymptotically optimal for a particular noise density. In order to test this assertion, we compare the generalization performance of SVM linear regression (with optimally chosen ε) with robust regression using the LM loss function (ε=0) and with optimal least-squares regression, for several noise densities.
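For concreteness, the three loss functions compared throughout the paper can be written down directly. The following sketch (in Python, purely illustrative; the function names are ours) evaluates the squared loss, the LM loss and Vapnik's ε-insensitive loss on a residual r = y − f(x):

    import numpy as np

    def squared_loss(r):
        # standard squared loss: r^2
        return r ** 2

    def lm_loss(r):
        # least-modulus (absolute) loss |r|, i.e. eps-insensitive loss with eps = 0
        return np.abs(r)

    def eps_insensitive_loss(r, eps):
        # Vapnik's eps-insensitive loss: zero inside the eps-tube, linear outside
        return np.maximum(np.abs(r) - eps, 0.0)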

This paper is organized as follows. Section 2 gives a brief introduction to SVM regression and reviews existing methods for SVM parameter selection. Section 3 describes the proposed approach for selecting SVM parameters. Section 4 presents empirical comparisons using regression data sets with non-linear target functions, corrupted with Gaussian as well as non-Gaussian noise. Section 5 presents extensive empirical comparisons for higher dimensional linear regression problems under different settings and noise models. Section 6 describes noise variance estimation for SVM regression. Finally, a summary and discussion are given in Section 7.


Support vector regression and SVM parameter selection

We consider the standard regression formulation under the general setting for predictive learning (Cherkassky and Mulier, 1998, Hastie et al., 2001, Vapnik, 1999). The goal is to estimate an unknown real-valued function in the relationship

y = r(x) + δ

where δ is independent and identically distributed (i.i.d.) zero-mean random error (noise), x is a multivariate input and y is a scalar output. The estimation is made on the basis of a finite number of samples (training data): (xi, yi), i=1,…,n, assumed to be generated according to this model.
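As an illustration of this setting, the following sketch generates training data of exactly this form for a hypothetical one-dimensional target r(x) = sin(x) with additive Gaussian noise (the target function and noise level are illustrative assumptions, not the benchmarks used in the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50                              # training sample size
    sigma = 0.2                         # noise standard deviation
    x = rng.uniform(0.0, 2 * np.pi, n)  # inputs (1-D here for simplicity)
    delta = rng.normal(0.0, sigma, n)   # i.i.d. zero-mean Gaussian noise
    y = np.sin(x) + delta               # y = r(x) + delta with r(x) = sin(x)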

Proposed approach for parameter selection

Selection of parameter C. Following Mattera and Haykin (1999), consider the standard parameterization of the SVM solution given by Eq. (11), assuming that the ε-insensitive zone parameter has been (somehow) chosen. Also suppose, without loss of generality, that the SVM kernel function is bounded in the input domain. For example, RBF kernels (used in the empirical comparisons presented later in Section 4) satisfy this assumption:

K(xi, x) = exp(−‖x − xi‖² / (2p²))

where p is the width parameter.
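A minimal sketch of this kernel, assuming the plain Gaussian/RBF form written above; note that 0 < K(xi, x) ≤ 1 for all inputs, which is the boundedness property exploited in the selection of C:

    import numpy as np

    def rbf_kernel(xi, x, p):
        # Gaussian RBF kernel with width parameter p; bounded: 0 < K <= 1
        xi, x = np.asarray(xi, dtype=float), np.asarray(x, dtype=float)
        return np.exp(-np.sum((x - xi) ** 2) / (2.0 * p ** 2))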

Under these assumptions, the magnitude of the SVM regression estimate is bounded by quantities involving C, so a good value of C can be related to the range of the response values of the training data. To make this choice robust to outliers, we propose C = max(|ȳ + 3σy|, |ȳ − 3σy|), where ȳ and σy denote the mean and the standard deviation of the training responses. Similarly, the proposed prescription for the insensitive zone (Eq. (17)) ties ε to the noise level and the sample size: ε = 3σ√(ln n / n), where σ is the standard deviation of the additive noise and n is the number of training samples.
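The two prescriptions can be computed in a few lines; the following sketch assumes the forms stated above (C from the training responses, ε from the noise standard deviation and sample size):

    import numpy as np

    def select_C(y):
        # C = max(|mean(y) + 3*std(y)|, |mean(y) - 3*std(y)|)
        ym, ys = np.mean(y), np.std(y)
        return max(abs(ym + 3 * ys), abs(ym - 3 * ys))

    def select_epsilon(sigma, n):
        # eps = 3 * sigma * sqrt(ln(n) / n)
        return 3.0 * sigma * np.sqrt(np.log(n) / n)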

Experimental results for non-linear target functions

This section presents empirical comparisons for non-linear regression, first with Gaussian noise, and then with non-Gaussian noise.

Empirical results for linear regression

In this section we present empirical comparisons of several linear regression estimators using three representative loss functions: squared loss, LM loss and ε-insensitive loss with the selection of ε given by Eq. (17). Our goal is to investigate the effect of the loss function on the prediction accuracy of linear regression with finite samples. Even though SVM regression has been extensively used in regression applications (Schölkopf et al., 1999), its success is usually attributed to the remarkable ability of SVM methods to handle high-dimensional problems, rather than to the properties of the ε-insensitive loss itself; the linear setting considered here isolates the effect of the loss function.
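As an illustrative setup (not the paper's exact experimental protocol), the three linear estimators can be compared using standard scikit-learn classes, with LinearSVR providing the ε-insensitive fit (and the LM fit as the special case ε=0):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.svm import LinearSVR

    rng = np.random.default_rng(0)
    n, d, sigma = 30, 20, 0.5                   # sparse sample: n close to d
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + rng.normal(0.0, sigma, n)  # linear target + Gaussian noise

    eps = 3.0 * sigma * np.sqrt(np.log(n) / n)  # Eq. (17), sigma known here
    C = max(abs(y.mean() + 3 * y.std()), abs(y.mean() - 3 * y.std()))

    ols = LinearRegression().fit(X, y)                             # squared loss
    lm = LinearSVR(epsilon=0.0, C=C, max_iter=100000).fit(X, y)    # LM loss
    svm = LinearSVR(epsilon=eps, C=C, max_iter=100000).fit(X, y)   # eps-insensitive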

Noise variance estimation

The proposed method for selecting ε relies on the knowledge of the standard deviation of noise σ. The problem, of course, is that the noise variance is not known a priori, and it needs to be estimated from training data (xi,yi), (i=1,…,n).

In practice, the noise variance can be readily estimated from the squared sum of residuals (fitting error) on the training data. Namely, a well-known approach to estimating the noise variance (for linear models) is to fit the data using a low-bias (flexible) estimator, such as k-nearest-neighbor regression, and then rescale the resulting mean squared residual by a factor that corrects for the (small) bias of the fit.
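A sketch of this estimator, assuming k-nearest-neighbor regression as the low-bias fit and a bias-correction factor of the form n^(1/5)·k/(n^(1/5)·k − 1) (the exact factor should be treated as an assumption here):

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    def estimate_noise_variance(X, y, k=3):
        # fit a low-bias k-NN regressor and use its in-sample residuals
        knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
        residuals = y - knn.predict(X)
        n = len(y)
        # bias correction so the residual MSE does not underestimate sigma^2
        factor = (n ** 0.2 * k) / (n ** 0.2 * k - 1.0)
        return factor * np.mean(residuals ** 2)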

Summary and discussion

This paper describes practical recommendations for setting meta-parameters for SVM regression. Namely, the values of the ε and C parameters are obtained directly from the training data and the (estimated) noise level. Extensive empirical comparisons suggest that the proposed parameter selection yields good generalization performance of SVM estimates under different noise levels, types of noise, target functions and sample sizes. Hence, the proposed approach for SVM parameter selection can be immediately used by practitioners interested in applying SVM regression to real-world problems.

Acknowledgements

The authors thank Dr V. Vapnik for many useful discussions. This work was supported, in part, by NSF grant ECS-0099906.

References (18)

  • O. Chapelle et al. Model selection for support vector machines (1999)
  • V. Cherkassky et al. Comparison of model selection for regression. Neural Computation (2003)
  • V. Cherkassky et al. Learning from data: Concepts, theory, and methods (1998)
  • V. Cherkassky et al. Model complexity control for regression using VC generalization bounds. IEEE Transactions on Neural Networks (1999)
  • H. Drucker et al. Support vector regression machines (1997)
  • T. Hastie et al. The elements of statistical learning: Data mining, inference and prediction (2001)
  • P. Huber. Robust estimation of a location parameter. Annals of Mathematical Statistics (1964)
  • J. T. Kwok. Linear dependency between ε and the input noise in ε-support vector regression (2001)
  • D. Mattera et al. Support vector machines for dynamic reconstruction of a chaotic system (1999)