Practical selection of SVM parameters and noise estimation for SVM regression
Introduction
This study is motivated by the growing popularity of support vector machines (SVM) for regression problems (Cherkassky and Mulier, 1998, Drucker et al., 1997, Kwok, 2001, Mattera and Haykin, 1999, Muller et al., 1999, Schölkopf et al., 1998, Schölkopf et al., 1999, Schölkopf and Smola, 2002, Smola et al., 1998, Smola and Schölkopf, 1998, Vapnik, 1998, Vapnik, 1999). Their practical success can be attributed to solid theoretical foundations based on VC-theory (Vapnik, 1998, Vapnik, 1999); in particular, SVM generalization performance does not depend on the dimensionality of the input space. However, many SVM regression application studies are performed by ‘expert’ users. Since the quality of SVM models depends on a proper setting of SVM hyper-parameters, the main issue for practitioners trying to apply SVM regression is how to set these parameter values (to ensure good generalization performance) for a given data set. Whereas existing sources on SVM regression (Cherkassky and Mulier, 1998, Kwok, 2001, Mattera and Haykin, 1999, Muller et al., 1999, Schölkopf et al., 1998, Schölkopf et al., 1999, Smola et al., 1998, Smola and Schölkopf, 1998, Vapnik, 1998, Vapnik, 1999) give some recommendations on the appropriate setting of SVM parameters, there is no general consensus, and many of the recommendations contradict one another. Hence, re-sampling remains the method of choice for many applications. Unfortunately, using re-sampling to tune several SVM regression parameters simultaneously is very expensive in terms of computational cost and data requirements.
This paper describes a simple yet practical analytical approach to setting SVM regression parameters directly from the training data. The proposed approach to parameter selection is based on a well-known theoretical understanding of SVM regression, which provides the basic analytical form of the proposed prescriptions for parameter selection. Further, we tune these analytical dependencies empirically using synthetic data sets. The practical validity of the proposed approach is demonstrated on several low- and high-dimensional regression problems.
Recently, several researchers (Smola and Schölkopf, 1998, Vapnik, 1998, Vapnik, 1999) noted the similarity between Vapnik's ε-insensitive loss function and Huber's loss in robust statistics (Huber, 1964). In particular, Vapnik's loss function coincides with a special form of Huber's loss, also known as the least-modulus (LM) loss (obtained with ε=0). From the viewpoint of traditional robust statistics, there is a well-known correspondence between the noise model and the optimal loss function (Schölkopf and Smola, 2002, Smola and Schölkopf, 1998). However, this connection between the noise model and the loss function is based on (asymptotic) maximum likelihood arguments (Smola & Schölkopf, 1998). It can be argued that for finite-sample regression problems Vapnik's ε-insensitive loss (with a properly chosen ε-value) may yield better generalization than other loss functions (known to be asymptotically optimal for a particular noise density). In order to test this assertion, we compare the generalization performance of SVM linear regression (with optimally chosen ε) with robust regression using the LM loss function (ε=0), and also with optimal least squares regression, for several noise densities.
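The three loss functions compared in this study can be written compactly on the residual r = y − f(x). A minimal sketch (function names are illustrative); note that the LM loss is exactly the ε-insensitive loss with ε=0:

```python
import numpy as np

def squared_loss(r):
    """Classical least-squares loss on residuals r = y - f(x)."""
    return r ** 2

def lm_loss(r):
    """Least-modulus (absolute) loss, i.e. the eps-insensitive loss with eps = 0."""
    return np.abs(r)

def eps_insensitive_loss(r, eps):
    """Vapnik's eps-insensitive loss: zero inside the eps-tube, linear outside."""
    return np.maximum(np.abs(r) - eps, 0.0)
```

Residuals smaller than ε incur no penalty at all under the ε-insensitive loss, which is the key difference from both the squared and LM losses.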
This paper is organized as follows. Section 2 gives a brief introduction to SVM regression and reviews existing methods for SVM parameter selection. Section 3 describes the proposed approach for selecting SVM parameters. Section 4 presents empirical comparisons. These comparisons include regression data sets with non-linear target functions, corrupted with Gaussian noise, as well as non-Gaussian noise. Section 5 presents extensive empirical comparisons for higher dimensional linear regression problems under different settings and noise models. Section 6 describes noise variance estimation for SVM regression. Finally, summary and discussion are given in Section 7.
Section snippets
Support vector regression and SVM parameter selection
We consider the standard regression formulation under the general setting for predictive learning (Cherkassky and Mulier, 1998, Hastie et al., 2001, Vapnik, 1999). The goal is to estimate an unknown real-valued function in the relationship y = r(x) + δ, where δ is independent and identically distributed (i.i.d.) zero-mean random error (noise), x is a multivariate input and y is a scalar output. The estimation is made based on a finite number of samples (training data) (x_i, y_i), i = 1, …, n. The training data are
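This setting is easy to reproduce on synthetic data, which is how the empirical tuning in this paper proceeds. A minimal sketch, with a hypothetical target function r(x) and noise level chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 2        # sample size and input dimension
sigma = 0.2          # standard deviation of the additive noise delta

def target(x):
    # hypothetical target function r(x); any smooth function serves here
    return np.sin(2 * np.pi * x[:, 0]) + x[:, 1] ** 2

X = rng.uniform(0.0, 1.0, size=(n, d))           # multivariate inputs x_i
y = target(X) + sigma * rng.standard_normal(n)   # outputs y_i = r(x_i) + delta_i
```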
Proposed approach for parameter selection
Selection of parameter C. Following Mattera and Haykin (1999), consider the standard parameterization of the SVM solution given by Eq. (11), assuming that the ε-insensitive zone parameter has been (somehow) chosen. Also suppose, without loss of generality, that the SVM kernel function is bounded in the input domain. For example, RBF kernels (used in the empirical comparisons presented later in Section 4) satisfy this assumption: K(x, x′) = exp(−‖x − x′‖²/(2p²)) ≤ 1, where p is the width parameter.
Under these assumptions,
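The resulting prescriptions are commonly quoted as C = max(|ȳ + 3σ_y|, |ȳ − 3σ_y|), where ȳ and σ_y are the mean and standard deviation of the training outputs, and ε = 3σ√(ln n / n), where σ is the noise standard deviation and n the number of samples. A minimal sketch under these assumed forms (function names are illustrative):

```python
import numpy as np

def select_C(y):
    """C = max(|y_mean + 3*y_std|, |y_mean - 3*y_std|), from training outputs only."""
    y_mean, y_std = float(np.mean(y)), float(np.std(y))
    return max(abs(y_mean + 3 * y_std), abs(y_mean - 3 * y_std))

def select_eps(sigma, n):
    """eps = 3*sigma*sqrt(ln(n)/n): proportional to noise, shrinking with n."""
    return 3.0 * sigma * np.sqrt(np.log(n) / n)
```

Note that both prescriptions require no re-sampling: C depends only on the output values, and ε on the (estimated) noise level and sample size.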
Experimental results for non-linear target functions
This section presents empirical comparisons for non-linear regression, first with Gaussian noise, and then with non-Gaussian noise.
Empirical results for linear regression
In this section we present empirical comparisons of several linear regression estimators using three representative loss functions: squared loss, LM loss, and ε-insensitive loss with the selection of ε given by Eq. (17). Our goal is to investigate the effect of the loss function on the prediction accuracy of linear regression with finite samples. Even though SVM regression has been extensively used in regression applications (Schölkopf et al., 1999), its success is mainly due to the remarkable ability of SVM
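A comparison of this kind can be sketched with off-the-shelf estimators; here `LinearSVR` with ε=0 plays the role of LM regression. The data, hyper-parameter values and model choices below are illustrative, not those of the study:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import LinearSVR

rng = np.random.default_rng(1)
n, d = 60, 5
w_true = rng.standard_normal(d)

X = rng.standard_normal((n, d))
y = X @ w_true + 0.3 * rng.standard_normal(n)   # linear target + Gaussian noise

ols = LinearRegression().fit(X, y)                               # squared loss
lm = LinearSVR(epsilon=0.0, C=10.0, max_iter=100000).fit(X, y)   # LM loss (eps = 0)
svm = LinearSVR(epsilon=0.1, C=10.0, max_iter=100000).fit(X, y)  # eps-insensitive loss

X_test = rng.standard_normal((1000, d))
y_test = X_test @ w_true                # noise-free targets for evaluation
for name, model in [("squared", ols), ("LM", lm), ("eps-insensitive", svm)]:
    mse = np.mean((model.predict(X_test) - y_test) ** 2)
    print(f"{name}: test MSE = {mse:.4f}")
```

Which loss wins depends on the noise density, sample size and how ε is chosen, which is precisely what the comparisons in this section examine.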
Noise variance estimation
The proposed method for selecting ε relies on knowledge of the standard deviation of the noise, σ. The problem, of course, is that the noise variance is not known a priori, and it needs to be estimated from the training data (x_i, y_i), i = 1, …, n.
In practice, the noise variance can be readily estimated from the sum of squared residuals (fitting error) on the training data. Namely, the well-known approach to estimating the noise variance (for linear models) is to fit the data using a low-bias
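This residual-based idea can be sketched as follows, using a k-nearest-neighbors fit as the low-bias estimator. The specific correction factor below, which compensates for the downward bias of training-set residuals, is one proposed prescription and should be treated as an assumption; a plain n/(n − dof) style adjustment is an alternative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def estimate_noise_sd(X, y, k=3):
    """Estimate the noise standard deviation from k-NN fitting residuals.

    The k-NN fit plays the role of the low-bias estimator; the correction
    factor (an assumption here) inflates the raw residual variance."""
    n = len(y)
    knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
    residuals = y - knn.predict(X)
    correction = (n ** 0.2 * k) / (n ** 0.2 * k - 1.0)
    return float(np.sqrt(correction * np.mean(residuals ** 2)))

# hypothetical usage: 1-d synthetic data with known noise level 0.5
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.5 * rng.standard_normal(200)
print(estimate_noise_sd(X, y))
```

The estimated σ can then be plugged into the ε-selection rule, closing the loop so that all SVM parameters are obtained from the training data alone.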
Summary and discussion
This paper describes practical recommendations for setting meta-parameters for SVM regression. Namely, the values of the ε and C parameters are obtained directly from the training data and the (estimated) noise level. Extensive empirical comparisons suggest that the proposed parameter selection yields good generalization performance of SVM estimates under different noise levels, types of noise, target functions and sample sizes. Hence, the proposed approach for SVM parameter selection can be immediately
Acknowledgements
The authors thank Dr V. Vapnik for many useful discussions. This work was supported, in part, by NSF grant ECS-0099906.
References (18)
- Chapelle, O., & Vapnik, V. (1999). Model selection for support vector machines.
- Cherkassky, V., & Ma, Y. (2003). Comparison of model selection for regression. Neural Computation.
- Cherkassky, V., & Mulier, F. (1998). Learning from data: Concepts, theory, and methods.
- Cherkassky, V., et al. (1999). Model complexity control for regression using VC generalization bounds. IEEE Transactions on Neural Networks.
- Drucker, H., et al. (1997). Support vector regression machines.
- Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction.
- Huber, P. J. (1964). Robust estimation of a location parameter. Annals of Mathematical Statistics.
- Kwok, J. T. (2001). Linear dependency between ε and the input noise in ε-support vector regression.
- Mattera, D., & Haykin, S. (1999). Support vector machines for dynamic reconstruction of a chaotic system.