
Neural Networks

Volume 12, Issue 6, July 1999, Pages 783-789

Improving support vector machine classifiers by modifying kernel functions

https://doi.org/10.1016/S0893-6080(99)00032-5

Abstract

We propose a method of modifying a kernel function to improve the performance of a support vector machine classifier. The method is based on the structure of the Riemannian geometry induced by the kernel function. The idea is to enlarge the spatial resolution around the separating boundary surface by a conformal mapping, such that the separability between classes is increased. Examples are given specifically for modifying Gaussian Radial Basis Function kernels. Simulation results for both artificial and real data show a remarkable improvement in generalization error, supporting our idea.

Introduction

The Support Vector Machine (SVM) is a promising pattern classification technique proposed recently by Vapnik and co-workers (Boser et al., 1992; Cortes and Vapnik, 1995; Vapnik, 1995). Unlike traditional methods, which minimize the empirical training error, SVM aims at minimizing an upper bound of the generalization error by maximizing the margin between the separating hyperplane and the data. This can be regarded as an approximate implementation of the Structural Risk Minimization principle. What makes SVM attractive is its ability to condense the information in the training data and to provide a sparse representation using a very small number of data points, the support vectors (SVs) (Girosi, 1998).
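In the standard linearly separable formulation this margin criterion has an explicit form: for training examples (x_i, y_i), i = 1, …, l, with y_i ∈ {−1, +1}, the margin of a separating hyperplane w·x + b = 0 equals 2/‖w‖, so maximizing the margin amounts to the quadratic program

    \min_{w, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, l.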

SVM is a linear classifier in the parameter space, but it is easily extended to a nonlinear classifier of the φ-machine type (Aizerman, Braverman & Rozonoer, 1964) by mapping the space S={x} of the input data into a high-dimensional (possibly infinite-dimensional) feature space F={φ(x)}. By choosing an adequate mapping φ, the data points become linearly separable or mostly linearly separable in the high-dimensional space, so that one can easily apply structural risk minimization. We need not compute the mapped patterns φ(x) explicitly; we only need the dot products between mapped patterns, which are directly available from the kernel function that generates φ(x). By choosing different kinds of kernels, SVM can realize Radial Basis Function (RBF), polynomial and multi-layer perceptron classifiers. Compared with the traditional way of implementing them, SVM has the extra advantage of automatic model selection, in the sense that both the optimal number and the locations of the basis functions are automatically obtained during training (Schölkopf et al., 1996).
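For reference, the kernels mentioned above take the following standard forms; the parameterizations (width σ, degree d, gain κ and offset θ) are the usual textbook choices rather than values specified in this article:

    K_{\mathrm{RBF}}(x, x') = \exp\!\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right), \qquad
    K_{\mathrm{poly}}(x, x') = (x \cdot x' + 1)^d, \qquad
    K_{\mathrm{MLP}}(x, x') = \tanh(\kappa \, x \cdot x' + \theta).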

The performance of SVM largely depends on the kernel. Smola, Schölkopf and Müller (1998) elucidated the relation between the SVM kernel method and the standard regularization theory (Girosi, Jones & Poggio, 1995). However, there is no theory concerning how to choose good kernel functions in a data-dependent way. The present paper is a first step toward this important problem. We propose an information-geometric method of modifying a kernel to improve the performance. It is based on the structure of the Riemannian geometry induced in the input space by the kernel. A nonlinear function φ embeds the input space S={x} into a high-dimensional Euclidean or Hilbert feature space F={φ} as a curved submanifold. This embedding induces a Riemannian metric in the input space, which shows how a small volume element in the input space is enlarged or reduced in the feature space. The idea is as follows: in order to increase the margin, or separability in the feature space, without changing the volume of the entire space, it is efficient to enlarge volume elements locally in neighborhoods of support vectors, which are located close to the boundary surface. This enlarges the spatial resolution around the boundary so that the separability of the classes is increased. To implement this idea, we use a conformal mapping of the input Riemannian space, realized approximately by a conformal transformation of the kernel.
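Concretely, the embedding φ induces the Riemannian metric

    g_{ij}(x) = \left. \frac{\partial^2 K(x, x')}{\partial x_i \, \partial x'_j} \right|_{x' = x}

in the input space, and the conformal modification multiplies the kernel by a positive scalar function c(x),

    \tilde{K}(x, x') = c(x) \, K(x, x') \, c(x').

For a Gaussian RBF primary kernel (so that K(x, x) = 1 and the first-derivative cross terms vanish at x' = x) the modified metric becomes

    \tilde{g}_{ij}(x) = c(x)^2 \, g_{ij}(x) + \frac{\partial c(x)}{\partial x_i} \frac{\partial c(x)}{\partial x_j},

so choosing c(x) to be large near the class boundary magnifies the metric there. One indicative choice, concentrated at the support vectors found with the primary kernel (the exact form and the width parameter τ used in the experiments are given in the full text), is

    c(x) = \sum_{k \in \mathrm{SV}} \exp\!\left( -\frac{\|x - x_k\|^2}{2\tau^2} \right).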

The practical training process consists of two steps. In the first step a primary kernel is used to obtain the support vectors. The kernel is then modified conformally in a data-dependent way using the information carried by these support vectors. In the second step the modified kernel is used to train the final classifier. Examples are given specifically for modifying Gaussian RBF kernels. Simulation results for both artificial and real data support our method.
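As a rough illustration of this two-step procedure, the sketch below uses scikit-learn's SVC with a precomputed kernel; the conformal factor c(x), the parameters sigma, tau and C, and all variable names are assumptions for illustration, not the paper's own implementation.

    import numpy as np
    from sklearn.svm import SVC

    def rbf_kernel(A, B, sigma=1.0):
        # Gaussian RBF Gram matrix: K(a, b) = exp(-||a - b||^2 / (2 sigma^2))
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def conformal_factor(X, sv, tau=1.0):
        # c(x): sum of Gaussians centred at the support vectors (illustrative choice)
        d2 = ((X[:, None, :] - sv[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2.0 * tau ** 2)).sum(axis=1)

    def train_two_step(X, y, sigma=1.0, tau=1.0, C=10.0):
        # Step 1: train with the primary RBF kernel and collect the support vectors.
        K1 = rbf_kernel(X, X, sigma)
        svm1 = SVC(C=C, kernel="precomputed").fit(K1, y)
        sv = X[svm1.support_]

        # Step 2: conformally modify the kernel, K~(x, x') = c(x) K(x, x') c(x'),
        # and retrain on the same data with the modified kernel.
        c_tr = conformal_factor(X, sv, tau)
        K2 = c_tr[:, None] * K1 * c_tr[None, :]
        svm2 = SVC(C=C, kernel="precomputed").fit(K2, y)

        def predict(X_new):
            # Test-versus-training Gram matrix under the modified kernel.
            c_new = conformal_factor(X_new, sv, tau)
            K_new = c_new[:, None] * rbf_kernel(X_new, X, sigma) * c_tr[None, :]
            return svm2.predict(K_new)

        return predict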


Geometry of the SVM kernel

Consider a pattern classifier which uses a hyperplane to separate two classes of patterns based on given examples {x_i, y_i}, i=1,…,l, where x_i is a vector in the input space S=R^n and y_i denotes the class index, taking the value +1 or −1. A nonlinear SVM maps the input data x into a high-dimensional feature space F=R^N (N may be infinite) by a nonlinear mapping φ, z=φ(x). It then searches for a linear discriminant function f(x) = w·φ(x) + b in the feature space. Patterns are classified by the sign of f(x).
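In the standard dual form, this discriminant can be written entirely in terms of the kernel K(x, x') = φ(x)·φ(x'):

    f(x) = \sum_{i=1}^{l} \alpha_i \, y_i \, K(x_i, x) + b,

where the coefficients α_i ≥ 0 solve the margin-maximization problem and only the support vectors have α_i > 0.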

Simulation experiments

To evaluate the performance of our method, we carried out simulations on two classification problems, one artificial and one real. The primary kernel function is fixed to be a Gaussian RBF. The SVM solver we used is a gradient-descent method that applies the Adatron method (Anlauf & Biehl, 1989) to the kernel SVM, known as the Kernel-Adatron algorithm (Friess, Cristianini & Campbell, 1998). We used a different version, developed independently, to look for an approximate solution by…
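For orientation, a minimal sketch of the basic Kernel-Adatron update of Friess et al. (1998) is shown below; the learning rate eta, the fixed iteration count and the omission of a bias term are simplifying assumptions, and this is not the authors' independently developed variant.

    import numpy as np

    def kernel_adatron(K, y, eta=0.1, n_iter=100):
        # K: (l, l) kernel Gram matrix; y: labels in {-1, +1}.
        # Gradient-ascent-style updates on the dual variables, kept non-negative.
        K = np.asarray(K, dtype=float)
        y = np.asarray(y, dtype=float)
        l = len(y)
        alpha = np.zeros(l)
        for _ in range(n_iter):
            for i in range(l):
                # margin of pattern i under the current dual variables
                gamma_i = y[i] * np.sum(alpha * y * K[:, i])
                # additive update, clipped so that alpha_i stays non-negative
                alpha[i] = max(0.0, alpha[i] + eta * (1.0 - gamma_i))
        return alpha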

Conclusion

In this paper we presented a new method of modifying a kernel to improve the performance of an SVM classifier. It is based on an information-geometric analysis of the Riemannian geometry induced in the input space by the kernel. The idea is to enlarge the spatial resolution around the boundary by a conformal transformation so that the separability of the classes is increased. This geometrical picture is confirmed by simulations. Examples are given specifically for modifying a Gaussian RBF kernel.

References

  • A.J. Smola et al. (1998). The connection between regularization operators and support vector kernels. Neural Networks.
  • M.A. Aizerman et al. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control.
  • J.K. Anlauf et al. (1989). The Adatron: an adaptive perceptron algorithm. Europhysics Letters.
  • B. Boser et al. (1992). A training algorithm for optimal margin classifiers.
  • C.J.C. Burges (1999). Geometry and invariance in kernel based methods. In B. Schölkopf et al. (Eds.), Advances in...
  • C. Cortes et al. (1995). Support vector networks. Machine Learning.
  • T.T. Friess et al. (1998). The Kernel-Adatron algorithm: a fast and simple learning procedure for support vector machines. Proceedings of the 15th International Conference on Machine Learning, Madison.
