The probabilistic constraints in the support vector machine

https://doi.org/10.1016/j.amc.2007.04.109

Abstract

In this paper, a new support vector machine classifier with probabilistic constraints is proposed, in which the presence probability of the samples in each class is determined by a distribution function. Noise causes the support vectors to be computed incorrectly, and as a result the margin cannot be maximized. In the proposed method, the constraint boundaries and the occurrence of the constraints are described by probability density functions, which helps to achieve the maximum margin. Experimental results show the superiority of the probabilistic constraints support vector machine (PC-SVM) over the standard SVM.

Introduction

Conventional learning methods aim only at minimizing the classification error in the training phase, and they cannot guarantee the lowest error rate in the testing phase. In statistical learning theory, the support vector machine (SVM) was developed to overcome this bottleneck. Support vector machines (SVMs) were originally introduced by Vapnik within the framework of statistical learning theory and structural risk minimization [1], and they construct a classifier with minimized VC dimension. They have proven to work successfully on a wide range of nonlinear classification and function estimation applications, such as optical character recognition [2], [3], text categorization [4], face detection in images [5], vehicle tracking in video sequences [6], nonlinear equalization in communication systems [7], and the generation of fuzzy rule-based systems within the SVM framework [8], [9].

Basically, the support vector machine is a linear machine with some very attractive properties. For training data that are not linearly separable, it is not possible to construct a separating hyperplane without classification errors. In this case a set of slack variables is introduced for the samples that reduce the confidence interval. The problem can then be formulated in dual form, in which the slack variables no longer appear and the problem becomes separable. The main motivation of this paper rests on probabilistic constraints; the obtained results include an asymmetric margin that depends on the probability density functions of the data classes and on the importance of each sample in determining the hyperplane parameters.
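For reference, the standard soft-margin formulation alluded to here can be written as follows (standard notation; this restatement is ours, not taken from the paper):

min over (w, b, ξ): (1/2)‖w‖² + C Σi ξi, subject to yi(wᵀxi + b) ≥ 1 − ξi, ξi ≥ 0, i = 1, …, N,

and its dual, in which the slack variables no longer appear,

max over α: Σi αi − (1/2) Σi Σj αi αj yi yj xiᵀxj, subject to 0 ≤ αi ≤ C, Σi αi yi = 0.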

In this sub-section we review several issues that researchers have considered in the field of support vector machines.

Usually, SVMs are trained in batch mode. Under this model, all training data are given a priori and training is performed in one batch. If more training data are later obtained, or if we wish to test different constraint parameters, the SVM must be retrained from scratch. However, if we are adding a small amount of data to a large training set, and assuming the problem is well posed, the new data will likely have only a minimal effect on the decision surface, so re-solving the problem from scratch seems computationally wasteful.

An alternative is to “warm-start” the solution process by using the old solution as a starting point for finding the new one. This approach is at the heart of active set optimization methods [10], [11], and incremental learning is in fact a natural extension of these methods. Incremental learning for SVMs has been considered in [12], [13], [14], [15]. In [23], the set of points added at each increment, instead of being generated randomly, is generated according to the probability that a sample is a support vector. The selection of a subset from a large data set is considered in [22]; this yields a smaller optimization problem to solve, but it is noted that generalization ability can suffer.

The introduction of kernel methods has given SVMs the ability to model nonlinear problems. Many Mercer kernels are currently available, such as the Gaussian radial basis function kernel, the sigmoid kernel, the polynomial kernel, spline kernels, and others. These kernels must satisfy Mercer’s condition, that is, they must be symmetric and positive semidefinite. Extending the range of usable kernels to ones that are not required to satisfy the positive definiteness condition is therefore of interest. As is well known, the introduction of kernel functions is based on the idea of a nonlinear mapping.
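For completeness, the kernels listed above have the standard forms (our restatement):

K(x, z) = exp(−‖x − z‖² / (2σ²)) (Gaussian RBF),  K(x, z) = (xᵀz + c)^d (polynomial),  K(x, z) = tanh(κ xᵀz + θ) (sigmoid),

with the sigmoid kernel satisfying Mercer’s condition only for certain choices of κ and θ, which is one motivation for admitting kernels that are not positive definite.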

Feedforward neural networks and radial basis function neural networks have good nonlinear mapping ability and approximation performance. Accordingly, in [16] the input space is mapped into a hidden space by the hidden units of an artificial neural network, and the structural risk is then introduced in the hidden space to implement hidden-space support vector machines.

In [21], the authors determine the kernel based on the properties of the data. When all feature vectors are almost orthogonal, the solution found is close to the center of gravity of the examples; conversely, when the feature vectors are almost identical, the solution approaches that of an SVM with an inhomogeneous linear kernel.

The standard SVM is trained by solving a quadratic optimization problem. In [24], [25], the least squares SVM (LS-SVM) was proposed, which is trained by solving a set of linear equations instead.
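For reference, in the LS-SVM the inequality constraints are replaced by the equalities yi(wᵀφ(xi) + b) = 1 − ei and the slack penalty by a squared error term (γ/2)Σi ei²; eliminating w and ei from the optimality conditions leaves the linear system (standard formulation, our restatement):

[ 0    yᵀ        ] [ b ]   [ 0 ]
[ y    Ω + I/γ   ] [ α ] = [ 1v ]

where Ωij = yi yj K(xi, xj), y = (y1, …, yN)ᵀ, and 1v is the all-ones vector.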

As shown in previous research [18], [19], the SVM is very sensitive to outliers and noise, since its penalty term treats every data point equally during training. This may lead to overfitting if one or a few data points have relatively large slack variable values. The fuzzy SVM (FSVM) was proposed to deal with this overfitting problem.

FSVM is an extension of SVM that takes into account the different significance of the training samples. In FSVM, each training sample is associated with a fuzzy membership value. The membership value reflects the fidelity of the data; in other words, how confident we are about the actual class label of the data: the higher its value, the more confident we are about its class label. The optimization problem of the FSVM is formulated in [17], [26] and has been used in works such as [20], [27], [28]. In this method the slack variables are scaled by the membership values, i.e., the fuzzy membership values are used to weight the soft penalty term in the cost function of the SVM. The weighted soft penalty term reflects the relative fidelity of the training samples during training; important samples with larger membership values have more impact on the FSVM training than those with smaller values.
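Concretely, the membership-weighted penalty gives a problem of the form (standard FSVM formulation from the literature; si denotes the membership of sample i):

min over (w, b, ξ): (1/2)‖w‖² + C Σi si ξi, subject to yi(wᵀxi + b) ≥ 1 − ξi, ξi ≥ 0,

so that in the dual the box constraints become 0 ≤ αi ≤ si C; samples with small membership si can only receive small multipliers and therefore influence the hyperplane less.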

In this paper, we present probabilistic constraints in the SVM for the first time. The main features of the proposed method are:

  • Creating a soft penalty term.

  • Reducing the effect of noisy samples on the optimal hyperplane calculation.

  • The ability to assign a confidence coefficient to each training sample.

The rest of the paper is organized as follows: Section 2 introduces the PC-SVM and its geometrical interpretation. Experimental results are discussed in Section 3. Conclusions are given in Section 4.


The proposed probabilistic constraints SVM (PC-SVM)

We first provide a brief description of the standard SVM and then introduce the PC-SVM formulation.

Experimental results

First, we define the overlap between classes used to generate the test data. Let (r11, r12) be the amplitude boundaries of class 1 and (r21, r22) the amplitude boundaries of class 2. If r21 (the minimum value of class 2) is larger than r12 (the maximum value of class 1), there is no overlap between the two classes; but if r21 = r12 − η, the overlap of each class is given by

Overlap1 = 100·|r21 − r12|/(r12 − r11),  Overlap2 = 100·|r21 − r12|/(r22 − r21),

where Overlap1 is the overlap of class 1 with class 2 and Overlap2 is the overlap of class 2 with class 1.
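As an illustration of this overlap definition, the following sketch generates two one-dimensional classes with a prescribed Overlap1 percentage. The function name, the uniform distributions, and the particular ranges and sample sizes are our assumptions for demonstration; the paper does not specify them.

```python
import numpy as np

def make_overlapping_classes(r11, r12, overlap1_pct, width2, n=200, seed=0):
    """Generate two 1-D uniform classes whose overlap follows the
    percentage definition above (illustrative sketch; distributions,
    ranges and sample size are assumptions, not from the paper)."""
    rng = np.random.default_rng(seed)
    # Overlap1 = 100 * |r21 - r12| / (r12 - r11)  =>  solve for r21 < r12.
    eta = overlap1_pct / 100.0 * (r12 - r11)
    r21 = r12 - eta                 # class 2 starts eta below the top of class 1
    r22 = r21 + width2              # class 2 spans the requested width
    x1 = rng.uniform(r11, r12, n)   # class 1 samples
    x2 = rng.uniform(r21, r22, n)   # class 2 samples
    overlap2 = 100.0 * abs(r21 - r12) / (r22 - r21)
    return x1, x2, overlap2

# Example: class 1 on (0, 1), 20% of its range overlapped by a class 2 of width 1.
x1, x2, overlap2 = make_overlapping_classes(0.0, 1.0, 20.0, 1.0)
print(f"Overlap of class 2 with class 1: {overlap2:.1f}%")
```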

Conclusions

Noisy training data can cause the optimal hyperplane not to be found in a suitable position. The probabilistic constraints help the support vector classifier maximize the margin based on a reliability factor. The results showed that the proposed algorithm has higher capability than the standard SVM. In future work we will present an automatic approach for finding the parameters of the PC-SVM algorithm, including the PDF of ui in (19), (20) and the probability bounds δi in (11), according to the density of each class and …

References (28)

  • J.-H. Chiang et al., Support vector learning mechanism for fuzzy rule-based modeling: a new approach, IEEE Trans. Fuzzy Syst. (2004)

  • Y. Chen et al., Support vector learning for fuzzy rule-based classification systems, IEEE Trans. Fuzzy Syst. (2003)

  • R. Fletcher, Practical Methods of Optimization (1981)

  • T.F. Coleman et al., A direct active set method for large sparse quadratic programs with simple bounds, Math. Program. (1989)