The probabilistic constraints in the support vector machine

https://doi.org/10.1016/j.amc.2007.04.109

Abstract

In this paper, a new support vector machine classifier with probabilistic constraints is proposed, in which the presence probability of the samples in each class is determined by a distribution function. Noise causes the support vectors to be computed incorrectly, and as a result the margin cannot be maximized. In the proposed method, the constraint boundaries and the occurrence of the constraints are described by probability density functions, which helps to achieve the maximum margin. Experimental results show the superiority of the probabilistic constraints support vector machine (PC-SVM) over the standard SVM.

Introduction

Conventional learning methods aim only at minimizing the classification error in the training phase, and they cannot guarantee the lowest error rate in the testing phase. In statistical learning theory, the support vector machine (SVM) was developed to overcome this bottleneck. Support vector machines (SVMs) were originally introduced by Vapnik within the framework of statistical learning theory and structural risk minimization [1], and they construct a classifier with minimized VC dimension. They have proven to work successfully on a wide range of nonlinear classification and function estimation applications, such as optical character recognition [2], [3], text categorization [4], face detection in images [5], vehicle tracking in video sequences [6], nonlinear equalization in communication systems [7], and the generation of fuzzy rule-based systems within the SVM framework [8], [9].

Basically, the support vector machine is a linear machine with some very attractive properties. For training data that are not linearly separable, it is not possible to construct a separating hyperplane without classification errors. In this case a set of slack variables is introduced for the samples that reduce the confidence interval. The problem can then be formulated in dual form, in which the slack variables no longer appear and the problem becomes separable. The main motivation of this paper rests on probabilistic constraints; the obtained results include an asymmetric margin that depends on the probability density functions of the data classes and on the importance of each sample in determining the hyperplane parameters.
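For reference, the standard soft-margin formulation alluded to here can be written as follows (standard notation; this restatement is ours, not taken from the paper):

min over (w, b, ξ): (1/2)‖w‖² + C Σi ξi, subject to yi(wᵀxi + b) ≥ 1 − ξi, ξi ≥ 0, i = 1, …, N,

and its dual, in which the slack variables no longer appear,

max over α: Σi αi − (1/2) Σi Σj αi αj yi yj xiᵀxj, subject to 0 ≤ αi ≤ C, Σi αi yi = 0.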

In this sub-section we review several issues that researchers have considered in the field of support vector machines.

Usually, SVMs are trained in batch mode. Under this model, all training data are given a priori and training is performed in one batch. If more training data are later obtained, or if we wish to test different constraint parameters, the SVM must be retrained from scratch. However, if we are adding a small amount of data to a large training set, and assuming the problem is well posed, the new data will likely have only a minimal effect on the decision surface, so re-solving the problem from scratch seems computationally wasteful.

An alternative is to “warm-start” the solution process by using the old solution as a starting point for finding the new one. This approach is at the heart of active set optimization methods [10], [11], and incremental learning is in fact a natural extension of these methods. Incremental learning for SVMs has been considered in [12], [13], [14], [15]. In [23], the set of points added at each increment, instead of being generated randomly, is generated according to the probability that a sample is a support vector. The selection of a subset from a large data set is considered in [22]; this yields a smaller optimization problem to solve, but it is noted that generalization ability can suffer.

The introduction of kernel methods has given SVMs the ability to model nonlinear problems. Many Mercer kernels are currently available, such as the Gaussian radial basis function kernel, the sigmoid kernel, the polynomial kernel, spline kernels, and others. These kernels must satisfy Mercer’s condition, that is, they must be symmetric and positive semidefinite. Extending the range of usable kernels to ones that are not required to satisfy the positive definiteness condition is therefore of interest. As is well known, the introduction of kernel functions is based on the idea of a nonlinear mapping.
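For completeness, the kernels listed above have the standard forms (our restatement):

K(x, z) = exp(−‖x − z‖² / (2σ²)) (Gaussian RBF),  K(x, z) = (xᵀz + c)^d (polynomial),  K(x, z) = tanh(κ xᵀz + θ) (sigmoid),

with the sigmoid kernel satisfying Mercer’s condition only for certain choices of κ and θ, which is one motivation for admitting kernels that are not positive definite.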

Feedforward neural networks and radial basis function neural networks have good nonlinear mapping ability and approximation performance. Accordingly, in [16] the input space is mapped into a hidden space by the hidden units of an artificial neural network, and the structural risk is then introduced in the hidden space to implement hidden-space support vector machines.

In [21], the authors determine the kernel based on the properties of the data. When all feature vectors are almost orthogonal, the solution found is close to the center of gravity of the examples; conversely, when the feature vectors are almost identical, the solution approaches that of an SVM with an inhomogeneous linear kernel.

The standard SVM is trained by solving a quadratic optimization problem. In [24], [25], the least squares SVM (LS-SVM) was proposed, which is trained by solving a set of linear equations instead.
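For reference, in the LS-SVM the inequality constraints are replaced by the equalities yi(wᵀφ(xi) + b) = 1 − ei and the slack penalty by a squared error term (γ/2)Σi ei²; eliminating w and ei from the optimality conditions leaves the linear system (standard formulation, our restatement):

[ 0    yᵀ        ] [ b ]   [ 0 ]
[ y    Ω + I/γ   ] [ α ] = [ 1v ]

where Ωij = yi yj K(xi, xj), y = (y1, …, yN)ᵀ, and 1v is the all-ones vector.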

As shown in previous research [18], [19], the SVM is very sensitive to outliers and noise, since its penalty term treats every data point equally during training. This may lead to overfitting if one or a few data points have relatively large slack variable values. The fuzzy SVM (FSVM) was proposed to deal with this overfitting problem.

FSVM is an extension of SVM that takes into account the different significance of the training samples. In FSVM, each training sample is associated with a fuzzy membership value. The membership value reflects the fidelity of the data; in other words, how confident we are about the actual class label of the data: the higher its value, the more confident we are about its class label. The optimization problem of the FSVM is formulated in [17], [26] and has been used in works such as [20], [27], [28]. In this method the slack variables are scaled by the membership values, i.e., the fuzzy membership values are used to weight the soft penalty term in the cost function of the SVM. The weighted soft penalty term reflects the relative fidelity of the training samples during training; important samples with larger membership values have more impact on the FSVM training than those with smaller values.
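Concretely, the membership-weighted penalty gives a problem of the form (standard FSVM formulation from the literature; si denotes the membership of sample i):

min over (w, b, ξ): (1/2)‖w‖² + C Σi si ξi, subject to yi(wᵀxi + b) ≥ 1 − ξi, ξi ≥ 0,

so that in the dual the box constraints become 0 ≤ αi ≤ si C; samples with small membership si can only receive small multipliers and therefore influence the hyperplane less.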

In this paper, we present probabilistic constraints in the SVM for the first time. The main features of the proposed method are:

  • Creating a soft penalty term.

  • Reducing the effect of noisy samples on the optimal hyperplane calculation.

  • The ability to assign a confidence coefficient to each training sample.

The rest of the paper is organized as follows: Section 2 introduces the PC-SVM and its geometrical interpretation. Experimental results are discussed in Section 3. Conclusions are given in Section 4.


The proposed probabilistic constraints SVM (PC-SVM)

We first provide a brief description of the standard SVM and then introduce the PC-SVM formulation.

Experimental results

First, we define the overlap between classes used to generate the test data. Let (r11, r12) be the amplitude boundaries of class 1 and (r21, r22) the amplitude boundaries of class 2. If r21 (the minimum value of class 2) is larger than r12 (the maximum value of class 1), there is no overlap between the two classes; but if r21 = r12 − η, the overlap of each class is given by

Overlap1 = 100·|r21 − r12|/(r12 − r11),  Overlap2 = 100·|r21 − r12|/(r22 − r21),

where Overlap1 is the overlap of class 1 with class 2 and Overlap2 is the overlap of class 2 with class 1.
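As an illustration of this overlap definition, the following sketch generates two one-dimensional classes with a prescribed Overlap1 percentage. The function name, the uniform distributions, and the particular ranges and sample sizes are our assumptions for demonstration; the paper does not specify them.

```python
import numpy as np

def make_overlapping_classes(r11, r12, overlap1_pct, width2, n=200, seed=0):
    """Generate two 1-D uniform classes whose overlap follows the
    percentage definition above (illustrative sketch; distributions,
    ranges and sample size are assumptions, not from the paper)."""
    rng = np.random.default_rng(seed)
    # Overlap1 = 100 * |r21 - r12| / (r12 - r11)  =>  solve for r21 < r12.
    eta = overlap1_pct / 100.0 * (r12 - r11)
    r21 = r12 - eta                 # class 2 starts eta below the top of class 1
    r22 = r21 + width2              # class 2 spans the requested width
    x1 = rng.uniform(r11, r12, n)   # class 1 samples
    x2 = rng.uniform(r21, r22, n)   # class 2 samples
    overlap2 = 100.0 * abs(r21 - r12) / (r22 - r21)
    return x1, x2, overlap2

# Example: class 1 on (0, 1), 20% of its range overlapped by a class 2 of width 1.
x1, x2, overlap2 = make_overlapping_classes(0.0, 1.0, 20.0, 1.0)
print(f"Overlap of class 2 with class 1: {overlap2:.1f}%")
```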

Conclusions

Noisy training data can cause the optimal hyperplane not to be found in a suitable position. The probabilistic constraints help the support vector classifier maximize the margin based on a reliability factor. The results showed that the proposed algorithm has higher capability than the standard SVM. In future work we will present an automatic approach for finding the parameters of the PC-SVM algorithm, including the PDF of ui in (19), (20) and the probability bounds δi in (11), according to the density of each class and …

References (28)

  • J.-H. Chiang et al., Support vector learning mechanism for fuzzy rule-based modeling: a new approach, IEEE Trans. Fuzzy Syst. (2004)

  • Y. Chen et al., Support vector learning for fuzzy rule-based classification systems, IEEE Trans. Fuzzy Syst. (2003)

  • R. Fletcher, Practical Methods of Optimization (1981)

  • T.F. Coleman et al., A direct active set method for large sparse quadratic programs with simple bounds, Math. Program. (1989)