Leave-one-out bounds for support vector ordinal regression machine

Yang, Zhixia; Tian, Yingjie; Deng, Naiyang

doi:10.1007/s00521-008-0217-z

Original Article
Published: 28 November 2008

Volume 18, pages 731–748, (2009)

Neural Computing and Applications

Zhixia Yang^1,2, Yingjie Tian^3 & Naiyang Deng^4


Abstract

The success of a support vector machine depends on its parameters. The leave-one-out (LOO) method provides a quantitative criterion for selecting those parameters. However, the LOO method is highly time-consuming. An effective approach is to approximate the LOO error by an upper bound. This paper is concerned with the support vector ordinal regression machine (SVORM). Two upper bounds on the LOO error for SVORM are presented. The first bound is based on the geometrical concept of a span. The second is based on the concept of support vectors. Preliminary numerical experiments show the validity of the bounds.


References

Boser B, Guyon I, Vapnik V (1992) A training algorithm for optimal margin classifiers. In: Proceedings of the 5th annual ACM workshop on computational learning theory, pp 144–152
Vapnik V (1998) Statistical learning theory. Wiley, New York
Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York
Jaakkola TS, Haussler D (1999) Exploiting generative models in discriminative classifiers. In: Advances in neural information processing systems, vol 11. MIT Press, Cambridge, pp 487–493
Vapnik V, Chapelle O (2000) Bounds on error expectation for support vector machines. Neural Comput 12(9):2013–2036
Gretton A, Herbrich R, Chapelle O (2003) Estimating the leave-one-out error for classification learning with SVMs. http://www.kyb.tuebingen.mpg.de/publications/pss/ps1854.ps, May 15
Joachims T (2000) Estimating the generalization performance of an SVM efficiently. In: Proceedings of the 17th international conference on machine learning. Morgan Kaufmann, San Francisco, pp 431–438
Tian Y-J (2005) Support vector regression machine and its application. Ph.D. thesis, China Agricultural University
Chang M-W, Lin C-J (2005) Leave-one-out bounds for support vector regression model selection. Neural Comput 17(5):1188–1222
Shashua A, Levin A (2002) Taxonomy of large margin principle algorithms for ordinal regression problems. In: Advances in neural information processing systems, vol 15. MIT Press, Cambridge, pp 57–64
Herbrich R, Graepel T, Bollmann-Sdorra P, Obermayer K (1998) Learning a preference relation for information retrieval. In: Proceedings of the AAAI workshop text categorization and machine learning, Madison, USA
Tangian A, Gruber J (1995) Constructing quadratic and polynomial objective functions. In: Proceedings of the 3rd international conference on econometric decision models, Schwerte, Germany. Springer, Heidelberg, pp 166–194
Anderson J (1984) Regression and ordered categorical variables (with discussion). J R Stat Soc Ser B 46:1–30
Herbrich R, Graepel T, Obermayer K (1999) Support vector learning for ordinal regression. In: Proceedings of the ninth international conference on artificial neural networks, pp 97–102
Chu W, Keerthi SS (2005) New approaches to support vector ordinal regression. In: Proceedings of international conference on machine learning (ICML-05), pp 145–152
Arie BD, Yoav G (2005) Ordinal datasets. http://www.cs.waikato.ac.nz/ml/weka/
Weston J (1999) Leave-one-out support vector machines. In: Proceedings of the sixteenth international joint conference on artificial intelligence, pp 727–733


Acknowledgments

We would like to thank the anonymous reviewers for their very concrete and helpful comments, which improved this paper greatly.

Author information

Authors and Affiliations

College of Mathematics and Systems Science, Xinjiang University, 830046, Urumqi, People’s Republic of China
Zhixia Yang
Academy of Mathematics and Systems Science, CAS, 100190, Beijing, People’s Republic of China
Zhixia Yang
Research Center on Fictitious Economy and Data Science, CAS, 100190, Beijing, People’s Republic of China
Yingjie Tian
College of Science, China Agricultural University, 100083, Beijing, People’s Republic of China
Naiyang Deng


Corresponding author

Correspondence to Naiyang Deng.

Additional information

This work is supported by the Key Project of National Natural Science Foundation of China (no. 10631070), the National Natural Science Foundation of China (no. 10801112, 70601033, 10601064) and the China Postdoctoral Science Foundation funded project (no. 20080430573).

Appendix

1.1 Proof of Lemma 1

Proof

We only prove that the set $\Uplambda_p^q$ is non-empty, since the corresponding result for the set $\Uplambda_p^{*q}$ can be shown similarly.

Let us define $\Uplambda_p^{q+}$ as the subset of $\Uplambda_p^q$ with the additional constraint $\lambda_i^j\geq 0$:

$$ \Uplambda_p^{q+}=\left\{\sum_{i\in M_p^q(\alpha,q)}\lambda_i^q\hbox{x}_i^q+\sum_{i\in M_p^q(\alpha^*,q+1)}\lambda_i^{q+1}\hbox{x}_i^{q+1}\in \Uplambda_p^q, \lambda_i^j\geq 0,j=q,q+1\right\}. $$

(42)

Next we shall show that the set $\Uplambda_p^q$ is non-empty by proving that its subset $\Uplambda_p^{q+}$ is non-empty. In order to prove that $\Uplambda_p^{q+}\neq \varnothing$, it is sufficient to prove that there exists a vector $\lambda$ such that

$$ \lambda_i^q=\mu{\frac{C-\alpha_i^q}{\alpha_p^q}},\quad i\in M_p^q(\alpha,q), $$

(43)

$$ \lambda_i^{q+1}=\mu{\frac{\alpha_i^{q+1}}{\alpha_p^q}},\quad i\in M_p^q(\alpha^*,q+1), $$

(44)

$$ 0\leq \mu\leq 1, $$

(45)

$$ \sum_{i\in M_p^q (\alpha,q)}\lambda_i^q+\sum_{i\in M_p^q(\alpha^*,q+1)}\lambda_i^{q+1}=1, $$

(46)

because it is straightforward to show that when a vector λ satisfies (43)–(46), we have

$$ \sum\limits_{i\in M_p^q(\alpha,q)}\lambda_i^q\hbox{x}_i^q+\sum\limits_{i\in M_p^q(\alpha^*,q+1)}\lambda_i^{q+1}\hbox{x}_i^{q+1}\in \Uplambda_p^{q+}, $$

(47)

and therefore $\Uplambda_p^{q+}\neq \varnothing$. Now we prove that a vector $\lambda$ satisfying (43)–(46) exists. Taking into account equations (43) and (44), we rewrite constraint (46) as

$$ 1={\frac{\mu}{\alpha_p^q}}\left[\sum_{i\in M_p^q(\alpha,q)}(C-\alpha_i^q)+\sum_{i\in M_p^q(\alpha^*,q+1)}\alpha_i^{q+1}\right]. $$

(48)

Thus, it is sufficient to show that the value of μ given by equation (48) satisfies constraint (45). For this purpose, define

$$ \Updelta=\sum_{i\in M(\alpha,q)}(C-\alpha_i^q)+\sum_{i\in M(\alpha^*,q+1)}\alpha_i^{q+1}, $$

(49)

$$ =\sum_{i\in M(\alpha,q)}C-\sum_{i\in M(\alpha,q)}\alpha_i^q+\sum_{i\in M(\alpha^*,q+1)}\alpha_i^{q+1}. $$

(50)

Noting that

$$ \begin{aligned} 0&=\sum_{i\in M(\alpha^*,q+1)}\alpha_i^{q+1}+\sum_{i\in N(\alpha^*,q+1)}\alpha_i^{q+1}-\sum_{i\in M(\alpha,q)}\alpha_i^{q}-\sum_{i\in N(\alpha,q)}\alpha_i^{q}\\ &=\sum_{i\in M(\alpha^*,q+1)}\alpha_i^{q+1}-\sum_{i\in M(\alpha,q)}\alpha_i^{q}+\sum_{i\in N(\alpha^*,q+1)}C-\sum_{i\in N(\alpha,q)}C, \end{aligned} $$

(51)

and combining equations (50) and (51), we get

$$ \Updelta=\sum_{i\in M(\alpha,q)}C-\sum_{i\in N(\alpha^*,q+1)}C+\sum_{i\in N(\alpha,q)}C=C\tau, $$

(52)

where τ is an integer.

Since equation (49) gives Δ > 0 and, by (52), Δ = Cτ with τ an integer, it follows that τ ≥ 1, and hence

$$ \Updelta\geq C. $$

(53)

Rewrite equation (48) as

$$ 1={\frac{\mu}{\alpha_p^q}}(\Updelta-(C-\alpha_p^q)), $$

or

$$ \mu={\frac{\alpha_p^q}{\Updelta-(C-\alpha_p^q)}}. $$

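Taking into account inequality (53), the denominator satisfies $\Updelta-(C-\alpha_p^q)\geq \alpha_p^q>0$, so we finally get 0 ≤ μ ≤ 1. Thus the set $\Uplambda_p^{q+}$ is not empty and, consequently, neither is the set $\Uplambda_p^q$. $\square$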
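The construction in (43)–(48) can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `lemma1_coefficients` and the toy multipliers are assumptions made for illustration, and the class q + 1 multipliers are taken to be the ones that enter the equality constraint of the dual problem.

```python
def lemma1_coefficients(C, alpha_margin_q, p, alpha_margin_q1):
    """Return (mu, lambdas_q, lambdas_q1) built as in (43)-(48).

    alpha_margin_q  : multipliers of the class-q margin support vectors, each in (0, C)
    p               : position (in alpha_margin_q) of the left-out point
    alpha_margin_q1 : multipliers of the class-(q+1) margin support vectors
    """
    alpha_p = alpha_margin_q[p]
    # Delta as in (49): sums run over the full margin sets, including p.
    delta = sum(C - a for a in alpha_margin_q) + sum(alpha_margin_q1)
    # mu from the rewritten form of (48).
    mu = alpha_p / (delta - (C - alpha_p))
    # Coefficients (43) for the remaining class-q margin support vectors.
    lambdas_q = {i: mu * (C - a) / alpha_p
                 for i, a in enumerate(alpha_margin_q) if i != p}
    # Coefficients (44) for the class-(q+1) margin support vectors.
    lambdas_q1 = [mu * a / alpha_p for a in alpha_margin_q1]
    return mu, lambdas_q, lambdas_q1


# Hypothetical toy multipliers. With one additional bound support vector
# (multiplier C) in each class, both sides of the equality constraint sum to 2.
C = 1.0
mu, lam_q, lam_q1 = lemma1_coefficients(C, [0.3, 0.7], 0, [0.5, 0.5])
total = sum(lam_q.values()) + sum(lam_q1)
print(mu, total)
```

In this toy case Δ = 2C, so μ lies in [0, 1] and the coefficients are non-negative and sum to one, as required by (45) and (46).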

1.2 Proofs of Lemma 2 and Lemma 3

We only give the proof of Lemma 2 since the proof of Lemma 3 is similar.

Proof

Recalling (10), the vector w can be expressed as

$$ \hbox{w}=\sum_{j\neq q}\sum_{i=1}^{l^j}(\alpha_i^{*j}-\alpha_i^j)\hbox{x}_i^j+\sum_{i\neq p}(\alpha_i^{*q}-\alpha_i^q)\hbox{x}_i^q+(\alpha^{*q}_p-\alpha_p^q)\hbox{x}_p^q. $$

(54)

Now we replace the support vector $\hbox{x}_p^q$ with a linear combination of the remaining margin support vectors about α in the qth class and the margin support vectors about α* in the (q + 1)th class; this gives:

$$ \hbox{x}_p^q\approx \sum\limits_{i\in M_p^q(\alpha,q)}\lambda_i^q\hbox{x}_i^q+\sum\limits_{i\in M_p^{q}(\alpha^*,q+1)}\lambda_i^{q+1}\hbox{x}_i^{q+1}=\tilde{\hbox{x}}_p^q. $$

Taking this replacement yields an approximate expression for w

$$ \begin{aligned} \tilde{\hbox{w}}&=\sum_{j\neq q}\sum_{i=1}^{l^j}(\alpha_i^{*j}-\alpha_i^j)\hbox{x}_i^j+\sum_{i\neq p}(\alpha_i^{*q}-\alpha_i^q)\hbox{x}_i^q+(\alpha^{*q}_p-\alpha_p^q)\left[\sum\limits_{i\in M_p^q(\alpha,q)}\lambda_i^q\hbox{x}_i^q+\sum\limits_{i\in M_p^{q}(\alpha^*,q+1)}\lambda_i^{q+1}\hbox{x}_i^{q+1}\right], \\ &=\sum_{j\neq q,q+1}\sum_{i=1}^{l^j}(\alpha_i^{*j}-\alpha_i^j)\hbox{x}_i^j+\sum_{i\notin M_p^q(\alpha,q),i\neq p}(\alpha_i^{*q}-\alpha_i^{q})\hbox{x}_i^{q}+\sum_{i\notin M_p^q(\alpha^*,q+1)}(\alpha_i^{*q+1}-\alpha_i^{q+1})\hbox{x}_i^{q+1}\\ &\quad+\sum_{i\in M _p^q(\alpha,q)}\left[\underbrace{(\alpha_i^{*q}-\alpha_i^q)+(\alpha_p^{*q}-\alpha_p^q)\lambda_i^q}_{\tilde{\alpha}_i^{*q}-\tilde{\alpha}_i^q}\right]\hbox{x}_i^q +\sum_{i\in M _p^q(\alpha^*,q+1)}\left[\underbrace{(\alpha_i^{*q+1}-\alpha_i^{q+1})+(\alpha_p^{*q}-\alpha_p^{q})\lambda_i^{q+1}}_{\tilde{\alpha}_i^{*q+1}-\tilde{\alpha}_i^{q+1}}\right]\hbox{x}_i^{q+1}. \end{aligned} $$

Setting

$$ \begin{aligned} \,&\tilde{\alpha}_i^{q}=\alpha_i^{q}+\lambda_i^{q}\alpha_p^{q},\ \tilde{\alpha}_i^{*q}=\alpha_i^{*q}+\lambda_i^{q}\alpha_p^{*q},\quad i\in M_p^q(\alpha,q), \\ \,&\tilde{\alpha}_i^{q+1}=\alpha_i^{q+1}-\lambda_i^{q+1}\alpha_p^{*q},\quad \tilde{\alpha}_i^{*q+1}=\alpha_i^{*q+1}-\lambda_i^{q+1}\alpha_p^{q},\quad i\in M_p^q(\alpha^*,q+1), \\ \,&\tilde{\alpha}_i^q=\alpha_i^q,\tilde{\alpha}_i^{*q}=\alpha_i^{*q},\quad i\notin M_p^q(\alpha,q),\\ \,&\tilde{\alpha}_i^{q+1}=\alpha_i^{q+1},\tilde{\alpha}_i^{*q+1}=\alpha_i^{*q+1},\quad i\notin M_p^q(\alpha^*,q+1), \\ \,&\tilde{\alpha}_i^j=\alpha_i^j,\tilde{\alpha}_i^{*j}=\alpha_i^{*j},\quad j=1,\ldots,q-1,q+2,\ldots,k,\quad i=1,\ldots,l^j, \end{aligned} $$

we have

$$ \begin{aligned} &0\leq \alpha_i^{q}+\lambda_i^q\alpha_p^{q}\leq C,\quad 0\leq \alpha_i^{*q}+\lambda_i^{q}\alpha_p^{*q}\leq C, \quad i\in M _p^q(\alpha,q), \\ &0\leq \alpha_i^{q+1}-\lambda_i^{q+1}\alpha_p^{*q}\leq C,\quad 0\leq\alpha_i^{*q+1}-\lambda_i^{q+1}\alpha_p^{q}\leq C ,\quad i\in M _p^q(\alpha^*,q+1), \\ &\sum_{i\in M _p^q(\alpha,q)}\tilde{\alpha}_i^q+\sum_{i\in N _p^q(\alpha,q)}\alpha_i^q=\sum_{i\in M _p^q(\alpha^*,q+1)}\tilde{\alpha}_i^{*q+1}+\sum_{i\in N _p^q(\alpha^*,q+1)}\alpha_i^{*q+1}. \end{aligned} $$

The above equalities imply that

$$ \sum_{i\in V_p^q(\alpha,q)}\alpha_i^{q}+\alpha_p^q\sum_{i\in M_p^q(\alpha,q)}\lambda_i^{q}=\sum_{i\in V _p^q(\alpha^*,q+1)}\alpha_i^{*q+1}-\alpha_p^{q}\sum_{i\in M _p^q(\alpha^*,q+1)}\lambda_i^{q+1}. $$

(55)

According to the constraint (8) of the dual problem (7)–(9) again, we get

$$ \sum\limits_{i\in M_p^q(\alpha,q)}\lambda_i^{q}+\sum\limits_{i\in M_p^q(\alpha^*,q+1)}\lambda_i^{q+1}=1, \lambda_p^q=-1. $$

(56)

So, by (55) and (56), $\tilde{\alpha}^{(*)}$ is a feasible solution of dual problem (7)–(9) for the training set ${T_p^q=T\setminus \{(\hbox{x}_p^q,y_p^q)\}.}$ $\square$
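The bookkeeping in this proof can be illustrated on a toy example. The sketch below uses hypothetical data and a hypothetical helper `combine`, and keeps only the margin support vectors whose multipliers change. It checks that substituting the combination for the left-out point gives the same weight vector as the updated multipliers defined in the proof.

```python
def combine(coeffs, points):
    """Return sum_i coeffs[i] * points[i] for 2-D points stored as tuples."""
    return tuple(sum(c * x[d] for c, x in zip(coeffs, points)) for d in range(2))


# Hypothetical toy data (not from the paper).
x_q = [(0.0, 1.0), (1.0, 0.0)]        # class-q margin support vectors other than p
a_q, as_q = [0.2, 0.4], [0.0, 0.1]    # their alpha and alpha*
x_q1 = [(2.0, 2.0)]                   # class-(q+1) margin support vector
a_q1, as_q1 = [0.0], [0.6]            # its alpha and alpha*
a_p, as_p = 0.3, 0.0                  # multipliers of the left-out point
lam_q, lam_q1 = [0.25, 0.25], [0.5]   # combination coefficients, summing to 1

points = x_q + x_q1

# Route 1: replace x_p by the combination x_tilde in the expansion of w.
x_tilde = combine(lam_q + lam_q1, points)
base = combine([s - a for a, s in zip(a_q + a_q1, as_q + as_q1)], points)
w_direct = tuple(b + (as_p - a_p) * t for b, t in zip(base, x_tilde))

# Route 2: fold the contribution of p into the updated multipliers.
at_q = [a + l * a_p for a, l in zip(a_q, lam_q)]
ast_q = [s + l * as_p for s, l in zip(as_q, lam_q)]
at_q1 = [a - l * as_p for a, l in zip(a_q1, lam_q1)]
ast_q1 = [s - l * a_p for s, l in zip(as_q1, lam_q1)]
w_tilde = combine([s - a for a, s in zip(at_q + at_q1, ast_q + ast_q1)], points)

print(w_direct, w_tilde)
```

Both routes give the same vector because the left-out point's coefficient is distributed over the remaining margin support vectors in proportion to λ.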

1.3 Proof of Lemma 4

Proof

1. Suppose that the left-out point $\hbox{x}_p^q$ is a margin support vector about α. Consider the following optimization problem

$$ W(\alpha^{(*)}):=\max_{\alpha^{(*)}}\quad\sum_{j,i}(\alpha_i^j+\alpha_i^{*j})-\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\alpha_i^{*j}-\alpha_i^j)(\alpha_{i'}^{*j'}-\alpha_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'}), $$

(57)

$$ \hbox{s.t.}\quad\sum_{i=1}^{l^j}\alpha_i^j=\sum_{i=1}^{l^{j+1}}\alpha_i^{*j+1},\quad j=1,2,\ldots,k-1, $$

(58)

$$0\leq \alpha_i^j,\alpha_i^{*j}\leq C,\quad j=1,\ldots,k,\quad i=1,\ldots,l^j, $$

(59)

$$\alpha_p^q=\alpha_p^{*q}=0. $$

(60)

Assuming that $\alpha_p^{(*)q}$ is the optimal solution of the optimization problem (57)–(60) and $\alpha^{(*)}$ is the optimal solution of the dual problem (7)–(9), the following inequalities hold:

$$ W(\alpha_p^{(*)q})\geq W(\alpha^{(*)}-\delta^{(*)}), $$

(61)

$$ W(\alpha^{(*)})\geq W(\alpha_p^{(*)q}+\gamma^{(*)}), $$

(62)

where δ^(*) satisfies the following conditions:

$$ \begin{aligned} \,&0\leq \alpha_i^j-\delta_i^j\leq C, \quad 0\leq \alpha_i^{*j}-\delta_i^{*j}\leq C, \quad j=1,\ldots,k,\quad i=1,\ldots,l^j, \\ \,&\sum_{i=1}^{l^j}\delta_i^j=\sum_{i=1}^{l^{j+1}}\delta_i^{*j+1},\quad j=1,2,\ldots,k-1, \\ &\delta_i^{*1}=0,\quad i=1,2,\ldots,l^1,\delta_i^k=0,\quad i=1,2,\ldots,l^k, \\ \,&\delta_p^q=\alpha_p^q,\quad \delta_p^{*q}=\alpha_p^{*q}, \end{aligned} $$

and γ^(*) satisfies the following conditions:

$$ 0\leq \alpha_{pi}^{qj}+\gamma_i^j\leq C,\quad 0\leq \alpha_{pi}^{*qj}+\gamma_i^{*j}\leq C,\quad j=1,\ldots,k,\quad i=1,\ldots,l^j, $$

(63)

$$ \sum_{i=1}^{l^q}\gamma_i^q=\sum_{i=1}^{l^{q+1}}\gamma_i^{*q+1}, $$

(64)

$$ \alpha_i^q=0\Rightarrow \gamma_i^q=0,\quad i\neq p,\quad \alpha_i^{*q+1}=0\Rightarrow \gamma_i^{*q+1}=0, $$

(65)

$$ \gamma_i^j=0,j\neq q,i=1,\ldots,l^j, \quad \gamma_i^{*j}=0,\quad j\neq q+1,\quad i=1,\ldots,l^j, $$

(66)

$$ \gamma_i^{*1}=0,\quad i=1,2,\ldots,l^1,\quad \gamma_i^k=0,\quad i=1,2,\ldots,l^k. $$

(67)

From inequality (61) we obtain

$$ W(\alpha^{(*)})-W(\alpha_p^{(*)q})\leq W(\alpha^{(*)})-W(\alpha^{(*)}-\delta^{(*)}). $$

(68)

Combining inequality (68) with (62), we get

$$ I_1=W(\alpha_p^{(*)q}+\gamma^{(*)})-W(\alpha_p^{(*)q})\leq W(\alpha^{(*)})-W(\alpha^{(*)}-\delta^{(*)})=I_2. $$

(69)

Next we calculate both the left-hand side $I_1$ and the right-hand side $I_2$ of inequality (69). First, for $I_1$, we have

$$ \begin{aligned} I_1&=W(\alpha_p^{(*)q}+\gamma^{(*)})-W(\alpha_p^{(*)q})\\ &=\sum_{j,i}(\alpha_{pi}^{qj}+\gamma_i^j+\alpha_{pi}^{*qj}+\gamma_i^{*j})-\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\alpha_{pi}^{*qj}+\gamma_i^{*j}-\alpha_{pi}^{qj}-\gamma_i^j)(\alpha_{pi'}^{*qj'}+\gamma_{i'}^{*j'}\\ &-\alpha_{pi'}^{qj'}-\gamma_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'})-\sum_{j,i}(\alpha_{pi}^{qj}+\alpha_{pi}^{*qj})+\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\alpha_{pi}^{*qj}-\alpha_{pi}^{qj})(\alpha_{pi'}^{*qj'}-\alpha_{pi'}^{qj'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'}),\\ &=\sum_{j,i}(\gamma_i^j+\gamma_i^{*j})-\sum_{j,i}\sum_{j',i'}(\alpha_{pi}^{*qj}-\alpha_{pi}^{qj})(\gamma_{i'}^{*j'}-\gamma_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'})-\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\gamma_i^{*j}-\gamma_i^j)(\gamma_{i'}^{*j'}-\gamma_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'}),\\ &=\sum_{j,i}(\gamma_i^j+\gamma_i^{*j})-\sum_{j,i}(\gamma_{i}^{*j}-\gamma_{i}^{j})(w_p^q\cdot \hbox{x}_i^j)-\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\gamma_i^{*j}-\gamma_i^j)(\gamma_{i'}^{*j'}-\gamma_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'}). \end{aligned} $$

According to conditions (63)–(67), we rewrite the above expression as

$$ \begin{aligned} I_1&=\sum_{i=1}^{l^{q+1}}\gamma_i^{*q+1}[1-(w_p^q\cdot \hbox{x}_i^{q+1})]+\sum_{i=1}^{l^{q}}\gamma_i^{q}[1+(w_p^q\cdot \hbox{x}_i^q)] -\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\gamma_i^{*j}-\gamma_i^j)(\gamma_{i'}^{*j'}-\gamma_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'}),\\ &=\sum_{i=1}^{l^{q+1}}\gamma_i^{*q+1}[1-(w_p^q\cdot \hbox{x}_i^{q+1})+b'_q]+\sum_{i=1}^{l^{q}}\gamma_i^{q}[1+(w_p^q\cdot \hbox{x}_i^q)-b'_q] \\ &\quad -\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\gamma_i^{*j}-\gamma_i^j)(\gamma_{i'}^{*j'}-\gamma_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'}). \end{aligned} $$

Taking into account that γ^(*) satisfies the conditions (63)–(67), the following two equalities hold:

$$ \begin{aligned} \,&\gamma_i^{*q+1}[1-(w_p^q\cdot \hbox{x}_i^{q+1})+b'_q]=0, \\ \,&\gamma_i^{q}[1+(w_p^q\cdot \hbox{x}_i^q)-b'_q]=0,\quad i\neq p. \end{aligned} $$

So we obtain

$$ I_1=\gamma_p^{q}[1+(w_p^q\cdot \hbox{x}_p^q)-b'_q]-\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\gamma_i^{*j}-\gamma_i^j)(\gamma_{i'}^{*j'}-\gamma_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'}). $$

(70)

Now let us define vector γ^(*) as follows:

$$ \gamma_p^q=\gamma_s^{*q+1}=a, $$

(71)

$$ \gamma_i^j=0,\quad j= q,\quad i\neq p, \gamma_i^{*j}=0,\quad j= q+1,\quad i\neq s, $$

(72)

$$ \gamma_i^j=0,\quad j\neq q,\quad i=1,\ldots,l^j,\quad \gamma_i^{*j}=0,\quad j\neq q+1,\quad i=1,\ldots,l^j, $$

(73)

where a is some constant and $\alpha_s^{*q+1}\in(0,C)$. Substituting (71)–(73) into (70), we get

$$ \begin{aligned} I_1&=a\left[(w_p^q\cdot \hbox{x}_p^q)-b'_q+1\right]-{\frac{a^2} 2}\|\hbox{x}_p^{q}-\hbox{x}_s^{q+1}\|^2\\ &\geq a\left[(w_p^q\cdot \hbox{x}_p^q)-b'_q+1\right]-{\frac{a^2} 2}D_{q,q+1}^2, \end{aligned} $$

(74)

where $D_{q,q+1}$ is the diameter of the minimum sphere containing the qth class points and the (q + 1)th class points in the training set T. Now choose the value a* that maximizes the right-hand side of (74):

$$ a^*=\frac{\left[(w_p^q\cdot \hbox{x}_p^q)-b'_q+1\right]}{D_{q,q+1}^2}. $$

Putting this expression back into (74), we get

$$ I_1\geq \frac{{\left[(w_p^q\cdot \hbox{x}_p^q)-b'_q+1\right]^2}}{2D_{q,q+1}^2}. $$

Since, according to our assumption, the LOO procedure commits an error at the point $\hbox{x}_p^q$, the following inequality holds

$$ (w_p^q\cdot \hbox{x}_p^q)-b'_q > 0, $$

we obtain

$$ I_1\geq \frac{1}{2D_{q,q+1}^2}. $$

But we need to fulfill the condition a ≤ C. Thus, if a* > C, we replace a by C in inequality (74) and we get

$$ \begin{aligned} I_1&\geq C\left[(w_p^q\cdot \hbox{x}_p^q)-b'_q+1\right]-\frac{{C^2}}{2}D_{q,q+1}^2\\ &=CD_{q,q+1}^2\left(a^*-\frac{C}{2}\right)\\ &\geq CD_{q,q+1}^2\frac{{a^*}}{2}\\ &=\frac{{C}}{2}\left[(w_p^q\cdot \hbox{x}_p^q)-b'_q+1\right]\\ &\geq \frac{{C}}{2}. \end{aligned} $$

Finally, we have

$$ I_1\geq \frac{1}{ 2}\min\left(C,\frac{1}{D_{q,q+1}^2}\right). $$

(75)
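The two cases leading to (75) can be verified numerically. The sketch below is illustrative only: `g` stands for the bracket $(w_p^q\cdot \hbox{x}_p^q)-b'_q+1$, which is at least 1 when an LOO error occurs, `d2` stands for $D_{q,q+1}^2$, and the function names are hypothetical.

```python
def best_increment(g, d2, C):
    """Maximise a*g - 0.5*a^2*d2 over 0 <= a <= C; return (a, value)."""
    a = min(g / d2, C)  # unconstrained maximiser g/d2, clipped to a <= C
    return a, a * g - 0.5 * a * a * d2


def bound_75(d2, C):
    """Right-hand side of (75)."""
    return 0.5 * min(C, 1.0 / d2)


# Check the bound on a small grid of hypothetical values with g >= 1.
results = []
for g in (1.0, 1.5, 3.0):
    for d2 in (0.1, 1.0, 10.0):
        for C in (0.01, 1.0, 100.0):
            _, value = best_increment(g, d2, C)
            results.append(value >= bound_75(d2, C) - 1e-12)
print(all(results))
```

When the clipped value a = C is used, the quadratic penalty is at most half of the linear gain, which is why the bound degrades only to C/2.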

Now we estimate the right-hand side $I_2$ of the inequality (69). According to Lemma 2, we choose

$$ \begin{aligned} \,&\delta_i^{q}=-\lambda_i^{q}\alpha_p^{q},\quad \delta_i^{*q}=-\lambda_i^{q}\alpha_p^{*q},\quad i\in M_p^q(\alpha,q), \\ \,&\delta_i^{q+1}=\lambda_i^{q+1}\alpha_p^{*q},\quad \delta_i^{*q+1}=\lambda_i^{q+1}\alpha_p^{q},\quad i\in M_p^q(\alpha^*,q+1), \\ \,&\delta_i^{q}=\delta_i^{*q}=0,\quad i\notin M_p^q(\alpha,q), \\ \,&\delta_i^{q+1}=\delta_i^{*q+1}=0,\quad i\notin M_p^q(\alpha^*,q+1), \\ \,&\delta_i^{j}=\delta_i^{*j}=0, \quad j\neq q,q+1,\quad i=1,\ldots,l^j, \\ \,& \delta_p^q=\alpha_p^q,\quad \delta_p^{*q}=\alpha_p^{*q}, \end{aligned} $$

where

$$ \sum\limits_{i\in M_p^q(\alpha,q)}\lambda_i^q\hbox{x}_i^q+\sum\limits_{i\in M_p^{q}(\alpha^*,q+1)}\lambda_i^{q+1}\hbox{x}_i^{q+1}\in \Uplambda_p^q. $$

Then the right-hand side $I_2$ of the inequality (69) is expressed as

$$ \begin{aligned} I_2&=W(\alpha^{(*)})-W(\alpha^{(*)}-\delta^{(*)}) \\ &=\sum_{j,i}(\alpha_i^j+\alpha_i^{*j})-\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\alpha_i^{*j}-\alpha_i^j)(\alpha_{i'}^{*j'}-\alpha_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'})-\sum_{j,i}(\alpha_i^j-\delta_i^j+\alpha_i^{*j}-\delta_i^{*j}) \\ &\quad+\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\alpha_i^{*j}-\delta_i^{*j}-\alpha_i^j+\delta_i^j)(\alpha_{i'}^{*j'}-\delta_{i'}^{*j'}-\alpha_{i'}^{j'}+\delta_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'}) \\ &=\sum_{j,i}(\delta_i^j+\delta_i^{*j})-\sum_{j,i}\sum_{j',i'}(\alpha_i^{*j}-\alpha_i^j)(\delta_{i'}^{*j'}-\delta_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'})+\frac{1}{2}\sum_{j,i}\sum_{j',i'}(\delta_i^{*j}-\delta_i^j)(\delta_{i'}^{*j'}-\delta_{i'}^{j'})(\hbox{x}_i^j\cdot \hbox{x}_{i'}^{j'}) \\ &=-(\alpha_p^q+\alpha_p^{*q})\left[\sum_{i\in M(\alpha,q)\cup \{p\}}\lambda_i^{q}-\sum_{i\in M(\alpha^*,q+1)}\lambda_i^{q+1}\right]\\ &\quad+(\alpha_p^q+\alpha_p^{*q})\left[\sum_{i\in M(\alpha,q)\cup \{p\}}\lambda_i^{q}(w\cdot \hbox{x}_i^{q})+\sum_{i\in M(\alpha^*,q+1)}\lambda_i^{q+1}(w\cdot \hbox{x}_i^{q+1})\right]\\ &\quad+\frac{{(\alpha_p^{*q}-\alpha_p^q)^2}}{2}\left\|\hbox{x}_p^q-\sum\limits_{i\in M_p^q(\alpha,q)}\lambda_i^{q}\hbox{x}_i^{q}-\sum\limits_{i\in M_p^{q}(\alpha^*,q+1)}\lambda_i^{q+1}\hbox{x}_i^{q+1}\right\|^2.\\ \end{aligned} $$

Since the definition of $\Uplambda_p^q$ implies that

$$ \sum_{i\in M_p^q(\alpha,q)}\lambda_i^{q}+ \sum_{i\in M_p^q(\alpha^*,q+1)}\lambda_i^{q+1}=1, \lambda_p^q=-1, $$

(76)

we have

$$ \begin{aligned} I_2&=-(\alpha_p^q+\alpha_p^{*q})\sum_{i\in M_p^q(\alpha,q)\cup \{p\}}\lambda_i^{q}[1-(w\cdot \hbox{x}_i^{q})]\\ &\quad+(\alpha_p^q+\alpha_p^{*q})\sum_{i\in M(\alpha^*,q+1)}\lambda_i^{q+1}[1+(w\cdot \hbox{x}_i^{q+1})]+\frac{{(\alpha_p^{*q}-\alpha_p^q)^2}}{2}S^2(p,q),\\ &=-(\alpha_p^q+\alpha_p^{*q})\sum_{i\in M_p^q(\alpha,q)\cup \{p\}}\lambda_i^{q}[1-(w\cdot \hbox{x}_i^{q})+b_q]\\ &\quad+(\alpha_p^q+\alpha_p^{*q})\sum_{i\in M(\alpha^*,q+1)}\lambda_i^{q+1}[1+(w\cdot \hbox{x}_i^{q+1})-b_q]+\frac{{(\alpha_p^{*q}-\alpha_p^q)^2}}{2}S^2(p,q),\\ &=\frac{{(\alpha_p^{*q}-\alpha_p^q)^2}}{2}S^2(p,q). \end{aligned} $$

(77)

Combining the inequalities (69) and (75) with the equality (77), we obtain

$$ (\alpha_p^{*q}-\alpha_p^q)^2S^2(p,q)\geq\min \left(C,\frac{1}{D_{q,q+1}^2}\right), $$

(78)

where $D_{q,q+1}$ is the diameter of the minimum sphere containing both the qth class points and the (q + 1)th class points in the training set T.

2. Similarly to the derivation of (78), when the left-out point $\hbox{x}_p^q$ is a margin support vector about α*, we get the following inequality:

$$ (\alpha_p^{*q}-\alpha_p^q)^2S^{*2}(p,q)\geq\min \left(C,\frac{1}{D_{q-1,q}^2}\right), $$

(79)

where $D_{q-1,q}$ is the diameter of the minimum sphere containing the (q − 1)th class points and the qth class points in the training set T. $\square$

1.4 Proof of Lemma 5

Proof

Suppose that α^(*) is the optimal solution of the problem (7)–(9). It is sufficient to study the following three cases, respectively:

1. The case $\alpha_p^q=\alpha_p^{*q}=0$: The left-out point $(x_p^q,y_p^q)$ is not a support vector. Then the objective function value of the problem (7)–(9) is equal to that of the problem (34)–(36), namely, $W(\alpha^{(*)})=W_p^q(\alpha_p^{(*)q})$. So the decision function does not change after the point $(x_p^q,y_p^q)$ is left out, and this point is not counted as a leave-one-out error.

2. The case $\alpha_p^q>0$: The left-out point $(x_p^q,y_p^q)$ is a support vector. Starting from the solution $\alpha_p^{(*)q}$ of the problem (34)–(36), a feasible point $\beta^{(*)}$ of the problem (7)–(9) can be constructed by

$$ \begin{aligned} \beta_i^j&=\left\{\begin{array}{ll} \alpha_{pi}^{qj},& j=1,\ldots,q-1,q+1,\ldots,k,\quad i=1,\ldots,l^j,\\ \alpha_{pi}^{qq},& \alpha_{pi}^{qq}=0,\quad \alpha_{pi}^{qq}=C,\\ \alpha_{pi}^{qq}-\nu_i^q,& i\in M(\alpha_p^{(*)q},q),\\ \alpha_p^q, & j=q,\quad i=p,\\ \end{array}\right.\\ \beta_i^{*j}&=\left\{\begin{array}{ll} \alpha_{pi}^{*qj},& j=1,\ldots,q-1,q+1,\ldots,k,\quad i=1,\ldots,l^j,\\ \alpha_{pi}^{*qq},& \alpha_{pi}^{*qq}=0,\quad \alpha_{pi}^{*qq}=C,\\ \alpha_{pi}^{*qq}-\nu_i^{*q},& i\in M(\alpha_p^{(*)q},q),\\ \alpha_p^{*q}, & j=q,\quad i=p,\\ \end{array}\right. \end{aligned} $$

where $M(\alpha_p^{(*)q},q)$ is the set of margin support vectors about $\alpha^{(*)}$ for the problem (34)–(36), and $\nu_i^q$ and $\nu_i^{*q}$, respectively, satisfy the conditions

$$ \sum_{i\in M(\alpha_p^{(*)q},q)}\nu_i^q=\alpha_p^q, $$

(80)

and

$$ \sum_{i\in M(\alpha_p^{(*)q},q)}\nu_i^{*q}=\alpha_p^{*q}. $$

(81)

It is easy to see that

$$ \begin{aligned} \,&0\leq \beta_i^j,\quad \beta_i^{*j}\leq C,\quad j=1,\ldots,k,\quad i=1,\ldots,l^j, \\ \,&\sum_{i=1}^{l^j}\beta_i^j=\sum_{i=1}^{l^{j+1}}\beta_i^{*j+1},\quad j=1,\ldots,k-1. \end{aligned} $$

Thus β^(*) is a feasible solution of the problem (7)–(9). After a series of transformations, W(β^(*)) can be written as

$$ \begin{aligned} W(\beta^{(*)})=&W_p^q(\alpha_p^{(*)q})+(\alpha_p^{*q}+\alpha_p^q)-\frac{1}{ 2}(\alpha_p^{*q}-\alpha_p^q)^2K(x_p^q,x_p^q)\\ &-(\alpha_p^{*q}-\alpha_p^q)\sum_{(j,i)\in I\setminus\{(q,p)\}}(\alpha_{pi}^{*qj}-\alpha_{pi}^{qj})K(x_p^q,x_i^j)\\ &+\sum_{i\in M(\alpha_p^{(*)q},q)}(\nu_i^{*q}-\nu_i^q)\left[1+\sum_{(j',i')\in I\setminus\{(q,p)\}}(\alpha_{pi'}^{*qj'}-\alpha_{pi'}^{qj'})K(x_i^q,x_{i'}^{j'})\right]\\ &+(\alpha_p^{*q}-\alpha_p^q)\sum_{i\in M(\alpha_p^{(*)q},q)}(\nu_i^{*q}-\nu_i^q)K(x_p^q,x_i^q)\\ &-\frac{1}{ 2}\sum_{i\in M(\alpha_p^{(*)q},q)}\sum_{i'\in M(\alpha_p^{(*)q},q)}(\nu_i^{*q}-\nu_i^q)(\nu_{i'}^{*q}-\nu_{i'}^q)K(x_i^q,x_{i'}^q). \end{aligned} $$

(82)

According to the assumption in the lemma, there exists at least one $\alpha_i^j\in(0,C)$ among the points of each class j = 1,…, k. Therefore,

$$ 1+\sum_{(j',i')\in I\setminus\{(q,p)\}}(\alpha_{pi'}^{*qj'}-\alpha_{pi'}^{qj'})K(x_i^q,x_{i'}^{j'})=b'_q. $$

(83)

From the equalities (80) and (81), we get

$$ \sum_{i\in M(\alpha_p^{(*)q},q)}(\nu_i^{*q}-\nu_i^q)=(\alpha_p^{*q}-\alpha_p^q). $$

(84)

So we rewrite the equality (82) as

$$ \begin{aligned} \,&(\alpha_p^{*q}-\alpha_p^q) {\left [\sum_{(j,i)\in I\setminus\{(q,p)\}}(\alpha_{pi}^{*qj}-\alpha_{pi}^{qj})K(x_p^q,x_i^j)-b'_q\right]} \\ \,&\quad=-W(\beta^{(*)})+W_p^q(\alpha_p^{(*)q})+(\alpha_p^{*q}+\alpha_p^q)-\frac{1}{ 2}(\alpha_p^{*q}-\alpha_p^q)^2K(x_p^q,x_p^q)\\ \,&\quad\quad-\frac{1}{ 2}\sum_{i\in M(\alpha_p^{(*)q},q)}\sum_{i'\in M(\alpha_p^{(*)q},q)}(\nu_i^{*q}-\nu_i^q)(\nu_{i'}^{*q}-\nu_{i'}^q)K(x_i^q,x_{i'}^q),\\ \,&\quad\quad+(\alpha_p^{*q}-\alpha_p^q)\sum_{i\in M(\alpha_p^{(*)q},q)}(\nu_i^{*q}-\nu_i^q)K(x_p^q,x_i^q). \end{aligned} $$

(85)

Similarly, let us construct a feasible solution γ^(*) of the problem (34)–(36) based on the solution α^(*) of the problem (7)–(9) by

$$ \gamma_i^j= \left\{\begin{array}{ll} \alpha_i^j,& j=1,\ldots,q-1,q+1,\ldots,k, i=1,\ldots,l^j,\\ \alpha_i^q,& \alpha_i^q=0,\quad \alpha_i^q=C,\\ \alpha_i^q+\mu_i^q,& i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}, \end{array}\right. $$

and

$$ \gamma_i^{*j}= \left\{\begin{array}{ll} \alpha_i^{*j},& j=1,\ldots,q-1,q+1,\ldots,k,\quad i=1,\ldots,l^j,\\ \alpha_i^{*q},& \alpha_i^q=0 \hbox{ or } \alpha_i^q=C,\\ \alpha_i^{*q}+\mu_i^{*q},& i\in M(\alpha^{(*)},q)\setminus \{(q,p)\},\\ \end{array}\right. $$

where $\mu_i^q\geq 0$ and $\mu_i^{*q}\geq 0$, respectively, satisfy the conditions

$$ \sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}\mu_i^q=\alpha_p^q; $$

and

$$ \sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}\mu_i^{*q}=\alpha_p^{*q}. $$

It is easy to see that

$$ \begin{aligned} &0\leq \gamma_i^j,\quad \gamma_i^{*j}\leq C, \quad (j,i)\in I\setminus \{(q,p)\},\\ &\sum_{i=1}^{l^j}\gamma_i^j=\sum_{i=1}^{l^{j+1}}\gamma_i^{*j+1},\quad (j,i)\in I\setminus \{(q,p)\}. \end{aligned} $$

So $\gamma^{(*)}$ is a feasible solution to the problem (34)–(36). After a series of transformations, $W_p^q(\gamma^{(*)})$ can be written as

$$ \begin{aligned} W_p^q(\gamma^{(*)})=&W(\alpha^{(*)})+(\alpha_p^{*q}-\alpha_p^q)+\frac{1}{ 2}(\alpha_p^{*q}-\alpha_p^q)^2K(x_p^q,x_p^q)\\ &+(\alpha_p^{*q}-\alpha_p^q)\sum_{(j,i)\in I\setminus \{(q,p)\}}(\alpha_i^{*j}-\alpha_i^j)K(x_p^q,x_i^j)\\ &-\sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}(\mu_i^{*q}-\mu_i^q)\left[1+\sum_{j',i'}(\alpha_{i'}^{*j'}-\alpha_{i'}^{j'})K(x_i^q,x_{i'}^{j'})\right]\\ &-\frac{1}{ 2}\sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}\sum_{i'\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}(\mu_i^{*q}-\mu_i^q)(\mu_{i'}^{*q}-\mu_{i'}^q)K(x_i^q,x_{i'}^q)\\ &+(\alpha_p^{*q}-\alpha_p^q)\sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}(\mu_i^{*q}-\mu_i^q)K(x_p^q,x_i^q). \end{aligned} $$

(86)

According to the assumption that there exists at least one $\alpha_i^j\in(0,C)$ among the points of each class j = 1,…, k, we have

$$ 1+\sum_{j',i'}(\alpha_{i'}^{*j'}-\alpha_{i'}^{j'})K(x_i^q,x_{i'}^{j'})=b_q. $$

So we rewrite the equality (86) as

$$ \begin{aligned} -W(\alpha^{(*)})=&-W_p^q(\gamma^{(*)})+(\alpha_p^{*q}-\alpha_p^q)+\frac{1}{ 2}(\alpha_p^{*q}-\alpha_p^q)^2K(x_p^q,x_p^q)\\ &+(\alpha_p^{*q}-\alpha_p^q)\left[\sum_{(j,i)\in I\setminus \{(q,p)\}}(\alpha_i^{*j}-\alpha_i^j)K(x_p^q,x_i^j)-b_q\right]\\ &-\frac{1}{ 2}\sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}\sum_{i'\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}(\mu_i^{*q}-\mu_i^q)(\mu_{i'}^{*q}-\mu_{i'}^q)K(x_i^q,x_{i'}^q)\\ &+(\alpha_p^{*q}-\alpha_p^q)\sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}(\mu_i^{*q}-\mu_i^q)K(x_p^q,x_i^q). \end{aligned} $$

(87)

Due to $W(\alpha^{(*)})\geq W(\beta^{(*)})$, $W_p^q(\alpha_p^{(*)q})\geq W_p^q(\gamma^{(*)})$ and the equalities (85) and (87), we get

$$ \begin{aligned} \,&(\alpha_p^{*q}-\alpha_p^q) {\left [\sum_{(j,i)\in I\setminus\{(q,p)\}}(\alpha_{pi}^{*qj}-\alpha_{pi}^{qj})K(x_p^q,x_i^j)-b'_q \right]} \geq (\alpha_p^{*q}-\alpha_p^q) {\left [\sum_{(j,i)\in I\setminus \{(q,p)\}}(\alpha_i^{*j}-\alpha_i^j)K(x_p^q,x_i^j)-b_q \right]} \\ \,&\quad-\frac{1}{ 2}\sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}\sum_{i'\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}(\mu_i^{*q}-\mu_i^q)(\mu_{i'}^{*q}-\mu_{i'}^q)K(x_i^q,x_{i'}^q)\\ \,&\quad+(\alpha_p^{*q}-\alpha_p^q)\sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}(\mu_i^{*q}-\mu_i^q)K(x_p^q,x_i^q) \\ \,&\quad-\frac{1}{ 2}\sum_{i\in M(\alpha_p^{(*)q},q)}\sum_{i'\in M(\alpha_p^{(*)q},q)}(\nu_i^{*q}-\nu_i^q)(\nu_{i'}^{*q}-\nu_{i'}^q)K(x_i^q,x_{i'}^q)+(\alpha_p^{*q}-\alpha_p^q)\sum_{i\in M(\alpha_p^{(*)q},q)}(\nu_i^{*q}-\nu_i^q)K(x_p^q,x_i^q). \end{aligned} $$

According to the definitions of $\nu_i^q,\nu_i^{*q},i\in M(\alpha_p^{(*)q},q)$ and ${\mu_i^q,\mu_i^{*q},i\in M(\alpha^{(*)},q)\setminus \{(q,p)\},}$ we know

$$ \begin{aligned} &(\alpha_p^{*q}-\alpha_p^q)\sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}(\mu_i^{*q}-\mu_i^q)K(x_p^q,x_i^q)\geq 0,\\ &(\alpha_p^{*q}-\alpha_p^q)\sum_{i\in M(\alpha_p^{(*)q},q)}(\nu_i^{*q}-\nu_i^q)K(x_p^q,x_i^q)\geq 0. \end{aligned} $$

Furthermore since

$$ \begin{aligned} &\frac{1}{ 2}\sum_{i\in M(\alpha_p^{(*)q},q)}\sum_{i'\in M(\alpha_p^{(*)q},q)}(\nu_i^{*q}-\nu_i^q)(\nu_{i'}^{*q}-\nu_{i'}^q)K(x_i^q,x_{i'}^q)\leq \frac{1}{ 2}(\alpha_p^{*q}-\alpha_p^q)^2R^2,\\ &\frac{1}{ 2}\sum_{i\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}\sum_{i'\in M(\alpha^{(*)},q)\setminus \{(q,p)\}}(\mu_i^{*q}-\mu_i^q)(\mu_{i'}^{*q}-\mu_{i'}^q)K(x_i^q,x_{i'}^q)\leq \frac{1}{ 2}(\alpha_p^{*q}-\alpha_p^q)^2R^2, \end{aligned} $$

where $R^2=\max\{K(x_i^j,x_i^j)\,|\,j=1,\ldots,k,\ i=1,\ldots,l^j\}$, we arrive at

$$ \begin{aligned} \,&(\alpha_p^{*q}-\alpha_p^q)\left[\sum_{(j,i)\in I\setminus\{(q,p)\}}(\alpha_{pi}^{*qj}-\alpha_{pi}^{qj})K(x_p^q,x_i^j)-b'_q\right] \\ \,&\quad\geq (\alpha_p^{*q}-\alpha_p^q)\left[\sum_{(j,i)\in I\setminus \{(q,p)\}}(\alpha_i^{*j}-\alpha_i^j)K(x_p^q,x_i^j)-b_q\right]-(\alpha_p^{*q}-\alpha_p^q)^2R^2, \end{aligned} $$

namely,

$$ \begin{aligned} \,&\left[\sum_{(j,i)\in I\setminus\{(q,p)\}}(\alpha_{pi}^{*qj}-\alpha_{pi}^{qj})K(x_p^q,x_i^j)-b'_q\right] \\ \,&\quad\leq \left[\sum_{j,i}(\alpha_i^{*j}-\alpha_i^j)K(x_p^q,x_i^j)-b_q\right]-(\alpha_p^{*q}-\alpha_p^q)(K(x_p^q,x_p^q)+R^2). \end{aligned} $$

3. The case $\alpha_p^{*q}>0$: Following the argument of case 2, an LOO error can occur only if

$$ \begin{aligned} \,&-\left[\sum_{(j,i)\in I\setminus \{(q,p)\}}(\alpha_{pi}^{*qj}-\alpha_{pi}^{qj})K(x_p^q,x_i^j)-b'_{q-1}\right]\\ \,&\quad\leq-\left[\sum_{j,i}(\alpha_i^{*j}-\alpha_i^j)K(x_p^q,x_i^j)- b_{q-1}\right]+(\alpha_p^{*q}-\alpha_p^q)(K(x_p^q,x_p^q)+R^2). \end{aligned} $$

$\square$

1.5 Computation of the S-span

In this subsection, we address the problem of computing $S^2(q,p)$ and $S^{*2}(q,p)$, which appear in Definitions 3 and 4. Each can be obtained by solving a quadratic programming problem. We only discuss the computation of $S^2(q,p)$, since $S^{*2}(q,p)$ can be computed similarly.

The following lemma gives an equivalent description to Definition 3.

Lemma 6

Introducing the following notation:

$$ \{\lambda_i^q, i\in M_p^q(\alpha,q)\}=\{\lambda_1,\ldots,\lambda_r\}, $$

(88)

$$ \{\lambda_i^{q+1}, i\in M_p^q(\alpha^*,q+1)\}=\{\lambda_{r+1},\ldots,\lambda_s\}, $$

(89)

$$ \{x_i^q,i\in M_p^q(\alpha,q)\}=\{x_1,\ldots,x_r\}, $$

(90)

$$ \{x_i^{q+1},i\in M_p^q(\alpha^*,q+1)\}=\{x_{r+1},\ldots,x_s\}, $$

(91)

$$ \{\alpha_i^q,\alpha_i^{*q},i\in M_p^q(\alpha,q)\}=\{\alpha_1,\ldots,\alpha_r,\alpha^*_1,\ldots,\alpha^*_r\}, $$

(92)

$$ \{\alpha_i^{q+1},\alpha_i^{*q+1},i\in M_p^q(\alpha^*,q+1)\}=\{\alpha_{r+1},\ldots,\alpha_s,\alpha^*_{r+1},\ldots,\alpha^*_s\}, $$

(93)

Definition 3 is equivalent to the following description: for any margin support vector $x_p^q$ about $\alpha$, its S-span is

$$ S^2(q,p):=\min\{\|x_p^q-\tilde{x}_p^q\|^2|\tilde{x}_p^q\in\Uplambda_p^q\}, $$

(94)

where $ \Uplambda_p^q:= \left\{\sum\limits_{i=1}^s\lambda_ix_i\right\},$ subject to constraints

$$ \begin{aligned} \,&0\leq \alpha_i+\lambda_i\alpha_p^{q}\leq C,\quad 0\leq \alpha_i^*+\lambda_i\alpha_p^{*q}\leq C, \quad i=1,\ldots,r,\\ \,&0\leq \alpha_i-\lambda_i\alpha_p^{*q}\leq C,\quad 0\leq\alpha_i^*-\lambda_i\alpha_p^{q}\leq C ,\quad i=r+1,\ldots,s, \\ \,& \sum_{i=1}^s\lambda_i=1,\quad\lambda_p^q=-1. \end{aligned} $$

Proof

It is sufficient to substitute (88)–(93) into Definition 3. $\square$
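Each two-sided constraint in Lemma 6 has the form $0\leq a+\lambda_i c\leq C$, which is an interval for $\lambda_i$. The sketch below intersects the two intervals that apply to one index. The function names and the `same_class` flag (true for $i=1,\ldots,r$, false for $i=r+1,\ldots,s$) are hypothetical, and the treatment of a zero multiplier $c=0$ (the constraint then places no restriction on $\lambda_i$) is an assumption added here, not something the paper states.

```python
import math

def interval(a, c, C):
    """Return (low, high) such that 0 <= a + lam * c <= C iff low <= lam <= high."""
    if c == 0.0:
        # The constraint does not involve lam: it holds iff 0 <= a <= C.
        if 0.0 <= a <= C:
            return (-math.inf, math.inf)
        return (math.inf, -math.inf)  # empty interval
    end_a, end_b = -a / c, (C - a) / c
    return (min(end_a, end_b), max(end_a, end_b))

def lambda_bounds(alpha_i, alpha_star_i, alpha_p, alpha_star_p, C, same_class):
    """Intersect the two intervals that Lemma 6 imposes on one lambda_i."""
    if same_class:
        # i = 1..r: 0 <= alpha_i + lam*alpha_p <= C, 0 <= alpha*_i + lam*alpha*_p <= C
        first = interval(alpha_i, alpha_p, C)
        second = interval(alpha_star_i, alpha_star_p, C)
    else:
        # i = r+1..s: 0 <= alpha_i - lam*alpha*_p <= C, 0 <= alpha*_i - lam*alpha_p <= C
        first = interval(alpha_i, -alpha_star_p, C)
        second = interval(alpha_star_i, -alpha_p, C)
    return (max(first[0], second[0]), min(first[1], second[1]))

# Illustrative values only.
print(lambda_bounds(0.2, 0.0, 0.5, 0.0, 1.0, same_class=True))
print(lambda_bounds(0.3, 0.6, 0.5, 0.0, 1.0, same_class=False))
```

If the returned lower bound exceeds the upper bound, the constraints on that $\lambda_i$ are inconsistent and the feasible set is empty.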

The following theorem shows that $S^2(q,p)$ in Definition 3 can be obtained by solving a quadratic programming problem.

Theorem 3

$S^2(q,p)$ defined in Definition 3 can be obtained by solving the following quadratic programming problem:

$$ \min_\lambda \quad \sum_{i=1}^s\sum_{j=1}^s\lambda_i\lambda_j(x_i\cdot x_j)-2\sum_{i=1}^s\lambda_i(x_p^q\cdot x_i), $$

(95)

$$ \hbox{s.t.}\quad -{\frac{\alpha_i}{\alpha_p^q}}\leq \lambda_i\leq {\frac{C-\alpha_i}{\alpha_p^{q}}},\quad -{\frac{\alpha^*_i} {\alpha_p^{*q}}}\leq \lambda_i\leq {\frac{C-\alpha^*_i} {\alpha_p^{*q}}},\quad i=1,\ldots,r, $$

(96)

$$ {\frac{\alpha_i-C}{\alpha_p^{*q}}}\leq \lambda_i\leq {\frac{\alpha_i}{\alpha_p^{*q}}},\quad {\frac{\alpha^*_i-C} {\alpha_p^{q}}}\leq \lambda_i\leq {\frac{\alpha^*_i} {\alpha_p^{q}}},\quad i=r+1,\ldots,s, $$

(97)

$$ \sum_{i=1}^s\lambda_i=1. $$

(98)

Proof

We want to minimize the quantity in (94) with respect to $\{\lambda_i\}$:

$$ \begin{aligned} \|x_p^q-\tilde{x}_p^q\|^2&=((x_p^q-\tilde{x}_p^q)\cdot (x_p^q-\tilde{x}_p^q)) \\ &=(x_p^q\cdot x_p^q)+(\tilde{x}_p^q\cdot \tilde{x}_p^q)-2(x_p^q\cdot\tilde{x}_p^q) \\ &=(x_p^q\cdot x_p^q)+\sum_{i=1}^s\sum_{j=1}^s\lambda_i\lambda_j(x_i\cdot x_j)-2\left(x_p^q\cdot \left(\sum_{i=1}^s\lambda_ix_i\right)\right), \end{aligned} $$

subject to constraints

$$ \begin{aligned} \,&-{\frac{\alpha_i}{\alpha_p^q}}\leq \lambda_i\leq {\frac{C-\alpha_i} {\alpha_p^{q}}},\quad -{\frac{\alpha^*_i}{\alpha_p^{*q}}}\leq \lambda_i\leq {\frac{C-\alpha^*_i}{\alpha_p^{*q}}},\quad i=1,\ldots,r, \\ \,&{\frac{\alpha_i-C}{\alpha_p^{*q}}}\leq \lambda_i\leq {\frac{\alpha_i} {\alpha_p^{*q}}},\quad {\frac{\alpha^*_i-C}{\alpha_p^{q}}}\leq \lambda_i\leq {\frac{\alpha^*_i}{\alpha_p^{q}}},\quad i=r+1,\ldots,s, \\ \,&\sum_{i=1}^s\lambda_i=1, \end{aligned} $$

which, after dropping the term $(x_p^q\cdot x_p^q)$ that does not depend on $\lambda$, is the quadratic programming problem (95)–(98). $\square$
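For implementation it can help to restate the problem in matrix form. The symbols $G$, $g$, $\underline{\lambda}$ and $\overline{\lambda}$ are shorthand introduced here for illustration and do not appear in the paper. With $G_{ij}=(x_i\cdot x_j)$, $g_i=(x_p^q\cdot x_i)$, and $[\underline{\lambda}_i,\overline{\lambda}_i]$ the intersection of the two intervals that the constraints impose on $\lambda_i$, the problem reads

$$ \min_{\lambda\in\mathbb{R}^s}\quad \lambda^{\top}G\lambda-2g^{\top}\lambda \quad \hbox{s.t.}\quad \underline{\lambda}\leq\lambda\leq\overline{\lambda},\quad \mathbf{1}^{\top}\lambda=1, $$

and the span is recovered from the optimal value by adding back the constant term:

$$ S^2(q,p)=(x_p^q\cdot x_p^q)+\min_{\lambda}\left(\lambda^{\top}G\lambda-2g^{\top}\lambda\right). $$

Since $G$ is a Gram matrix it is positive semidefinite, so the problem is convex.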

To compute $S^2(q,p)$ and $S^{*2}(q,p)$ in Theorem 1, we only need to replace the inner product $(x\cdot x')$ in (95)–(98) by the kernel function $K(x,x')$.
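The kernelized problem can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: projected gradient descent is a method chosen here for brevity, not the solver used by the authors, and in practice a dedicated QP solver would be preferable. The names `project` and `s_span_squared`, the finite box bounds `lo` and `hi` (assumed to be precomputed from the constraints), and the toy data are all hypothetical.

```python
def project(v, lo, hi, total=1.0, iters=100):
    """Euclidean projection of v onto {lo <= lam <= hi, sum(lam) = total}.

    Bisects on the shift tau in lam_i = clip(v_i - tau, lo_i, hi_i).
    Assumes finite bounds with sum(lo) <= total <= sum(hi).
    """
    def clipped(tau):
        return [min(max(vi - tau, l), h) for vi, l, h in zip(v, lo, hi)]

    t_lo = min(vi - h for vi, h in zip(v, hi))  # every entry at its upper bound
    t_hi = max(vi - l for vi, l in zip(v, lo))  # every entry at its lower bound
    for _ in range(iters):
        mid = 0.5 * (t_lo + t_hi)
        if sum(clipped(mid)) > total:
            t_lo = mid
        else:
            t_hi = mid
    return clipped(0.5 * (t_lo + t_hi))

def s_span_squared(x_p, xs, lo, hi, kernel, steps=5000):
    """Approximate K(x_p, x_p) + min(lam^T G lam - 2 g^T lam) over the feasible set."""
    s = len(xs)
    G = [[kernel(a, b) for b in xs] for a in xs]
    g = [kernel(x_p, a) for a in xs]
    # The gradient 2 (G lam - g) has Lipschitz constant at most 2 * trace(G),
    # because G is positive semidefinite.
    step = 1.0 / (2.0 * sum(G[i][i] for i in range(s)) + 1e-12)
    lam = project([1.0 / s] * s, lo, hi)
    for _ in range(steps):
        grad = [2.0 * (sum(G[i][j] * lam[j] for j in range(s)) - g[i])
                for i in range(s)]
        lam = project([l - step * d for l, d in zip(lam, grad)], lo, hi)
    quad = sum(lam[i] * G[i][j] * lam[j] for i in range(s) for j in range(s))
    lin = sum(l * gi for l, gi in zip(lam, g))
    # Add back the constant K(x_p, x_p) that the objective (95) omits.
    return kernel(x_p, x_p) + quad - 2.0 * lin, lam

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

# Toy check in one dimension: x_p = 0, with neighbours at -1 and +1.
loose, lam_loose = s_span_squared((0.0,), [(-1.0,), (1.0,)],
                                  [-5.0, -5.0], [5.0, 5.0], linear_kernel)
tight, lam_tight = s_span_squared((0.0,), [(-1.0,), (1.0,)],
                                  [-1.0, -1.0], [0.25, 2.0], linear_kernel)
print(loose, lam_loose)
print(tight, lam_tight)
```

In the toy check the answers can be derived by hand: with loose bounds the point $0$ is reproduced by $\lambda=(0.5,0.5)$, so the span is $0$; capping $\lambda_1$ at $0.25$ forces $\lambda=(0.25,0.75)$ and a squared distance of $0.25$. Passing a different `kernel` callable is all that is needed to replace the inner product by $K(x,x')$.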


About this article

Cite this article

Yang, Z., Tian, Y. & Deng, N. Leave-one-out bounds for support vector ordinal regression machine. Neural Comput & Applic 18, 731–748 (2009). https://doi.org/10.1007/s00521-008-0217-z

Received: 13 August 2007
Accepted: 04 November 2008
Published: 28 November 2008
Issue Date: October 2009
DOI: https://doi.org/10.1007/s00521-008-0217-z
