Convergence rate of the semi-supervised greedy algorithm
Introduction
Semi-supervised learning, i.e., learning from a set of labeled and unlabeled data, has attracted many researchers recently because of its central challenge: how to improve prediction performance using a few labeled data together with a large set of unlabeled data. In the literature, semi-supervised learning algorithms have been proposed from different perspectives. Examples include graph-based learning (Belkin and Niyogi, 2004; Belkin et al., 2006; Chen et al., 2009; Johnson and Zhang, 2007, 2008), co-training (Blum and Mitchell, 1998; Sindhwani et al., 2005; Sindhwani and Rosenberg, 2008) and many others. Reviews of semi-supervised learning are given in Chapelle, Schölkopf, and Zien (2006) and Zhu (2005).
Among the methods proposed for semi-supervised learning, a family can be unified in a Tikhonov regularization scheme in a reproducing kernel Hilbert space (RKHS) with a Mercer kernel, e.g., Belkin and Niyogi (2004), Sindhwani et al. (2005) and Sindhwani and Rosenberg (2008). Given the labeled data and the unlabeled data, the solution of the regularization framework usually has the form of a kernel expansion over all labeled and unlabeled examples.
The semi-supervised algorithms aim to learn the expansion coefficients that yield good prediction performance. Although these methods perform excellently in empirical evaluations (Belkin and Niyogi, 2004; Sindhwani et al., 2005; Sindhwani and Rosenberg, 2008), two issues remain to be further addressed in theory:
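As a concrete illustration of such a regularization framework, the following is a minimal sketch of Laplacian-regularized least squares in the style of Belkin et al. (2006), where the predictor expands over all labeled and unlabeled points. The kernel choice, graph weights, and parameter names here are illustrative assumptions, not the specific scheme analyzed in this paper.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def laplacian_rls(X_lab, y_lab, X_unlab, lam_A=1e-2, lam_I=1e-2, sigma=1.0):
    """Laplacian-regularized least squares (Belkin et al., 2006 style).

    The learned function expands over ALL n labeled + m unlabeled points:
        f(x) = sum_i alpha_i K(x, x_i),
    which is exactly why computation becomes heavy for large unlabeled sets.
    """
    X = np.vstack([X_lab, X_unlab])
    n, m = len(X_lab), len(X_unlab)
    K = gaussian_kernel(X, X, sigma)
    # graph Laplacian L = D - W, with weights taken from the same kernel
    W = K
    L = np.diag(W.sum(axis=1)) - W
    # J selects the labeled rows; Y pads the unlabeled entries with zeros
    J = np.zeros((n + m, n + m))
    J[:n, :n] = np.eye(n)
    y = np.concatenate([y_lab, np.zeros(m)])
    # alpha = (J K + lam_A n I + lam_I n/(n+m)^2 L K)^{-1} Y
    A = J @ K + lam_A * n * np.eye(n + m) + (lam_I * n / (n + m) ** 2) * (L @ K)
    alpha = np.linalg.solve(A, y)
    return alpha, X

# prediction at new points x: gaussian_kernel(x, X, sigma) @ alpha
```

Note that the linear system has size n + m, so the cost grows with the number of unlabeled points, which is precisely the computational difficulty discussed next.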
- Computational difficulty. Because the regularized framework generally uses kernel expansions of all the labeled and unlabeled data, computation becomes a serious problem when the set of unlabeled data is huge in real applications.
- Manifold assumption. Many graph-based methods, such as Belkin and Niyogi (2004), Sindhwani et al. (2005) and Sindhwani and Rosenberg (2008), assume that the high-dimensional data lie on a low-dimensional manifold. However, for many types of data, convincing evidence of such a manifold structure is not available (Fan, Gu, Qiao, & Zhang, 2011).
To address these issues, previous work has sought to realize sparse semi-supervised learning in Fan et al. (2011), Sun and Shawe-Taylor (2010) and Tsang and Kwok (2007), but its limitation is that the unlabeled data are used only to construct an additional sparse regularization term.
In this paper, we investigate sparse representations for semi-supervised learning without the manifold assumption, and consider the sparsity of semi-supervised learning in data-dependent hypothesis spaces. Inspired by the greedy algorithms in Barron, Cohen, Dahmen, and DeVore (2008), Nair, Choudhury, and Keane (2007) and Zhang (2002, 2009), we propose a new sparse greedy algorithm for semi-supervised learning. Theoretical analysis shows that the proposed algorithm realizes sparse learning efficiently. The main contributions of this work are highlighted below:
- Our method integrates three different machine learning techniques in a coherent way: sparse semi-supervised learning (Fan et al., 2011; Sun and Shawe-Taylor, 2010; Tsang and Kwok, 2007), greedy algorithms (Nair et al., 2007; Zhang, 2002, 2009), and error analysis in data-dependent hypothesis spaces (Shi et al., 2011; Sun and Wu, 2011; Wu and Zhou, 2008; Xiao and Zhou, 2010). We also show how to use them to design and analyze a new semi-supervised algorithm.
- Generalization error bounds are derived for nonsymmetric and indefinite kernels. The theoretical results quantify the relative contributions of the labeled and unlabeled data needed to achieve fast learning rates. In particular, we illustrate that the role of the unlabeled data is twofold. First, the semi-supervised method can achieve fast learning rates by using the additional unlabeled data. Second, the learning rates essentially depend on the number of labeled data even if the number of unlabeled data tends to infinity. Furthermore, our error analysis relies on weaker conditions than previous methods based on density or manifold assumptions in Belkin et al. (2006), Belkin and Niyogi (2004), Chen and Li (2009), Chen et al. (2009), Chen, Li, and Peng (2010), Johnson and Zhang (2007, 2008) and Rigollet (2007).
- Even in supervised learning settings, we achieve faster learning rates than the previous results in Xiao and Zhou (2010), Shi et al. (2011) and Sun and Wu (2011). In particular, our analysis does not require the interior cone condition assumed in Shi et al. (2011) and Xiao and Zhou (2010).
The organization of this paper is as follows. Section 2 provides the necessary background on semi-supervised learning and then presents the sparse semi-supervised greedy algorithm. Section 3 states the main result on error analysis, and its proof is given in Section 4. An empirical study is given in Section 5. We conclude the paper in Section 6.
The sparse semi-supervised greedy algorithm
Let the input space be a compact subset of a Euclidean space. In the semi-supervised model, a learner obtains a labeled data set and an unlabeled data set. The labeled examples are independent copies of a random pair whose joint distribution is defined on the product of the input and output spaces. The unlabeled data are independent copies of the input variable, whose distribution (the marginal distribution of the joint distribution) governs the inputs alone. The learning goal is to pick a function that minimizes the expected least squares risk.
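The greedy scheme at the heart of the paper can be sketched as a generic orthogonal greedy algorithm (OGA) over a finite dictionary, e.g., kernel functions centered at the labeled and unlabeled points. This is an illustrative sketch: the dictionary construction, normalization, and stopping rule below are assumptions, not the paper's exact procedure.

```python
import numpy as np

def orthogonal_greedy(y, dictionary, n_iter=10):
    """Orthogonal greedy algorithm over a finite dictionary.

    y          : (n,) vector of labeled responses
    dictionary : (n, p) matrix whose columns evaluate the p dictionary
                 elements (e.g. K(x_j, .) at the n labeled inputs)
    Returns the selected column indices and fitted coefficients, so the
    sparse predictor uses only the selected atoms, not all n + m points.
    """
    norms = np.linalg.norm(dictionary, axis=0)
    norms[norms == 0] = 1.0
    D = dictionary / norms           # normalize atoms for the greedy step
    residual = y.astype(float).copy()
    selected = []
    for _ in range(n_iter):
        # greedy step: pick the atom most correlated with the residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in selected:
            selected.append(j)
        # orthogonal step: re-fit y on the span of all selected atoms
        coef, *_ = np.linalg.lstsq(D[:, selected], y, rcond=None)
        residual = y - D[:, selected] @ coef
    # map coefficients back to the un-normalized dictionary columns
    full_coef = coef / norms[selected]
    return selected, full_coef
```

After a small number of iterations the predictor involves only the selected atoms, which is the source of the sparsity (and reduced computation) analyzed in this paper.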
Main result
Now we introduce a data-free function space similar to those in Shi et al. (2011) and Xiao and Zhou (2010).

Definition 2. Define a data-free assumption function space equipped with a corresponding norm.
In order to investigate how well this space approximates the regression function, we introduce a regularizing function defined with a regularization parameter.
The regularizing error measures how well the regularizing function approximates the regression function. The decay of this error as the regularization parameter tends to zero plays a central role in the learning rates.
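In standard least-squares learning-theory notation (our notation and symbols, not necessarily the paper's exact definitions), the regularizing function and the regularizing error are commonly written as:

```latex
f_\lambda := \arg\min_{f \in \mathcal{H}}
  \left\{ \mathcal{E}(f) - \mathcal{E}(f_\rho) + \lambda \|f\|_{\mathcal{H}}^2 \right\},
\qquad
D(\lambda) := \mathcal{E}(f_\lambda) - \mathcal{E}(f_\rho) + \lambda \|f_\lambda\|_{\mathcal{H}}^2,
```

where $\mathcal{E}(f) = \int (f(x) - y)^2 \, d\rho$ is the least squares risk, $f_\rho$ is the regression function, and $\lambda > 0$ is the regularization parameter. A polynomial decay $D(\lambda) = O(\lambda^\beta)$ for some $0 < \beta \le 1$ is the typical assumption that drives the learning rates in this type of analysis.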
Error analysis
In this section, we prove Theorem 1 based on upper bounds for the sample and hypothesis errors. The sample error is bounded using error analysis techniques and empirical covering numbers. The hypothesis error is bounded via the theoretical analysis of the greedy algorithm presented in Barron et al. (2008).
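The decomposition underlying this proof strategy is standard in analyses with data-dependent hypothesis spaces (e.g., Wu and Zhou, 2008). In generic notation (an assumption about the form, not the paper's exact statement), with $f_z$ the empirical estimator, $f_\lambda$ the regularizing function, and $\mathcal{E}_z$ the empirical risk, the excess risk splits as:

```latex
\mathcal{E}(f_z) - \mathcal{E}(f_\rho)
= \underbrace{\big[\mathcal{E}(f_z) - \mathcal{E}_z(f_z)\big]
   + \big[\mathcal{E}_z(f_\lambda) - \mathcal{E}(f_\lambda)\big]}_{\text{sample error}}
\; + \; \underbrace{\big[\mathcal{E}_z(f_z) - \mathcal{E}_z(f_\lambda)\big]}_{\text{hypothesis error}}
\; + \; \underbrace{\mathcal{E}(f_\lambda) - \mathcal{E}(f_\rho)}_{\text{regularization error}}.
```

The terms telescope exactly, so bounding each bracket separately (concentration for the sample error, the greedy-algorithm analysis for the hypothesis error, and the decay of $D(\lambda)$ for the regularization error) yields the total rate.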
An empirical study
Our theoretical analysis shows that the semi-supervised greedy algorithm (SSG) efficiently achieves fast learning rates for regression. In this section, we compare our method with the least square regularized regression (LSR) algorithm in an RKHS.
The least square regularized regression algorithm has been extensively studied in learning theory (Cucker & Zhou, 2007) and can be formulated as a Tikhonov-regularized least squares problem in an RKHS. We consider the Gaussian kernel in our experiments.
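For reference, the LSR baseline is standard kernel ridge regression. Below is a minimal self-contained sketch with the Gaussian kernel; the parameter names and values are illustrative, not the paper's experimental settings.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lsr_fit(X, y, lam=1e-3, sigma=1.0):
    """Least square regularized regression (kernel ridge regression):
        minimize (1/n) sum_i (f(x_i) - y_i)^2 + lam * ||f||_K^2
    over the RKHS. The solution is f(x) = sum_i alpha_i K(x, x_i) with
        alpha = (K + lam * n * I)^{-1} y.
    """
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return alpha

def lsr_predict(X_train, alpha, X_test, sigma=1.0):
    return gaussian_kernel(X_test, X_train, sigma) @ alpha
```

Unlike the sparse greedy predictor, this baseline uses all training points in the expansion, which is the contrast the empirical study highlights.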
Conclusion and discussion
This paper has introduced a sparse semi-supervised method that learns regression functions from samples using the orthogonal greedy algorithm. Fast learning rates were derived under mild assumptions. The requirement that the kernel be symmetric or positive semi-definite, and the interior cone condition on the input space (see Shi et al., 2011), are both dropped in this paper. Some extensions of this method are discussed below:
- 1. Semi-supervised learning based on other greedy algorithms: The proposed method
Acknowledgments
The authors would like to thank the reviewers for their valuable comments and suggestions. This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants No. 11001092, No. 11071058 and No. 11226304; by the Fundamental Research Funds for the Central Universities (Program No. 2011PY130); and by the Macau Science and Technology Development Fund (FDCT) under Grant 017/2012/A1 and the Research Committee at University of Macau under grants
References

- Error bounds of semi-supervised multi-graph regularized classifiers. Information Sciences (2009).
- Semi-supervised learning based on high density region estimation. Neural Networks (2010).
- Sparse regularization for semi-supervised classification. Pattern Recognition (2011).
- Concentration estimates for learning with ℓ1-regularizer and data dependent hypothesis spaces. Applied and Computational Harmonic Analysis (2011).
- Least square regression with indefinite kernels and coefficient regularization. Applied and Computational Harmonic Analysis (2011).
- Multi-kernel regularized classifiers. Journal of Complexity (2007).
- Learning with sample dependent hypothesis spaces. Computers & Mathematics with Applications (2008).
- Approximation and learning by greedy algorithms. Annals of Statistics (2008).
- Semi-supervised learning on Riemannian manifolds. Machine Learning (2004).
- Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research (2006).
- Semi-supervised learning.
- Semi-supervised multi-category classification with imperfect model. IEEE Transactions on Neural Networks.
- Support vector machine soft margin classifiers: error analysis. Journal of Machine Learning Research.