2DPCA with L1-norm for simultaneously robust and sparse modelling
Introduction
Dimensionality reduction (DR) is of great importance for multivariate data analysis. For the typically high-dimensional patterns encountered in practice, DR can effectively relieve the "curse of dimensionality" (Jain, Duin, & Mao, 2000). Principal component analysis (PCA) (Jolliffe, 1986) is perhaps the most popular DR technique. It seeks a few basis vectors such that the variances of the projected samples are maximized. In the domain of image analysis, two-dimensional PCA (2DPCA) (Yang, Zhang, Frangi, & Yang, 2004) is more efficient because it operates directly on the raw two-dimensional images.
Although PCA and 2DPCA have been widely applied in many fields, they are vulnerable in the presence of atypical samples because of their use of the L2-norm in the variance formulation. As robust alternatives, L1-norm-based approaches were developed. Specifically, the L1-norm-based PCA variants include L1-PCA (Ke & Kanade, 2005), R1-PCA (Ding, Zhou, He, & Zha, 2006), PCA-L1 (Kwak, 2008), and non-greedy PCA-L1 (Nie, Huang, Ding, Luo, & Wang, 2011). Li, Pang, and Yuan (2009) developed the L1-norm-based 2DPCA (2DPCA-L1), which demonstrated encouraging performance for image analysis.
A limitation of the above methods is that the learned basis vectors are still dense, which makes the resulting features difficult to interpret. It is desirable to select only the most relevant or salient elements from a large number of features. To address this issue, sparse modelling has been developed and has received increasing attention in the pattern-classification community (Wright et al., 2010). Sparsity is typically achieved by regularizing the objective variables with an L1-norm lasso penalty term (Chen et al., 1998; Tibshirani, 1996). Mathematically, the classic PCA approach can be reformulated as a regression-type optimization problem on which the sparsity-inducing lasso penalty is then imposed, resulting in sparse PCA (SPCA) (Zou, Hastie, & Tibshirani, 2006). This sparsity was further generalized to a structured version, producing structured sparse PCA (Jenatton, Obozinski, & Bach, 2010). Within the graph embedding framework (Yan et al., 2007), various DR approaches were endowed with a unified sparse formulation via the L1-norm penalty (Cai et al., 2007; Wang, 2012; Zhou et al., 2011). Recently, the robustness of SPCA was improved by L1-norm maximization (Meng, Zhao, & Xu, 2012).
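To make the sparsity-inducing effect of the lasso penalty concrete, the following sketch implements the soft-thresholding operator, i.e., the proximal operator of the L1-norm penalty that underlies lasso-type estimators. The function name is our own; the operator itself is standard.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of the lasso penalty lam * ||v||_1: every
    coefficient is shrunk toward zero, and coefficients with magnitude
    below lam are set exactly to zero (this is what produces sparsity)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

v = np.array([0.9, -0.05, 0.3, -0.6, 0.02])
print(soft_threshold(v, 0.1))  # entries with |v_i| <= 0.1 become exactly zero
```

Applied to the loadings of a regression-type PCA formulation, this shrinkage is what drives many entries of the basis vectors to exact zeros, in contrast to the dense loadings of classic PCA.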
Sparse modelling for 2DPCA-L1, however, has not yet been addressed. Note that the L1-norm in 2DPCA-L1 serves as a robust measure of sample dispersion rather than as a regularizer on the basis vectors. A common way of enforcing sparsity is to impose an L1-norm lasso penalty on the basis vectors while fixing their L2-norm length.
In this paper, we restrict our attention to image analysis and extend 2DPCA-L1 with sparsity; the resulting method is referred to as 2DPCAL1-S. Since the L1-norm serves as the lasso penalty in sparsity-inducing modelling, we propose imposing an L1-norm lasso penalty, together with a fixed L2-norm, on the basis vectors of 2DPCA-L1. Consequently, 2DPCAL1-S maximizes the L1-norm dispersion of the samples subject to an elastic net (i.e., combined L2-norm and L1-norm) constraint (Zou et al., 2006) on the basis vectors. Formally, we combine the L1-dispersion and the elastic net constraint in the objective function. The L1-norm is thus used for robust and sparse modelling simultaneously. Because the L1-norm appears in both roles, the optimization of 2DPCAL1-S is not straightforward; we design a simple iterative algorithm to solve it.
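The flavour of such an iteration can be sketched as follows. This is a hypothetical illustration only, not the paper's exact algorithm: it combines a PCA-L1-style sign-flipping step (for the robust L1-dispersion) with soft-thresholding and L2 renormalization (for the elastic-net-constrained sparsity); the function and parameter names are our own.

```python
import numpy as np

def soft_threshold(v, lam):
    # lasso shrinkage: sets small coefficients exactly to zero
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_robust_basis(images, lam=1.0, n_iter=100, seed=0):
    """Hypothetical sketch of a 2DPCAL1-S-style update for one basis
    vector. `images` has shape (N, m, n); the rows of all images are
    stacked and treated as samples. Each iteration flips signs to
    increase the L1-dispersion, shrinks the update for sparsity, and
    renormalizes to unit L2-norm."""
    rows = images.reshape(-1, images.shape[2])   # stack all image rows
    rng = np.random.RandomState(seed)
    w = rng.randn(rows.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        p = np.sign(rows @ w)
        p[p == 0] = 1.0                          # convention for zero projections
        w = soft_threshold(rows.T @ p, lam)      # sparsity via lasso shrinkage
        nrm = np.linalg.norm(w)
        if nrm == 0:                             # penalty too strong: all zeros
            break
        w /= nrm                                 # fixed L2-norm constraint
    return w
```

The interplay between the two uses of the L1-norm is visible here: the sign vector `p` handles robustness to atypical rows, while `soft_threshold` zeroes out irrelevant features of the basis vector.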
The remainder of this paper is organized as follows. The conventional 2DPCA-L1 method is briefly reviewed in Section 2. The formulation of 2DPCAL1-S is proposed in Section 3. Section 4 reports experimental results, and Section 5 concludes the paper.
Section snippets
Brief review of 2DPCA-L1
The 2DPCA-L1 approach, proposed by Li et al. (2009), finds basis vectors that maximize the dispersion of the projected image samples in terms of the L1-norm. Suppose that $A_1, A_2, \ldots, A_N$ are a set of training images of size $m \times n$, where $N$ is the number of images. These images are assumed to be mean-centred.
Let $\mathbf{w} \in \mathbb{R}^{n}$ be the first basis vector of 2DPCA-L1. It maximizes the L1-norm-based dispersion of the projected samples, $$\max_{\mathbf{w}} \sum_{i=1}^{N} \| A_i \mathbf{w} \|_1 \quad \text{s.t.} \quad \| \mathbf{w} \|_2 = 1,$$ where $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the L1-norm and the L2-norm, respectively.
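This constrained maximization is commonly solved by a greedy sign-flipping iteration in the style of PCA-L1 (Kwak, 2008), applied row-wise to the images. The sketch below is our own implementation of that scheme (function name and random initialization are assumptions; Li et al. initialize from 2DPCA):

```python
import numpy as np

def first_basis_2dpca_l1(images, n_iter=100, tol=1e-8, seed=0):
    """Greedy sign-flipping iteration for the first 2DPCA-L1 basis vector.
    `images` has shape (N, m, n); all image rows are stacked as samples.
    Each update w <- normalize(sum_ij p_ij * row_ij), with p_ij the sign
    of the projection, does not decrease sum_i ||A_i w||_1."""
    rows = images.reshape(-1, images.shape[2])   # stack all image rows
    rng = np.random.RandomState(seed)
    w = rng.randn(rows.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        p = np.sign(rows @ w)
        p[p == 0] = 1.0                          # convention for zero projections
        w_new = rows.T @ p
        w_new /= np.linalg.norm(w_new)
        if np.linalg.norm(w_new - w) < tol:      # converged (sign pattern fixed)
            return w_new
        w = w_new
    return w
```

Subsequent basis vectors are obtained greedily in the same way after deflating the data against the vectors already found; note that, as the review above states, the resulting vectors are dense.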
Basic idea
Sparse modelling has received rapidly growing attention in computer vision and pattern classification (Wright et al., 2010). The basis vectors obtained by 2DPCA-L1, however, are still dense (Li et al., 2009); in other words, the projection procedure involves all of the original features. A typical image usually has a large number of features, and some of them may be irrelevant or redundant for classification. It is important to find a few salient features, which correspond to specific
Experiments
To evaluate the proposed 2DPCAL1-S algorithm, we compare its image classification and reconstruction performance with that of four unsupervised learning algorithms: PCA, PCA-L1, 2DPCA, and 2DPCA-L1. Two benchmark face databases, FERET and AR, are used in our experiments.
In the experiments, the initial components of PCA-L1 are set as the corresponding components of PCA. The initial components of 2DPCA-L1 and 2DPCAL1-S are set as the corresponding components of 2DPCA.
There are two tuning
Conclusion
A new subspace learning method, called 2DPCAL1-S, is developed for image analysis in this paper. It uses the L1-norm for both robust and sparse modelling, so the role of the L1-norm is two-fold. One role is the robust measurement of the sample dispersion, as in 2DPCA-L1; the other is to introduce a lasso penalty, resulting in sparse projection vectors. 2DPCAL1-S thereby performs feature extraction and feature selection simultaneously and robustly. Computationally, an iterative algorithm is designed,
Acknowledgements
The authors would like to thank the anonymous referees for their constructive recommendations, which greatly improved the paper. This work was supported in part by the National Natural Science Foundation of China under Grants 61075009 and 31130025, in part by the Natural Science Foundation of Jiangsu Province under Grant BK2011595, in part by the Program for New Century Excellent Talents in University of China under Grant NCET-12-0115, and in part by the Qing Lan Project of Jiangsu Province.
References (18)
- Meng, D., Zhao, Q., & Xu, Z. (2012). Improve robustness of sparse PCA by L1-norm maximization. Pattern Recognition.
- Wang, H. (2012). Structured sparse linear graph embedding. Neural Networks.
- Yang, J., Zhang, D., Frangi, A. F., & Yang, J. (2004). Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Cai, D., He, X., & Han, J. (2007). Spectral regression: a unified approach for sparse subspace learning. In Proceedings...
- Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing.
- Ding, C., Zhou, D., He, X., & Zha, H. (2006). R1-PCA: rotational invariant L1-norm principal component analysis for...
- Jain, A. K., Duin, R. P. W., & Mao, J. (2000). Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Jenatton, R., Obozinski, G., & Bach, F. (2010). Structured sparse principal component analysis. In Proceedings of the...
- Jolliffe, I. T. (1986). Principal component analysis.