Nuclear-L1 norm joint regression for face reconstruction and recognition with mixed noise
Introduction
Face recognition is closely tied to our daily life and has been widely applied to information security, law enforcement and surveillance, smart cards, access control, etc. However, recognizing a face under significant variations in lighting, real disguise, and occlusion remains a challenging problem in pattern recognition.
Recently, a number of methods have been developed to address this problem. Among them, the sparse representation based classifier (SRC) [1] is the most attractive and has received increasing attention. In fact, SRC can be considered a generalization of nearest feature classifiers, striking a balance between NN [2] and NFS [3]. Differing from these classifiers, the representation of SRC is global, using all the training data as a dictionary, and classification is performed by checking which class yields the least coding error. Because of its simplicity and effectiveness, SRC has been applied and investigated extensively. To further improve the robustness of sparse coding, an extended SRC [4] and several re-weighted L1 minimization algorithms [5], [6] were presented. Zhang et al. [7] showed that it is the collaborative representation based classification procedure, rather than the sparseness, that plays the dominant role in face recognition. Yang et al. [8] investigated the role of the L1-optimizer and pointed out that, for pattern recognition tasks, the L1-optimizer provides more meaningful classification information (e.g. closeness) than the L0-optimizer does. Meanwhile, integrating sparse coding with other methods is also a meaningful effort. For example, Yang et al. [9] proposed sparse representation classifier steered discriminative projection, and Zheng et al. [10] performed SRC in a low rank projection with discrimination.
There is no doubt that SRC cannot handle pixel-level noise well unless the L1 norm is used to characterize the error matrix. In general, describing noise first requires considering its distribution. By maximum likelihood estimation (MLE) theory, the L1-norm and L2-norm characterizations are optimal when the error follows a Laplacian or Gaussian distribution, respectively [11]. In practice, however, these two norms cannot perfectly fit errors caused by structural variations such as occlusion, real disguise, and illumination, whose distributions may be far from Gaussian or Laplacian. Thus, SRC, LRC [12], and CRC [7] have limitations in the face of such practical noise. To address this issue, Yang et al. [11] presented a modified SRC-based framework, robust sparse coding (RSC), to handle outliers such as occlusions in face images by virtue of the idea of robust regression [13]. He et al. [14] proposed the correntropy based sparse representation (CESR) algorithm by using a robust error metric called the correntropy induced metric (CIM). Subsequently, He et al. [15] unified the algorithms for error correction and detection using the additive and multiplicative forms, respectively, and established a half-quadratic framework for solving the robust sparse representation problem.
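The MLE connection can be illustrated with a small numerical sketch (synthetic data for illustration, not from the paper): for a location parameter, Gaussian noise makes the L2 (least-squares) estimate the sample mean, while Laplacian noise makes the L1 estimate the sample median, which is far less sensitive to a gross outlier:

```python
import numpy as np

# Residuals with one gross outlier; under a Laplacian noise model the
# L1 (median) estimate is far less affected by it than the L2 (mean) estimate.
np.random.seed(0)
e = np.concatenate([np.random.normal(0.0, 0.1, 99), [10.0]])

# MLE of a location parameter mu:
#   Gaussian noise  ->  argmin sum (e_i - mu)^2  ->  sample mean
#   Laplacian noise ->  argmin sum |e_i - mu|    ->  sample median
mu_l2 = e.mean()      # heavily pulled toward the outlier
mu_l1 = np.median(e)  # robust to the outlier

print(mu_l2, mu_l1)
```

This is why the choice of norm on the error term encodes a distributional assumption, and why neither L1 nor L2 alone suits structural noise whose pixels are dependent.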
Although all of the above sparse representation based methods aim to deal with both corruption and occlusion, experiments show that they handle corruption much better than occlusion. Accordingly, several techniques have been applied to face recognition with partial occlusions. For example, Deng et al. [16] first found the best matched patch of the occlusion and then completed the face by graph Laplace. Morelli Andrés et al. [17] presented an iterative occlusion detection algorithm using compressive sensing. Li et al. [18] proposed partially iteratively reweighted sparse coding to accurately detect the occluded region with respect to the whole training set. Min et al. [19] first conducted explicit occlusion analysis and then performed face recognition on the non-occluded regions. Mi et al. [20] first segmented an image into several blocks, determined the occluded blocks by an indicator, and then used the non-occluded blocks of samples as features for classification. The common characteristic of these works is that they detect the location of the occlusion in advance but do not consider the spatial information of a face image. Thus, we still call them vector based approaches.
Unlike the above approaches, sparse representation based methods such as RSC and CESR need to characterize the error images, and depend on the assumption that each pixel of the noise is independently corrupted. Evidently, some image-level noise does not satisfy this assumption because of the dependence between its pixels. Accordingly, Zhou et al. [21] extended [1] by including a Markov random field (MRF) model to enforce spatial continuity of the additive error vector. Jia et al. [22] introduced a class of structured sparsity-inducing norms into the SRC framework to fit structural noise. Li et al. [23] proposed structured sparse error coding (SSEC) for face recognition with occlusion by exploring the structure of the error incurred by occlusion from two aspects: the error morphology and the error distribution. Yang et al. [24] used the nuclear norm to describe the structural characteristics of the error image and proposed a nuclear norm based matrix regression (NMR) model, which has been shown to be robust for face recognition with occlusion and illumination changes. From the viewpoint of dictionary learning, Yang et al. [25] learned a residual map to detect occlusions by adaptive thresholding, and identified the face image by masking the detected occlusion pixels out of the face representation. Analogously, Ou et al. [26] learned a clear dictionary and a noise dictionary simultaneously, and applied the clear dictionary to the classification task. In [27], Yang and Zhang employed compressible image Gabor features instead of the original image pixels as the feature vector in SRC to reduce computation in the presence of occlusions.
It is not difficult to see that the above methods achieve good performance when facing a single type of structural noise. Nonetheless, for faces with mixed noise, their performance decreases dramatically, since these methods cannot characterize this type of noise. Thus, dealing with face recognition under mixed noise is a more challenging task. In fact, research on mixed noise has become a hot topic in image processing in recent years. For example, Xiao et al. [28] combined a median-type filter with an effective dictionary learning method to recover images corrupted by Gaussian plus impulse noise. Liu et al. [29] proposed a weighted dictionary learning model for mixed noise removal, which integrated sparse coding and dictionary learning, image reconstruction, noise clustering, and parameter estimation into a four-step framework. Jiang et al. [30] adopted a weighted encoding technique to remove Gaussian noise and impulse noise jointly. Subsequently, Jiang et al. [31] presented a novel mixed noise removal method based on a weighted low rank model, in which the global image structure and local edges are well preserved by low rank model fitting. However, these mixed noise removal models are only suited to dot noise, and need to resort to classical Gaussian noise removal approaches to finish the denoising task. In addition, they carry out denoising on a single image without considering other information; thus, these models cannot provide a way for pattern representation (classification).
In this paper, we provide a unified model to deal with mixed noise removal and pattern representation (classification) simultaneously. Here the mixed noise refers to structural noise plus sparse noise, which can be characterized from two different viewpoints. On the one hand, we view the mixed noise as a whole and propose a matrix variate distribution to describe it, namely a linear combination of the generalized matrix variate Slash (G.M.S.) distribution and the independent Laplacian (I.L.) distribution. On the other hand, the mixed noise is assumed to be an additive combination of two independent components, structural noise and sparse noise, which are depicted by the G.M.S. distribution and the I.L. distribution, respectively. Under these two viewpoints, we derive two nuclear-L1 norm joint matrix regression (NL1R) models by maximum a posteriori (MAP) estimation. The alternating direction method of multipliers (ADMM) is utilized to solve the proposed models. In order to avoid multiple constraints, which may slow down convergence, we convert the proposed models into equivalent versions that include only one constraint. In general, the complexity of the proposed algorithms is much lower than that of SRC or RSC. We perform experiments on the Extended Yale B, Multi-PIE and AR databases. The experimental results clearly demonstrate that the proposed method is more effective than state-of-the-art regression methods for face reconstruction and recognition.
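Schematically, the two viewpoints lead to objectives of the following shape (a plausible sketch only; the precise formulations, symbols and regularizers are derived in Section 2). Here B is the observed image matrix, A_1, ..., A_n are training image matrices, and A(x) denotes the linear combination of the A_i with coefficient vector x:

```latex
% Viewpoint 1 -- the mixed noise is treated as a whole, so both norms
% act on the same residual matrix:
\min_{x}\; \|B - A(x)\|_{*} \;+\; \lambda \,\|B - A(x)\|_{1} \;+\; \beta \,\|x\|_{2}^{2}

% Viewpoint 2 -- the mixed noise is split into a structural part
% (penalized by the nuclear norm) and a sparse part E (penalized by L1):
\min_{x,\,E}\; \|B - A(x) - E\|_{*} \;+\; \lambda \,\|E\|_{1} \;+\; \beta \,\|x\|_{2}^{2}
```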
This paper extends and improves upon our Asian Conference on Computer Vision (ACCV) paper [32] in the following three aspects. First, we use heavy-tailed matrix distributions to depict mixed noise, which provides robustness against outliers and captures the dependence between the pixels of the noise. Second, we present two nuclear-L1 norm joint matrix regression (NL1R) models for face recognition with mixed noise by seeking the MAP solutions of the matrix based optimization problems under L1 and L2 regularization. The first model considers the mixed noise as a whole, while the second model assumes the mixed noise is an additive combination of two independent components: sparse noise and structural noise. The work in [32] only investigated a special case of the first model. Third, we provide more experiments on face recognition with mixed noise. Notations: Throughout this paper, the nuclear norm of a matrix X is denoted by ||X||_*, which is the sum of the singular values of X; the Frobenius norm of X is denoted by ||X||_F, which is equal to the L2-norm of vec(X), where vec(·) is an operator converting a matrix into a vector; the spectral norm of X is denoted by ||X||_2, which is the largest singular value of X.
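These three norms are easy to check numerically via the SVD; a small sketch (the matrix is an arbitrary example, not from the paper):

```python
import numpy as np

X = np.array([[3.0, 0.0],
              [4.0, 5.0]])

s = np.linalg.svd(X, compute_uv=False)  # singular values of X

nuclear   = s.sum()                     # ||X||_* : sum of singular values
frobenius = np.linalg.norm(X, 'fro')    # ||X||_F : L2-norm of vec(X)
spectral  = s.max()                     # ||X||_2 : largest singular value

# ||X||_F equals both the L2-norm of vec(X) and the root of the
# sum of squared singular values.
assert np.isclose(frobenius, np.linalg.norm(X.ravel()))
assert np.isclose(frobenius, np.sqrt((s ** 2).sum()))
```

Note the standard ordering ||X||_2 <= ||X||_F <= ||X||_*, which holds for any matrix.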
Nuclear-L1 norm joint regression
In this section, we first introduce a new matrix distribution to describe the structural noise and the mixed noise, and then seek the maximum a posteriori (MAP) solution of the matrix based optimization problem under L2 regularization, which is derived by assuming the noise follows the proposed distribution.
The proposed algorithm
The alternating direction method of multipliers (ADMM) or the augmented Lagrange multipliers (ALM) method was presented originally in [39], [40], which has been studied extensively in the theoretical frameworks of Lagrangian functions [41]. Recently, ADMM has been applied to the nuclear norm optimization problems [42], [43], which updates the variables alternately by minimizing the augmented Lagrangian function with respect to the variables in a Gauss-Seidel manner.
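The core subproblems inside each ADMM iteration have closed-form solutions: the nuclear-norm term is handled by singular value thresholding, and the L1 term by elementwise soft thresholding. A minimal sketch of these two standard proximal operators (not the paper's exact update rules, whose multipliers and step sizes are given later in this section):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau*||.||_*,
    i.e. the closed-form minimizer of tau*||Z||_* + 0.5*||Z - M||_F^2."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # shrink the singular values
    return U @ (s_shrunk[:, None] * Vt)   # rebuild with shrunken spectrum

def soft(M, tau):
    """Elementwise soft thresholding: the proximal operator of tau*||.||_1,
    used for the sparse-noise term in each iteration."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)
```

An ADMM iteration alternates these two operators with a multiplier update, which is the Gauss-Seidel style sweep described above.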
The design of the classifier
For the design of the classifier, some recent ideas are worth noting. For example, Luan et al. [51] introduced two descriptors, sparsity and smoothness, to represent characteristics of the sparse error component, and applied them to face recognition. Li and Lu [52] proposed a new decision rule, the sum of coefficients (SoC), to better match SRC. That is, these works make full use of the information in the objective function.
In this section, we will adopt nuclear norm to design the classifier. This
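As a rough sketch of a nuclear-norm based decision rule (hypothetical helper names; the paper's actual classifier is specified in this section), the query is assigned to the class whose reconstruction leaves the residual image with the smallest nuclear norm:

```python
import numpy as np

def nuclear_norm(M):
    # ||M||_*: sum of the singular values of M
    return np.linalg.svd(M, compute_uv=False).sum()

def classify(B, reconstructions):
    """Hypothetical sketch of a nuclear-norm residual classifier.

    `reconstructions` maps each class label to a class-wise
    reconstruction of the query image B (e.g. produced by the
    regression model); B is assigned to the class whose residual
    matrix has the smallest nuclear norm.
    """
    return min(reconstructions,
               key=lambda c: nuclear_norm(B - reconstructions[c]))
```

Using the nuclear norm (rather than the L2 norm of the vectorized residual) lets the decision rule account for the low-rank structure of structural noise in the residual.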
Experiments and analysis
In this section, three standard face databases, the AR database, the Multi-PIE database and the Extended Yale B database, are selected to evaluate the effectiveness and robustness of our algorithms against the structural noise caused by real disguise, occlusion and illumination, as well as sparse noise and mixed noise. To show the performance of our methods, several competitive face recognition methods are tested for comparison, including CRC, LRC, SRC, RSC, CESR, SSEC and NMR. For the sake of
Conclusions
The characterization of noise is a significant problem in regression model based face recognition. This paper presents two nuclear-L1 norm joint regression models. We assume the mixed noise follows a matrix distribution and seek the maximum a posteriori (MAP) solutions of the matrix based optimization problems with L1 and L2 regularization of the coefficients. Since the L1-norm is good at characterizing sparse noise with a Laplacian distribution, and the nuclear norm is suitable for characterizing structural noise
Conflict of interest
None declared.
Acknowledgments
This work was partially supported by the National Science Fund for Distinguished Young Scholars under Grant nos. 61125305, 91420201, 61472187, 61233011 and 61373063, the Key Project of Chinese Ministry of Education under Grant no. 313030, the 973 Program No. 2014CB349303, Fundamental Research Funds for the Central Universities No. 30920140121005, and Program for Changjiang Scholars and Innovative Research Team in University No. IRT13072.
References (56)
- et al., Beyond sparsity: the role of l1-optimizer in pattern classification, Pattern Recognit. (2012)
- et al., Face recognition on partially occluded images using compressed sensing, Pattern Recognit. Lett. (2014)
- et al., A novel method for recognizing face with partial occlusion via sparse representation, Optik—Int. J. Light Electron Opt. (2013)
- et al., Fast and robust face recognition via coding residual map learning based adaptive masking, Pattern Recognit. (2014)
- et al., Robust face recognition via occlusion dictionary learning, Pattern Recognit. (2014)
- et al., Restoration of images corrupted by mixed Gaussian-impulse noise via l1-l0 minimization, Pattern Recognit. (2011)
- et al., Mixed noise removal by weighted low rank model, Neurocomputing (2015)
- et al., The multivariate skew-slash distribution, J. Stat. Plan. Inference (2006)
- et al., A generalization of the multivariate slash distribution, J. Stat. Plan. Inference (2009)
- et al., Matrix variate slash distribution, J. Multivar. Anal. (2015)
- Extracting sparse error of robust PCA for face recognition in the presence of varying illumination and occlusion, Pattern Recognit.
- A new decision rule for sparse representation based classification for face recognition, Neurocomputing
- Multi-PIE, Image Vis. Comput.
- Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell.
- Nearest neighbor pattern classification, IEEE Trans. Inf. Theory
- Face recognition using the nearest feature line method, IEEE Trans. Neural Netw.
- Face recognition via weighted sparse representation, J. Vis. Commun. Image Represent.
- Enhancing sparsity by reweighted l1 minimization, J. Fourier Anal. Appl.
- Sparse representation classifier steered discriminative projection with applications to face recognition, IEEE Trans. Neural Netw. Learn. Syst.
- Linear regression for face recognition, IEEE Trans. Pattern Anal. Mach. Intell.
- Robust regression using iteratively reweighted least-squares, Commun. Stat.: Theory Methods
- Maximum correntropy criterion for robust face recognition, IEEE Trans. Pattern Anal. Mach. Intell.
- Half-quadratic-based iterative minimization for robust sparse representation, IEEE Trans. Pattern Anal. Mach. Intell.
- Graph Laplace for occluded face completion and recognition, IEEE Trans. Image Process.
Lei Luo received the B.S. degree from Xinyang Normal University, Xinyang, China, in 2008, and the M.S. degree from Nanchang University, Nanchang, China, in 2011. He is currently pursuing the Ph.D. degree in pattern recognition and intelligence systems at the School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China. His current research interests include pattern recognition and optimization algorithms.
Jian Yang received the B.S. degree in mathematics from Xuzhou Normal University in 1995, the M.S. degree in applied mathematics from Changsha Railway University in 1998, and the Ph.D. degree from Nanjing University of Science and Technology (NUST), on the subject of pattern recognition and intelligence systems, in 2002. In 2003, he was a postdoctoral researcher at the University of Zaragoza. From 2004 to 2006, he was a postdoctoral fellow at the Biometrics Centre of Hong Kong Polytechnic University. From 2006 to 2007, he was a postdoctoral fellow at the Department of Computer Science of the New Jersey Institute of Technology. He is now a professor in the School of Computer Science and Technology of NUST. He is the author of more than 80 scientific papers in pattern recognition and computer vision. His journal papers have been cited more than 3000 times in the ISI Web of Science and 6000 times in Google Scholar. His research interests include pattern recognition, computer vision and machine learning. Currently, he is an associate editor of Pattern Recognition Letters and of the IEEE Transactions on Neural Networks and Learning Systems.
Jianjun Qian received the B.S. and M.S. degrees in 2007 and 2010, respectively, and the Ph.D. degree in pattern recognition and intelligence systems from Nanjing University of Science and Technology (NUST), in 2014. Now, he is an assistant professor in the School of Computer Science and Engineering of NUST. His research interests include pattern recognition, computer vision and face recognition in particular.
Ying Tai received the B.S. degree from the School of Computer Science and Engineering, Nanjing University of Science and Technology (NUST), Nanjing, China, in 2012. Currently, he is pursuing the Ph.D. degree at NUST. His current research interests include pattern recognition, computer vision, and especially face recognition.