Pattern Recognition

Volume 46, Issue 6, June 2013, Pages 1579-1594

Complete large margin linear discriminant analysis using mathematical programming approach

https://doi.org/10.1016/j.patcog.2012.11.019

Abstract

In this paper, we develop a novel dimensionality reduction (DR) framework coined complete large margin linear discriminant analysis (CLMLDA). Inspired by several recently proposed DR methods, CLMLDA constructs two mathematical programming models by maximizing the minimum distance between each class center and the total class center, respectively, in the null space of the within-class scatter matrix and in its orthogonal complementary space. In this way, CLMLDA not only makes full use of the discriminative information contained in the whole feature space but also overcomes the weakness of linear discriminant analysis (LDA) in dealing with the class separation problem. The solutions of CLMLDA follow from solving two nonconvex optimization problems, each of which is first transformed into a series of convex quadratic programming problems by the constrained concave–convex procedure (CCCP) and then solved with an off-the-shelf optimization toolbox. Experiments on both toy and several publicly available databases demonstrate its feasibility and effectiveness.

Highlights

► A framework based on the large margin idea is proposed for discriminant analysis.
► Two mathematical programming based models are constructed.
► The models are used to extract irregular and regular discriminant information, respectively.
► The nonconvex optimization problems are converted to convex ones via CCCP.

Introduction

In many real-world applications, e.g. bioinformatics and face identification, we are often faced with high-dimensional data. Directly applying classic pattern recognition methods, such as the nearest neighbor classifier, to such data may cause many problems: (1) it is generally time-consuming to carry out classification based on all of the original high-dimensional features; (2) a large quantity of features is likely to deteriorate classification performance, since many features are irrelevant or redundant for predicting the desired output. By first discovering a low-dimensional subspace in which a small number of good features are extracted from the original high-dimensional data, and then performing classification in that subspace, we can address these problems, speeding up the learning process and improving generalization ability. Therefore, dimensionality reduction (DR) techniques have attracted much attention in the pattern recognition and machine learning communities during the last few decades. By now, numerous DR methods have been developed, among which principal component analysis (PCA) [1] and linear discriminant analysis (LDA) [2] are two representative linear subspace learning methods.

In contrast to PCA, which does not take class label information into account and is thus not reliable enough for classification tasks [3], LDA looks for a low-dimensional subspace by maximizing the Fisher criterion, i.e. the ratio of between-class scatter to within-class scatter. An empirical study [4] showed that LDA performs better than PCA in face recognition applications; the corresponding method is known as Fisherfaces. Due to its effectiveness, LDA has received intense attention and has been widely used in many real-world applications. However, when the number of samples is much smaller than the dimensionality of the sample space, the within-class scatter matrix Sw becomes singular, preventing direct application of LDA. This is the well-known small sample size (SSS) problem, which may occur in many circumstances. Besides, for multi-class problems, LDA may fail to find good projection directions, since it overemphasizes class pairs with larger distances in the original sample space and tends to merge neighboring class pairs after projection onto a low-dimensional subspace [5], [6]. This so-called class separation problem [7] is rooted in the fact that LDA tries to maximize the average distance between the centers of different classes [8], as indicated by the definition of the between-class scatter matrix Sb. To handle the aforementioned problems of LDA, numerous methods have been put forward in the literature and have demonstrated better performance.
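To make the Fisher criterion concrete, the following minimal NumPy/SciPy sketch (ours, not taken from the paper) computes classical LDA directions by solving the generalized eigenproblem Sb w = λ Sw w; it assumes Sw is nonsingular, i.e. no SSS problem:

```python
import numpy as np
from scipy.linalg import eigh

def lda_directions(X, y, n_dims):
    """Classical LDA: maximize (w^T Sb w)/(w^T Sw w) via the generalized
    eigenproblem Sb w = lambda Sw w. Assumes Sw is nonsingular."""
    m = X.mean(axis=0)
    D = X.shape[1]
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - m, mc - m)
    evals, evecs = eigh(Sb, Sw)          # eigenvalues in ascending order
    return evecs[:, ::-1][:, :n_dims]    # top n_dims discriminant directions
```

When D exceeds the number of samples, Sw is singular and the generalized eigensolver fails; this is exactly the SSS problem that motivates the methods discussed next.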

To overcome the small sample size problem, conventional methods first apply PCA to generate a low-dimensional subspace in which Sw is no longer singular, and then perform LDA in that reduced space to yield discriminant vectors, as is done, for example, in Fisherfaces [4]. However, the PCA plus LDA strategy is imperfect in that some small principal components are thrown away in the PCA step, potentially losing useful discriminative information. Considering that the null space of Sw contains the most important discriminative information for classification when an SSS problem occurs, Chen et al. [9] proposed the null space LDA (NLDA) algorithm, which seeks discriminant vectors by directly maximizing the between-class scatter in the null space of Sw. However, when the dimensionality D of the sample space is very large, NLDA suffers from high computational complexity, since it needs to decompose Sw, a matrix of size D×D. Taking into account that the null space of the total scatter matrix St usually contains no discriminative information, Huang et al. [10] first seek a rather small discriminative subspace within the null space of Sw by removing the null space of St, and then perform NLDA. In doing so, Huang's method only needs to decompose a matrix whose size is bounded by L−1, where L denotes the total number of samples, thus cutting down the computational load of NLDA. Zheng et al. [11] presented a similar approach that further incorporates a statistically uncorrelated constraint on the discriminant vectors into NLDA, such that the extracted features are uncorrelated [12]. However, these null space based methods [9], [10], [11], [12] only consider discriminative information from the null space of Sw, discarding that contained in its orthogonal complementary space. To make full use of the information residing in both subspaces, Yang et al. [13], [14] proposed a complete LDA (CLDA) framework which extracts irregular discriminant vectors from the null space of Sw as well as regular discriminant vectors from its orthogonal complementary space. These two kinds of discriminant vectors are then used jointly for better feature extraction.
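As an illustration of the null space idea, here is a hedged NumPy sketch in the spirit of NLDA with Huang et al.'s reduction [10]; the helper name and tolerance are our own, and details may differ from the published algorithms:

```python
import numpy as np

def nlda_directions(X, y, eps=1e-10):
    """Null space LDA sketch: (1) restrict to the range of St (rank <= L-1)
    to shrink the problem; (2) take the null space of the projected Sw;
    (3) maximize between-class scatter inside that null space."""
    m = X.mean(axis=0)
    # Basis of range(St) via SVD of the centered data matrix.
    U, s, _ = np.linalg.svd((X - m).T, full_matrices=False)
    P = U[:, s > eps]
    Z = (X - m) @ P                        # data in the reduced space
    d = Z.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc, mc)   # total mean is 0 after centering
    # Null space of the projected Sw (small eigenvalues first with eigh).
    w_vals, w_vecs = np.linalg.eigh(Sw)
    N = w_vecs[:, w_vals < eps]
    # Maximize between-class scatter within that null space.
    b_vals, b_vecs = np.linalg.eigh(N.T @ Sb @ N)
    W = N @ b_vecs[:, ::-1]                # descending eigenvalue order
    return P @ W                           # back to the original D-dim space
```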

To tackle the class separation problem, an intuitive idea is to incorporate a weighting function into the Fisher criterion, so that class pairs that are close in the original sample space receive higher weights, since they are more likely to be misclassified in the projected space [5], [6], [15], [16]. Among these weighting-based methods, Loog et al. [5] presented a simple but effective criterion named the approximate pairwise accuracy criterion (aPAC), which adds weights approximating the Bayes error for class pairs in the estimation of Sb. However, it is not clear how to select an appropriate weighting function. More recently, several new methods [8], [17], [18] have been developed that avoid the difficulty of selecting a weighting function. Our previous work [19] demonstrated that classification performance can be largely improved by casting an eigendecomposition based problem as an SVM-type problem [20], [21], [22]. Following this line, RPFDA [17] was developed by converting the maximum average between-class scatter criterion in LDA into a set of SVM-type constraints. Unfortunately, RPFDA suffers from the singularity problem and resorts to the PCA plus LDA strategy. Xu et al. [8] proposed a method called minimum distance maximization (MDM). MDM first applies a whitening transformation to the samples, yielding a subspace wherein the projected Sw becomes an identity matrix. It then searches for discriminant vectors by maximizing the minimum pairwise distance between the centers of different classes. A similar method was independently developed in [18]. Although these methods [8], [17], [18] can handle the class separation problem well, they require performing PCA a priori to reduce the computational complexity, which may lose some useful discriminative information.
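For intuition, the whitening step used by MDM can be sketched as follows (an illustrative reading, assuming Sw has full rank after the preliminary PCA step mentioned above):

```python
import numpy as np

def whiten_within_class(X, y, eps=1e-10):
    """Find T with T^T Sw T = I, so that after z = T^T x the within-class
    scatter is the identity and only between-class geometry remains."""
    D = X.shape[1]
    Sw = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
    vals, vecs = np.linalg.eigh(Sw)
    keep = vals > eps                      # assumes Sw is (near) full rank
    T = vecs[:, keep] / np.sqrt(vals[keep])
    return X @ T, T
```

After this transformation, maximizing the minimum pairwise distance between the whitened class centers is the remaining, distribution-shaping part of MDM.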

In this paper, motivated by the success of CLDA [13], [14] and RPFDA [17], we systematically propose a novel supervised DR framework, coined complete large margin linear discriminant analysis, or CLMLDA for short. We call our method CLMLDA for two reasons. First, CLMLDA is capable of making full use of the discriminative information both in and out of the null space of Sw and thus does not lose any useful information. Second, we introduce the idea of large margin to construct the CLMLDA model. In this way, the conventional eigendecomposition based methods are reformulated as associated SVM-type optimization problems, which in turn brings numerous advantages. Our method is interesting from the following perspectives:

  • (1)

Compared with the null space based discriminant analysis methods [5], [9], [10], [11], which maximize the (weighted) average distance between the centers of different classes in the null space of Sw, CLMLDA attempts to maximize the minimum distance between each class center and the total class center. This is consistent with the large margin principle, which exhibits impressive generalization performance in the well-known SVM [21], [22] (a toy sketch of this max–min mechanics is given after this list). By doing so, the derived discriminant vectors may possess better discriminative power and are expected to be insensitive to the statistical distribution of the data.

  • (2)

Compared with RPFDA [17], which directly inspired our work, the formulation of CLMLDA is more straightforward. To remove the limitation on the number of discriminant vectors obtained, RPFDA needs to generate new features by discarding the information represented by the old features in advance. This is likely to result in a singularity problem, and RPFDA hence applies PCA to tackle it. In contrast, we directly impose an orthogonality constraint with respect to the set of existing discriminant vectors, which is essentially similar to the classic orthonormal FLD [23], [24]. As a result, CLMLDA avoids the singularity problem and does not require performing PCA in each iteration.

  • (3)

Compared with the existing mathematical programming based DR methods [8], [17], [18], which do not give sufficient consideration to the discriminative information contained in the null space of Sw, the proposed CLMLDA can make full use of both kinds of discriminative information, i.e. the irregular and regular information lying in and out of the null space of Sw, respectively. This makes CLMLDA a more powerful discriminant analysis method. In addition, the optimization models in Refs. [8], [18] are generally based on semidefinite programming (SDP) [39] (except MDM, which can be solved by quadratic programming), whereas CLMLDA is based on quadratic programming problems (QPPs), which are simple and easy to implement.

  • (4)

Last but not least, the new formulation of CLMLDA demonstrates the feasibility of introducing the large margin principle into the field of DR, as well as the advantages of mathematical programming based DR methods. For example, although there exist methods that can transform the generalized eigenvalue problems of DR into equivalent least squares problems [25], [26] and facilitate the introduction of regularization techniques [27], the proposed method provides another possibility and calculation procedure from a new point of view.
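To illustrate the max–min mechanics and the CCCP step referred to above, here is a deliberately simplified toy, not the paper's actual CLMLDA model: maximize the smallest of several quadratic distances w^T A_i w under a norm constraint, where each nonconvex constraint t ≤ w^T A_i w is linearized at the current iterate, so that every iteration becomes a convex problem solvable by an off-the-shelf toolbox (cvxpy here). The matrices A_i built from class centers are an assumption of this toy.

```python
import numpy as np
import cvxpy as cp

def max_min_cccp(A_list, d, n_iter=20, seed=0):
    """Toy CCCP loop for  max_w min_i w^T A_i w  s.t. ||w|| <= 1, with each
    A_i PSD (e.g. A_i = (m_i - m)(m_i - m)^T for class center m_i).
    The concave side of t <= w^T A_i w is replaced by its first-order
    expansion at w_k, giving one convex problem per iteration."""
    rng = np.random.default_rng(seed)
    wk = rng.standard_normal(d)
    wk /= np.linalg.norm(wk)
    for _ in range(n_iter):
        w = cp.Variable(d)
        t = cp.Variable()
        cons = [cp.sum_squares(w) <= 1]
        for A in A_list:
            # Linearize w^T A w around wk: 2 wk^T A w - wk^T A wk >= t.
            cons.append(2 * (A @ wk) @ w - wk @ A @ wk >= t)
        cp.Problem(cp.Maximize(t), cons).solve()
        wk = w.value
    return wk

# Toy usage: three class centers around the origin in 2-D.
centers = [np.array([2.0, 0.0]), np.array([-1.0, 1.5]), np.array([-1.0, -1.5])]
A_list = [np.outer(c, c) for c in centers]
w = max_min_cccp(A_list, d=2)
```

In our toy runs the objective is non-decreasing across iterations, which is the usual behavior of CCCP; the paper's actual models additionally impose orthogonality constraints when extracting subsequent discriminant vectors.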

The paper is organized as follows. In Section 2, we briefly review the basic LDA and NLDA methods. We then introduce the idea of large margin and propose the CLMLDA framework in Section 3. Section 4 reports experimental results on both artificial and benchmark databases. Finally, we conclude the paper in Section 5.

Section snippets

Fundamentals of Fisher LDA and null space based LDA

Suppose there are K pattern classes $\omega_1,\omega_2,\ldots,\omega_K$ in a D-dimensional input space. The number of samples in class $\omega_i$ is $L_i$ ($i=1,2,\ldots,K$), and let $L=\sum_{i=1}^{K}L_i$ be the total number of samples. The null space of a symmetric positive semidefinite (PSD) matrix $S$ and its orthogonal complement are denoted by $N(S)$ and $N(S)^{\perp}$, respectively. The purpose of discriminant analysis is to find an optimal set of discriminant vectors such that the samples are most well-separated after they are projected onto these
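For reference, the scatter matrices implied by this notation are, under the usual conventions (reconstructed here, since the snippet above is truncated):

```latex
S_w = \sum_{i=1}^{K} \sum_{x \in \omega_i} (x - m_i)(x - m_i)^{\top}, \qquad
S_b = \sum_{i=1}^{K} L_i \, (m_i - m)(m_i - m)^{\top}, \qquad
S_t = S_w + S_b,
```

where $m_i$ is the mean of class $\omega_i$ and $m$ is the total mean; the Fisher criterion is $J(w)=\dfrac{w^{\top}S_b w}{w^{\top}S_w w}$.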

Complete large margin linear discriminant analysis

In this section, we first introduce the idea of large margin into NLDA to extract irregular discriminant vectors from $N(S_w)$. Subsequently, we propose a similar algorithm to extract regular discriminant vectors from $N(S_w)^{\perp}$. These two kinds of discriminant vectors are finally fused to form the complete large margin linear discriminant analysis.
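The subspace split underlying this construction can be sketched in a few lines (our illustration; the extraction inside each subspace, which the paper formulates as max–min programs, is omitted here):

```python
import numpy as np

def split_subspaces(Sw, eps=1e-10):
    """Split R^D into N(Sw) and its orthogonal complement via an
    eigendecomposition of Sw. CLMLDA-style methods extract irregular
    discriminant vectors in the first block and regular ones in the
    second, then use both kinds jointly for feature extraction."""
    vals, vecs = np.linalg.eigh(Sw)
    null_basis = vecs[:, vals <= eps]    # N(Sw): irregular information
    range_basis = vecs[:, vals > eps]    # N(Sw)^perp: regular information
    return null_basis, range_basis
```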

Experimental results and analysis

In this section, we first use a synthetic toy example to intuitively demonstrate the behavior of the proposed CLMLDA. Then, we investigate the influence of the parameters on the performance of CLMLDA on the Yale face database. Subsequently, we compare the performance of CLMLDA with several representative dimensionality reduction methods, including PCA [3], LDA [4], aPAC [5], MDM [8], RPFDA [17], NLDA [10] and CLDA [13], on four databases, namely Yale, PIE, USPS and COIL-20, which are widely used

Conclusions and future work

In this paper, we develop a novel DR method termed complete large margin linear discriminant analysis (CLMLDA). In order to handle the singularity and class separation problems in LDA, we introduce the idea of large margin and construct two mathematical programming based models, respectively, in the null space of the within-class scatter matrix and in its orthogonal complementary space, in order to extract irregular and regular discriminant vectors. To solve the proposed non-convex optimization

Acknowledgments

The authors would like to thank the editor and the anonymous reviewers for their critical and constructive comments and suggestions. They also thank Prof. Suganthan, Wankou Yang, and Qiaolin Ye for providing some useful codes. This work was partially supported by the Program for New Century Excellent Talents in University of China, the National Science Foundation of China under Grant nos. 61203244, 60973098 and 51108209, the National Science Fund for Distinguished Young Scholars under Grant no.


References (43)

• Y. Xu et al., A novel method for Fisher discriminant analysis, Pattern Recognition (2004)
• W. Yang et al., A multi-manifold discriminant analysis method for image feature extraction, Pattern Recognition (2011)
• W. Yang et al., Feature extraction based on Laplacian bidirectional maximum margin criterion, Pattern Recognition (2009)
• I. Jolliffe, Principal Component Analysis, Encyclopedia of Statistics in Behavioral Science (2002)
• R.O. Duda et al., Pattern Classification (2001)
• M.A. Turk, A.P. Pentland, Face recognition using eigenfaces, in: Proceedings of CVPR, 1991, pp. ...
• P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, ...
• M. Loog et al., Multiclass linear dimension reduction by weighted pairwise Fisher criteria, IEEE Transactions on Pattern Analysis and Machine Intelligence (2001)
• R. Lotlikar et al., Fractional-step dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence (2000)
• D. Tao et al., Geometric mean for subspace selection, IEEE Transactions on Pattern Analysis and Machine Intelligence (2009)
• B. Xu, K. Huang, C.L. Liu, Dimensionality reduction by minimal distance maximization, in: Proceedings of ICPR, 2010, ...

Xiaobo Chen received the BS degree in computer science and the MS degree in pattern recognition and intelligent systems from Jiangsu University in 2004 and 2007, respectively. Since September 2009, he has been a Ph.D. student in the Department of Computer Science, Nanjing University of Science and Technology (NUST). From March to August 2011, he was a research assistant in the Department of Computing, The Hong Kong Polytechnic University. He is now a lecturer at Jiangsu University. He has published several technical papers in the areas of pattern recognition, machine learning, and multi-agent systems. He serves as a reviewer for several international journals, including IEEE Transactions on Neural Networks, Pattern Recognition, and Neural Networks, and won the best student paper award at the 4th Chinese Conference on Pattern Recognition (CCPR2010).

Jian Yang received the BS degree in mathematics from Xuzhou Normal University in 1995, the MS degree in applied mathematics from Changsha Railway University in 1998, and the PhD degree in pattern recognition and intelligent systems from the Nanjing University of Science and Technology (NUST) in 2002. In 2003, he was a postdoctoral researcher at the University of Zaragoza, and in the same year he was awarded the RyC program Research Fellowship sponsored by the Spanish Ministry of Science and Technology. From 2004 to 2006, he was a Postdoctoral Fellow at the Biometrics Centre of The Hong Kong Polytechnic University. From 2006 to 2007, he was a Postdoctoral Fellow in the Department of Computer Science of the New Jersey Institute of Technology. He is now a professor in the School of Computer Science and Technology of NUST. He is the author of more than 50 scientific papers in pattern recognition and computer vision. His journal papers have been cited more than 1200 times in the ISI Web of Science and 2000 times in Google Scholar. His research interests include pattern recognition, computer vision, and machine learning. He is currently an associate editor of Pattern Recognition Letters and of the IEEE Transactions on Neural Networks and Learning Systems.

David Zhang graduated in computer science from Peking University, China, in 1974, and received the M.Sc. degree in computer science and engineering and the Ph.D. degree from the Harbin Institute of Technology (HIT), China, in 1983 and 1985, respectively. In 1994, he received a second Ph.D. degree in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada. From 1986 to 1988, he was first a Postdoctoral Fellow at Tsinghua University, China, and then an Associate Professor at the Academia Sinica, Beijing, China. Currently, he is a Chair Professor at The Hong Kong Polytechnic University, where he is the Founding Director of the Biometrics Technology Centre (UGC/CRC) supported by the Hong Kong SAR Government. He also serves as Adjunct Professor at Tsinghua University, Shanghai Jiao Tong University, Harbin Institute of Technology, Beihang University, and the University of Waterloo. His research interests include automated biometrics-based authentication, pattern recognition, and biometric technology and systems. As a principal investigator, he has, since 1980, brought many biometrics projects to fruition and won numerous prizes. Dr. Zhang is the Founder and Editor-in-Chief of the International Journal of Image and Graphics (IJIG), editor of the Kluwer International Series on Biometrics (KISB), Chairman of the Hong Kong Biometric Authentication Society, Program Chair of the First International Conference on Biometrics Authentication (ICBA), and Associate Editor of more than ten international journals, including the IEEE Transactions on Systems, Man, and Cybernetics—Part A, the IEEE Transactions on Systems, Man, and Cybernetics—Part C, and Pattern Recognition. He is the author of more than 140 journal papers, 20 book chapters, and 11 books. In 1984, his Fingerprint Recognition System won the National Scientific Council of China's third prize, and in 1986 his Real-Time Remote Sensing System took the Council's first prize. In 2002, his Palmprint Identification System won a Silver Medal at the Seoul International Invention Fair, followed by a Special Gold Award in 2003, a Gold Medal, and a Hong Kong Industry Award. Professor Zhang holds a number of patents in both the USA and China and is a Croucher Senior Research Fellow and Distinguished Speaker of the IEEE Computer Society.

Jun Liang received the BS degree in computer science from Southwest University for Nationalities, Chengdu, Sichuan, China, in 1999, and the MS degree in computer science from Jiangsu University, Zhenjiang, China, in 2009. He is now a PhD student in the School of Automobile and Traffic Engineering at Jiangsu University, and also an Associate Professor in the Automotive Engineering Research Institute at Jiangsu University. He has published more than 10 scientific papers on multi-agent systems and machine learning. His current interests are in the areas of multi-agent systems, machine learning, and vehicle engineering.
