
Pattern Recognition

Volume 92, August 2019, Pages 258-273

Joint graph optimization and projection learning for dimensionality reduction

https://doi.org/10.1016/j.patcog.2019.03.024

Highlights

  • A novel framework termed joint graph optimization and projection learning (JGOPL) is proposed for graph-based dimensionality reduction.

  • The l21-norm based distance measurement is adopted in the loss function of JGOPL to improve its robustness to the negative influence of outliers and data variations.

  • To better exploit and preserve the local structure of high-dimensional data, a locality constraint is introduced into the proposed JGOPL to discourage a sample from connecting with distant samples during graph optimization.

  • The proposed locality constraint and graph optimization strategy are not limited to dimensionality reduction; they can also be incorporated into other relevant graph-based tasks.

Abstract

Graph-based dimensionality reduction approaches have become increasingly popular owing to their success in classification and clustering tasks. In these approaches, establishing an appropriate graph is critical. To address this issue, a novel graph-based dimensionality reduction framework termed joint graph optimization and projection learning (JGOPL) is proposed in this paper. Compared with existing dimensionality reduction approaches, JGOPL offers three main advantages. First, by performing graph optimization and low-dimensional feature learning simultaneously, the proposed approach accomplishes graph construction and dimensionality reduction jointly. Second, the l21-norm based distance measurement is adopted in the loss function of JGOPL, which improves its robustness to the negative influence of outliers and data variations. Third, to better exploit and preserve the local structure of high-dimensional data, a locality constraint is introduced into JGOPL to discourage a sample from connecting with distant samples during graph optimization. Extensive classification and clustering experiments on seven publicly available databases demonstrate the effectiveness of the approach. Finally, the proposed locality constraint and graph optimization strategy are not limited to dimensionality reduction; they can also be incorporated into other graph-based tasks, such as spectral clustering.

Introduction

The dimensionality of data in scientific fields such as pattern recognition and machine learning is often high, which not only causes the “curse of dimensionality” problem but also introduces noise and redundancy that reduce the effectiveness of algorithms [1], [2], [3], [4], [5], [6], [7], [8], [9], [10]. Therefore, it is important to extract the most useful low-dimensional representation from the original high-dimensional data by means of dimensionality reduction techniques.

In the past several decades, a number of dimensionality reduction algorithms have been proposed, which can be broadly divided into linear and nonlinear categories [11], [12], [13], [14], [15]. Linear dimensionality reduction algorithms, which employ a linear transformation to project the original high-dimensional data into a low-dimensional subspace, have been well studied. The most classical and widely used linear algorithms include principal component analysis (PCA) [16], linear discriminant analysis (LDA) [17], independent component analysis (ICA) [18], canonical correlation analysis (CCA) [19], Bayesian principal component analysis (BPCA) [20], local Fisher discriminant analysis (LFDA) [21] and partial least squares (PLS) [22]. Although they are simple and efficient, a common limitation is that they ignore the nonlinear structure of the original high-dimensional data. To deal with this issue, several kernelized algorithms were proposed, such as kernel principal component analysis (KPCA) [23], kernel entropy component analysis (KECA) [24], kernel scatter-difference-based discriminant analysis (KSDA) [25], kernel marginal Fisher analysis (KMFA) [11], kernel local Fisher discriminant analysis (KLFDA) [26], kernel maximum margin criterion (KMMC) [27] and interactive document map (IDMAP) [28]. Nevertheless, because these kernelized algorithms introduce the kernel trick into dimensionality reduction, it is difficult to set the parameters of their kernel functions properly. Since 2000, a series of manifold learning methods have been presented to take the nonlinear manifold structure of high-dimensional data into consideration. Typical manifold learning algorithms include locally linear embedding (LLE) [29], Laplacian eigenmap (LE) [30] and isometric feature mapping (ISOMAP) [31]. Although manifold learning algorithms can preserve the intrinsic manifold structure of the input data during dimensionality reduction, they only obtain the low-dimensional features of the training data and fail to provide an explicit mapping function between the high- and low-dimensional spaces. Therefore, they may suffer from the “out-of-sample” problem [32], [33], [34]. To mitigate this limitation, a series of linearized manifold learning approaches have been proposed in which a projection matrix is learned as a mapping function to obtain the low-dimensional features of high-dimensional data, e.g., neighborhood preserving embedding (NPE) [32], locality preserving projection (LPP) [33] and local discriminant embedding (LDE) [34]. Since most of the aforementioned dimensionality reduction approaches and their extensions are based on a graph embedding framework [11], their performance relies heavily on the graph construction process. However, constructing an optimal graph is difficult in practice. For example, the k nearest neighbor graph (kNN-graph) and the ɛ-graph are classical graph construction strategies that have been widely used in graph-based dimensionality reduction approaches such as LE, LLE, NPE and LPP. Nevertheless, a major disadvantage of the kNN-graph and ɛ-graph is that their neighborhood parameters must be set empirically, and inappropriate parameter settings inevitably degrade their performance.
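
To make this parameter sensitivity concrete, the following minimal NumPy sketch (our illustration, not code from the paper) builds the classical kNN-graph with heat-kernel weights in the style of LE and LPP; the neighbor count k and bandwidth t are exactly the values that must be chosen empirically:

```python
import numpy as np

def knn_graph(X, k=5, t=1.0):
    """Symmetric kNN-graph with heat-kernel weights (as in LE/LPP).

    X : (D, n) data matrix with one sample per column.
    k : number of neighbors; t : heat-kernel bandwidth.
    Both k and t must be chosen empirically, which is the
    parameter sensitivity criticized in the text.
    """
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    # Pairwise squared Euclidean distances between columns of X.
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist2, np.inf)           # exclude self-loops
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist2[i])[:k]       # indices of the k nearest samples
        W[i, nbrs] = np.exp(-dist2[i, nbrs] / t)
    return np.maximum(W, W.T)                 # symmetrize: connect i~j if either selects the other
```
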
To address this disadvantage, adaptive graph construction approaches such as the sample-dependent graph (SG) [35], the l1-graph [36], [37], the low-rank representation graph (LRR-graph) [38] and the least-squares regression graph (LSR or l2-graph) [39], [40] have been employed in dimensionality reduction. Although these approaches relieve the disadvantage of the kNN-graph and ɛ-graph, a common criticism is that their graph construction processes are task-irrelevant. Thus, the graphs they obtain may not be optimal for dimensionality reduction [41], [42], [43], [44], [45], [46]. More recently, Zhang et al. [43] combined graph construction and projection matrix learning in a single objective function and proposed the graph-optimized locality preserving projections (GoLPP) algorithm. The experimental results in [43] indicated that GoLPP is superior to LPP, in which a classical graph construction technique, i.e., the kNN-graph or ɛ-graph, is adopted. Nevertheless, GoLPP fails to take the information of the original data into consideration, and its entropy-based regularization cannot guarantee the sparsity of the obtained graph. To remedy these shortcomings, Qiao et al. [44] took the original data information into account and proposed an algorithm termed dimensionality reduction with adaptive graph (DRAG). DRAG first constructs a predefined graph based on the original high-dimensional data; a refined graph is then obtained by fusing the original and transformed data information. Zhang et al. [45] also proposed graph optimization for dimensionality reduction with sparsity constraints (GODRSC) to improve the sparsity of the optimized graph. Different from GoLPP and DRAG, GODRSC incorporates an l1-norm based regularization into its objective function to achieve sparse graph construction and low-dimensional feature learning jointly, and therefore obtains better performance than GoLPP. However, since GoLPP, DRAG and GODRSC all employ an l2-norm or Frobenius-norm based metric to characterize the scatter of the data, their dimensionality reduction performance is sensitive to outliers and data variations [47], [48], [49], [50], [51]. Inspired by the robustness of the l21-norm, Wong et al. [47] proposed low-rank embedding (LRE), which introduces an l21-norm based low-rank representation for dimensionality reduction. Although LRE is robust to data containing noise or outliers, it neglects the local information of the data; moreover, optimizing the low-rank constraint is time-consuming. To accurately exploit the structure of high-dimensional data, Fang et al. [51] took the locality of the data into consideration and proposed an orthogonal self-guided similarity preserving projection (OSSPP) method in which the low-dimensional features and the intrinsic structure of the input data are learned simultaneously. However, since OSSPP adopts an l2-norm based metric to describe the data, it lacks robustness to outliers and noise. In addition, the optimization process of OSSPP is relatively complicated and time-consuming.
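
Among the adaptive constructions above, the l2-graph (LSR) [39], [40] is the simplest to state because its coding step has a closed-form solution. The sketch below is a minimal illustration under our own assumptions; the ridge parameter lam and the post hoc zeroing of the diagonal are simplifications of the original formulation, which enforces the zero diagonal inside the optimization:

```python
import numpy as np

def lsr_graph(X, lam=0.1):
    """l2-graph / least-squares regression (LSR) affinities.

    Solves min_Z ||X - X Z||_F^2 + lam * ||Z||_F^2, whose closed form is
    Z = (X^T X + lam I)^{-1} X^T X.  X : (D, n); lam : ridge parameter.
    Zeroing the diagonal afterwards is a simplification of the original
    constraint diag(Z) = 0.
    """
    n = X.shape[1]
    G = X.T @ X
    Z = np.linalg.solve(G + lam * np.eye(n), G)   # closed-form coding step
    np.fill_diagonal(Z, 0.0)                      # remove trivial self-representation
    A = np.abs(Z)
    return (A + A.T) / 2.0                        # symmetric affinity matrix
```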

To overcome the aforementioned issues, this paper presents a joint graph optimization and projection learning (JGOPL) algorithm for dimensionality reduction. Different from existing graph-based dimensionality reduction approaches, JGOPL incorporates graph construction into its objective function, so the projection matrix and the adaptive graph can be optimized simultaneously. Meanwhile, to improve the robustness of the model to data variations and outliers, we employ the l21-norm based distance measurement in the loss function. Furthermore, by exploiting the similarity of the input data through locality constraints, the local structure information of the data is well preserved. Extensive experiments are carried out on seven publicly available databases. The experimental results show that the proposed approach outperforms the existing related approaches, which indicates its effectiveness.
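
The full JGOPL formulation is given in Section 3 (not excerpted here). Purely as a reading aid, an objective with the structure just described, an l21-based loss, a locality term weighting each edge by the input-space distance, and a simplex constraint on the graph weights, could be sketched as follows; this is our schematic illustration, not the paper's actual equation, and $\alpha$, $\beta$ and $d_{ij}$ are our own notation:

$$\min_{P,W}\ \left\|\left(P^{T}X-P^{T}XW\right)^{T}\right\|_{21}+\alpha\sum_{i,j=1}^{n}d_{ij}\,w_{ij}+\beta\|W\|_{F}^{2}\qquad\text{s.t.}\ \sum_{j=1}^{n}w_{ij}=1,\ w_{ij}\geq 0$$

Here $d_{ij}=\|x_i-x_j\|_2^2$ makes connecting distant samples expensive, which is how a locality constraint discourages such links, and the $l_{21}$-norm (defined in the norms section below) sums per-sample residual norms rather than squaring them.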

Fig. 1 depicts the flowchart of the proposed JGOPL. The remainder of this paper is organized as follows: Section 2 briefly reviews two related works. Section 3 describes the proposed JGOPL approach. Section 4 presents detailed experimental results that evaluate the performance of the proposed approach. Finally, conclusions are drawn in Section 5.

Section snippets

Graph-optimized locality preserving projections

Let $X=[x_1,x_2,\ldots,x_n]\in\mathbb{R}^{D\times n}$ be a data matrix which contains $n$ input samples in a $D$-dimensional space. Different from most graph-based algorithms [42], which require a predefined graph before dimensionality reduction, the graph-optimized locality preserving projections (GoLPP) algorithm [43] performs graph optimization during a specific dimensionality reduction process, i.e., LPP. Its objective function is

$$\min_{P,W}\ \frac{\sum_{i,j=1}^{n}\left\|P^{T}x_{i}-P^{T}x_{j}\right\|_{2}^{2}\,w_{ij}}{\sum_{i=1}^{n}\left\|P^{T}x_{i}\right\|_{2}^{2}}+\eta\sum_{i,j=1}^{n}w_{ij}\ln w_{ij}\qquad\text{s.t.}\ \sum_{j=1}^{n}w_{ij}=1,\ w_{ij}\geq 0$$

where $P\in\mathbb{R}^{D\times d}$ ($d<D$) is the projection matrix, $W=[w_{ij}]$ is the graph weight matrix to be optimized and $\eta>0$ balances the LPP criterion against the entropy regularizer.
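
For fixed P, the entropy regularizer in the objective above gives each row of W a closed-form, softmax-like update. The NumPy sketch below illustrates this subproblem; the constant scatter normalization (the denominator of the first term) is absorbed into η here, so this is an illustrative simplification rather than the exact GoLPP update rule:

```python
import numpy as np

def golpp_weight_update(P, X, eta):
    """Closed-form W-step of the entropy-regularized subproblem (sketch).

    With P fixed, each row i of
        min_W  sum_j d_ij * w_ij + eta * sum_j w_ij * ln(w_ij),
        s.t.   sum_j w_ij = 1,  w_ij >= 0,
    is solved by the softmax  w_ij = exp(-d_ij/eta) / sum_k exp(-d_ik/eta),
    where d_ij = ||P^T x_i - P^T x_j||_2^2.
    """
    Y = P.T @ X                                # low-dimensional features, (d, n)
    sq = np.sum(Y ** 2, axis=0)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (Y.T @ Y)
    # Numerically stable row-wise softmax of -D2 / eta.
    E = np.exp(-(D2 - D2.min(axis=1, keepdims=True)) / eta)
    return E / E.sum(axis=1, keepdims=True)
```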

Definitions of different norms

Given a matrix $Z=[z_{ij}]\in\mathbb{R}^{n_1\times n_2}$, where $n_1$ and $n_2$ represent the numbers of rows and columns of $Z$ and $Z_i$ denotes the $i$-th row of $Z$, the Frobenius norm of $Z$ is defined as

$$\|Z\|_F=\sqrt{\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}z_{ij}^{2}}=\sqrt{\sum_{i=1}^{n_1}\|Z_i\|_2^2}\tag{3}$$

where $\|\cdot\|_2^2$ denotes the squared $l_2$-norm.

As seen from Eq. (3), the sensitivity of the Frobenius norm comes from the squaring operation, since rows $Z_i$ with larger $\|Z_i\|_2^2$ dominate the final result. Different from Eq. (3), the $l_1$-norm and $l_{21}$-norm are defined in Eq. (4) and Eq. (5), respectively [71]:

$$\|Z\|_1=\sum_{i=1}^{n_1}\sum_{j=1}^{n_2}\left|z_{ij}\right|\tag{4}$$

$$\|Z\|_{21}=\sum_{i=1}^{n_1}\sqrt{\sum_{j=1}^{n_2}z_{ij}^{2}}=\sum_{i=1}^{n_1}\|Z_i\|_2\tag{5}$$
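
A short NumPy sketch (ours, for illustration) implements Eqs. (3)–(5) and shows why an l21-based loss is far less dominated by a single corrupted sample (row) than a squared-Frobenius loss:

```python
import numpy as np

def frob_norm(Z):
    return np.sqrt(np.sum(Z ** 2))                   # Eq. (3)

def l1_norm(Z):
    return np.sum(np.abs(Z))                         # Eq. (4)

def l21_norm(Z):
    return np.sum(np.sqrt(np.sum(Z ** 2, axis=1)))   # Eq. (5): sum of row l2-norms

Z = np.ones((5, 3))
Z_out = Z.copy()
Z_out[0] *= 100.0                                    # corrupt one row (one "sample")
# One corrupted row blows up the squared-Frobenius loss (~2000x) but
# inflates the l21-norm only linearly in the row's magnitude (~20.8x).
print((frob_norm(Z_out) / frob_norm(Z)) ** 2)        # ~2000.8
print(l21_norm(Z_out) / l21_norm(Z))                 # ~20.8
```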

Experiments

In this section, a series of experiments are carried out on seven publicly available databases to verify the effectiveness of the proposed method for image classification and clustering tasks. Some state-of-the-art approaches (LPP [33], NPE [32], SGLPP [35], LSR-NPE [39], LRR-NPE [38], SPP [37], GoLPP [43], DRAG [44], GODRSC [45], LRE [47] and OSSPP [51]) are employed for comparison. Among these algorithms, LPP and NPE are two classical graph-based dimensionality reduction algorithms which

Conclusion and future work

In this paper, we propose a robust linear dimensionality reduction approach named joint graph optimization and projection learning (JGOPL). JGOPL integrates projection matrix learning and graph optimization into one unified objective function. Therefore, the graph in the proposed approach can be adaptively updated rather than predefined during the dimensionality reduction procedure. Meanwhile, the l21-norm based distance measurement makes our approach robust to outliers or the

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants 61602221, 61672150, 61702092, 61806126, 61562044 and 41661083, the Natural Science Foundation of Jiangxi Province under Grant 20171BAB212009, the Science and Technology Research Project of Jiangxi Provincial Department of Education under Grants GJJ160333, GJJ170234 and GJJ160315, the Fund of Jilin Province Science and Technology Development Project under Grants 20170204018GX, 20180201089GX and 20190201305JC,


References (71)

  • T. Liu et al.

    An adaptive graph learning method based on dual data representations for clustering

    Pattern Recognit.

    (2018)
  • L. Qiao et al.

    Data-driven Graph Construction and Graph Learning: a Review

    Neurocomputing

    (2018)
  • L. Zhang et al.

    Graph-optimized locality preserving projections

    Pattern Recognit.

    (2010)
  • L. Zhang et al.

    Graph optimization for dimensionality reduction with sparsity constraints

    Pattern Recognit.

    (2012)
  • Z. Lai et al.

    Robust jointly sparse embedding for dimensionality reduction

    Neurocomputing

    (2018)
  • X. Fang et al.

    Orthogonal self-guided similarity preserving projection for classification and clustering

    Neural Netw.

    (2017)
  • W. Liu et al.

    KCRC-LCD: discriminative kernel collaborative representation with locality constrained dictionary for visual categorization

    Pattern Recognit.

    (2015)
  • P.O. Hoyer

    Modeling receptive fields with non-negative sparse coding

    Neurocomputing

    (2003)
  • W.H. Hsaio et al.

    Locality-constrained max-margin sparse coding

    Pattern Recognit.

    (2017)
  • Q. Wang et al.

    Locality constraint distance metric learning for traffic congestion detection

    Pattern Recognit.

    (2018)
  • D. Tolić et al.

    A nonlinear orthogonal non-negative matrix factorization approach to subspace clustering

    Pattern Recognit.

    (2018)
  • Y. Yi et al.

    Ordinal preserving matrix factorization for unsupervised feature selection

    Signal Process.

    (2018)
  • Z. Lai et al.

    Robust discriminant regression for feature extraction

    IEEE Trans. Cybern.

    (2018)
  • W. Wang et al.

    Flexible manifold learning with optimal graph for image and video representation

    IEEE Trans. Image Process.

    (2018)
  • Z. Jin et al.

    EEG classification using sparse Bayesian extreme learning machine for brain–computer interface

    Neural Comput. Appl.

    (2017)
  • H. Wang et al.

    Discriminative feature extraction via multivariate linear regression for SSVEP-based BCI

    IEEE Trans. Neural Syst. Rehabil. Eng.

    (2016)
  • G. Zhou et al.

    Linked component analysis from matrices to high-order tensors: applications to biomedical data

    Proc. IEEE

    (2016)
  • S. Yan et al.

    Graph embedding and extensions: a general framework for dimensionality reduction

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2007)
  • Y. Wang et al.

    A perception-driven approach to supervised dimensionality reduction for visualization

    IEEE Trans. Visual. Comput. Graph.

    (2018)
  • H. Cai et al.

    A comprehensive survey of graph embedding: problems, techniques and applications

    IEEE Trans. Knowl. Data Eng.

    (2018)
  • I. Jolliffe

    Principal component analysis

    International Encyclopedia of Statistical Science

    (2011)
  • R.O. Duda et al.

    Pattern Classification

    (2012)
  • D.R. Hardoon et al.

    Canonical correlation analysis: an overview with application to learning methods

    Neural Comput.

    (2004)
  • C.M. Bishop

    Bayesian PCA

  • M. Sugiyama

    Dimensionality reduction of multimodal labeled data by local fisher discriminant analysis

    J. Mach. Learn. Res.

    (2007)

    Yugen Yi was born in Pingxiang, China. He received the B.S. degree from the College of Humanities and Sciences, Northeast Normal University, China, in 2009, the M.S. degree from the College of Computer Science and Information Technology, Northeast Normal University, in 2012, and the Ph.D. degree from the School of Mathematics and Statistics, Northeast Normal University, in 2015. He is currently a Lecturer with the School of Software, Jiangxi Normal University. His research interests include artificial intelligence, computer vision, and machine learning.

    Jianzhong Wang was born in Changchun, China. He received the B.S. degree from the Computer School, Jilin University, China, in 2004, the M.S. degree from the Computer School, Northeast Normal University, China, in 2007, and the Ph.D. degree from the School of Mathematics and Statistics, Northeast Normal University, in 2010. He is currently an Associate Professor with the College of Information Science and Technology, Northeast Normal University. His research interests focus on dimensionality reduction and image processing.

    Wei Zhou received the M.S. degree from the College of Computer Science and Information Technology, Northeast Normal University, in 2015, and the Ph.D. degree from Northeastern University, Shenyang, China, in 2018. She is currently a Lecturer with the College of Computer Science, Shenyang Aerospace University. Her research interests include medical imaging processing, dimensionality reduction and feature selection.

    Yuming Fang received his Ph.D. degree from Nanyang Technological University in Singapore, M.S. degree from Beijing University of Technology in Beijing, China, and B.E. degree from Sichuan University in Chengdu, China. Currently, he is a Professor in the School of Information Technology, Jiangxi University of Finance and Economics, Nanchang, China. He serves as an Associate Editor of IEEE Access and is on the editorial board of Signal Processing: Image Communication. His research interests include visual attention modeling, visual quality assessment, image retargeting, computer vision, 3D image/video processing, etc.

    Jun Kong was born in Jilin, China. He received the B.S. and M.S. degrees from the Department of Mathematics, Northeast Normal University, China, in 1992 and 1997, respectively, and the Ph.D. degree from the College of Mathematics, Jilin University, in 2001. He is currently a Professor with the College of Information Science and Technology, Northeast Normal University. His research interests include artificial intelligence, digital image processing, pattern recognition, machine learning, biometrics, and information security.

    Yinghua Lu was born in Jilin, China. He received his B.S degree from Computer Department of Jilin Industrial University in 1984. In 1990, he received his M.S degree from Utsunomiya University in Japan. He received the Ph.D. degree from Computer School of Jilin University. Now, he is a professor in College of Humanities and Sciences, Northeast Normal University. His research interests include digital image processing and pattern recognition.
