Pattern Recognition

Volume 98, February 2020, 107052

Discriminant component analysis via distance correlation maximization

https://doi.org/10.1016/j.patcog.2019.107052

Highlights

  • We propose a dimensionality reduction technique based on distance correlation.

  • Our method maximizes the dependency between data samples and target variable.

  • Kernel version of our method is also derived for non-linear problems.

  • Our approach has a simple and closed-form solution.

  • Our approach is computationally efficient.

Abstract

In this study, a novel supervised dimensionality reduction technique is proposed. The dCor-based Dimensionality Reduction (dDR) technique is based on distance correlation, a powerful correlation measure that is applicable to random variables of arbitrary dimensions. By projecting the samples to a lower dimensional space, dDR maximizes the correlation between explanatory and response variables. The proposed dDR algorithm can be easily implemented and is computationally efficient. Moreover, it has a simple closed-form solution, which makes it effective in many different applications. In order to apply the proposed technique to non-linear problems, the kernel version of dDR is also derived. Extensive analyses and empirical experiments across various visualization, classification, and regression tasks indicate that our algorithm is the method of choice, as it offers statistically superior results in comparison with other state-of-the-art approaches in the literature.

Introduction

With the rapid pace of developments in technology and science and the huge amounts of data now available, the need for more robust and efficient learning algorithms has never been greater. Among these data, high dimensional data sets are a prevalent and inevitable issue.

In high dimensional feature spaces, conventional learning algorithms do not produce satisfactory results because of the curse of dimensionality: to maintain a given sample density, the number of required training instances and the complexity of the target function grow exponentially with the data dimension. The situation becomes even worse when the dimension of the data significantly exceeds the number of data points, or when the required training data are expensive or difficult to collect. For example, DNA micro-array data consist of thousands of gene expression features while the number of examples is relatively small.

Reducing the data dimensionality has become very popular over the years, and many effective methods have been proposed in the literature [15], [20], [38], [48]. Among these techniques, linear dimensionality reduction methods, which are very common in the literature, learn a low-dimensional subspace onto which the high dimensional data are projected.

Principal Component Analysis (PCA) [23], an unsupervised linear dimensionality reduction approach, is a useful statistical technique which has been used in many applications [53]. PCA models the data X ∈ ℝ^d as approximately lying in some low dimensional subspace. By modeling this subspace, PCA preserves as much variability as possible in the original data.
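As a point of reference, the minimal sketch below (ours, not from the paper) illustrates this idea with scikit-learn's PCA on synthetic data, projecting the samples onto the two directions of maximum variance:

```python
# Minimal PCA illustration (ours, not from the paper): project synthetic
# 50-dimensional samples onto the two directions of maximum variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))            # 200 samples, 50 features

pca = PCA(n_components=2)
Z = pca.fit_transform(X)                  # 200 x 2 low-dimensional representation
print(Z.shape, pca.explained_variance_ratio_)
```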

On the other hand, supervised learning techniques [5], [13], [36] try to predict the l-dimensional response variable Y ∈ ℝ^l from a given explanatory random variable X in order to improve the prediction accuracy of classification or regression tasks.

Reducing the dimensionality of the input data before training can lead to significant improvements in learning algorithms; dimensionality reduction methods are effective tools for overcoming the curse of dimensionality. The performance of supervised learning tasks can be further enhanced by taking the discriminative information in the response variable into account while projecting the data into a lower dimensional space. This can be achieved by projecting the data in the direction along which the dependency between explanatory and response variables is maximized. It is also desirable in dimensionality reduction to preserve the structural information of the data and to maximally associate the embedded data with the available side information (e.g., labels). However, most of these algorithms suffer from very high costs in time and memory, or they require solving a very complicated optimization problem [14], [41], [44].

In this paper, we propose a supervised dimensionality reduction technique called dCor-based Dimensionality Reduction (dDR). This technique finds a direction onto which to project the data such that the projection is highly related to the response variable Y. In other words, our technique projects the samples into a lower dimensional space while maximizing the dependency between the explanatory random variable X and the target variable Y.

The main contributions of this paper are as follows:

  1. Our proposed technique can be solved efficiently in closed form, and it does not suffer from high computational complexity or a complicated optimization problem. dDR not only improves the learning ability of learning algorithms, but also avoids the high computational overhead of state-of-the-art approaches. Our proposed method is based on distance correlation, which is more general and more powerful than the Pearson correlation coefficient.

  2. The characteristics of dDR allow us to derive a kernel version of the algorithm (KdDR), making it applicable to non-linear learning problems as well. Both dDR and KdDR reduce to a simple optimization problem that can be solved by eigenvalue decomposition.

  3. To demonstrate the effectiveness of our proposed technique, well-known and state-of-the-art dimensionality reduction methods are implemented and compared with our algorithm. Comprehensive analyses and experiments, including time complexity analyses, are conducted on a wide variety of synthetic, UCI [11], and high dimensional biological data sets for classification and regression problems. Our results indicate the effectiveness and efficiency of our approach in processing high dimensional and complex non-linear data structures.

The remainder of the paper is structured as follows: Section 2 reviews prevalent research progress on supervised dimensionality reduction in the literature. Section 3 describes the distance correlation measure (dCor) on which our method relies. Section 4 explains our method in more detail. Experimental settings and analyses of the results are discussed in Section 5. Section 6 concludes the paper and outlines our future work.

Section snippets

Related work

Several proposed approaches in the area of supervised dimensionality reduction are presented in this section.

Fisher Discriminant Analysis (FDA) [10] is an old yet popular approach in the literature. It maximizes the between-class scatter while minimizing the within-class scatter, and projects the data to a (c − 1)-dimensional space, where c is the number of classes. However, FDA has difficulties when the classes overlap. Moreover, it fails when the means of the classes are equal.
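For concreteness, a minimal sketch (ours, not from the paper) using scikit-learn's LinearDiscriminantAnalysis shows such a projection for a 3-class data set, where at most c − 1 = 2 discriminant directions exist:

```python
# Minimal Fisher discriminant analysis illustration (ours, not from the paper):
# with c = 3 classes, the data can be projected onto at most c - 1 = 2 directions.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                   # 150 samples, 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)                         # 150 x 2 discriminant projection
print(Z.shape)
```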

To tackle FDA’s

dCor: distance correlation dependency measure

For the sake of clarity, this section explains the distance correlation measure in detail and shows that our optimization problem is derived from this measure.

Distance correlation, or dCor (R) for short, proposed by Székely et al. [45], is a method for testing multivariate independence between two random variables of arbitrary dimensions. It can be defined for all distributions with finite first moments. The dCor of two normal univariate random
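Although this snippet is truncated, the sample distance correlation itself is computed from double-centered pairwise distance matrices. The sketch below (ours, not the paper's code) follows the V-statistic form of Székely et al. and also illustrates, on hypothetical data, that dCor detects a non-linear, non-monotone dependency for which the Pearson correlation is near zero.

```python
# Sample distance correlation from double-centered distance matrices
# (V-statistic form of Szekely et al.; our sketch, not the paper's code).
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dcor(x, y):
    """Distance correlation between samples x of shape (n,) or (n, p) and y of shape (n,) or (n, q)."""
    A = squareform(pdist(np.reshape(x, (len(x), -1))))   # pairwise distances within x
    B = squareform(pdist(np.reshape(y, (len(y), -1))))   # pairwise distances within y
    # Double-center each distance matrix: subtract row and column means, add grand mean.
    A = A - A.mean(axis=0) - A.mean(axis=1, keepdims=True) + A.mean()
    B = B - B.mean(axis=0) - B.mean(axis=1, keepdims=True) + B.mean()
    dcov2 = (A * B).mean()                                # squared distance covariance
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()       # squared distance variances
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 1000)
y = x ** 2                                  # non-linear, non-monotone dependence
print(np.corrcoef(x, y)[0, 1])              # Pearson correlation: close to 0
print(dcor(x, y))                           # distance correlation: clearly positive
```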

Supervised dimensionality reduction

This section provides a detailed explanation of our proposed algorithms, which are based on distance correlation. Suppose we have N p-dimensional data samples {x_i, i = 1, …, N} stored in a p × N matrix X. Also, assume that Y is the l × N matrix of response variables. We are interested in finding the subspace U^T X in such a way that the dependency between the target variable Y and the projected data U^T X is maximized. U^T X is the representation of the data in the lower dimensional space (the projected data).

We
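The preview cuts off before the derivation. As orientation only, the sketch below (ours, not the authors' dDR solution) shows the general recipe on hypothetical data: a linear projection U is chosen to maximize a dependency criterion between U^T X and Y, here a simpler HSIC-style criterion with linear kernels rather than distance correlation, which reduces to the eigenvalue decomposition of a p × p matrix.

```python
# Hedged illustration (ours): a dependency-maximizing linear projection via an
# eigenvalue problem, using an HSIC-style criterion with linear kernels as a
# stand-in for the distance-correlation objective of dDR.
import numpy as np

def dependency_projection(X, Y, k):
    """X: p x N data matrix, Y: l x N responses, k: target dimension.
    Maximizes tr(U^T Xc Yc^T Yc Xc^T U) over orthonormal U (linear-kernel HSIC)."""
    Xc = X - X.mean(axis=1, keepdims=True)          # center the data columns
    Yc = Y - Y.mean(axis=1, keepdims=True)          # center the responses
    C = Xc @ Yc.T                                   # p x l cross-covariance-like matrix
    M = C @ C.T                                     # p x p symmetric criterion matrix
    eigval, eigvec = np.linalg.eigh(M)
    U = eigvec[:, np.argsort(eigval)[::-1][:k]]     # top-k eigenvectors
    return U, U.T @ X                               # projection matrix and projected data

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 300))                      # p = 20 features, N = 300 samples
Y = X[:2] + 0.1 * rng.normal(size=(2, 300))         # responses driven by the first 2 features
U, Z = dependency_projection(X, Y, k=2)
print(U.shape, Z.shape)                             # (20, 2) (2, 300)
```

dDR, as described in the abstract and contributions, replaces this simpler surrogate with a distance-correlation-based criterion while retaining a closed-form, eigendecomposition-based solution.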

Experimental settings

In this section, the performances of the compared techniques are assessed on a number of visualization, classification, and regression tasks. A diverse collection of 32 synthetic, UCI, and biological data sets is considered for the visualization and classification parts. Detailed information on these data sets is summarized in Table 1. The smallest UCI data set is Fertility with 100 instances and the largest one is Abalone with 4139 samples. Among biological data sets, the highest dimensional

Conclusion and future work

This paper proposes a new supervised linear dimensionality reduction technique based on distance correlation, a powerful correlation measure with high statistical power. dDR projects the data samples in the direction along which the dependency between explanatory and target variables is maximized. dDR can be solved in closed form and is computationally very efficient. Moreover, the kernelized version of dDR (KdDR) is derived in order to extend our method to

Acknowledgments

The authors would like to express their deepest gratitude to Mr. Farhad Abdi for his constructive advice and assistance in editing this paper.


References (54)

  • M. Clark, A Comparison of Correlation Measures, Center for Social Research (2013).
  • J. Demmel et al., Fast linear algebra is stable, Numer. Math. (2007).
  • J. Demšar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res. (2006).
  • O.J. Dunn, Multiple comparisons among means, J. Am. Stat. Assoc. (1961).
  • R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugen. (1936).
  • A. Frank, A. Asuncion, UCI machine learning repository (2007).
  • M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, J. Am. Stat. Assoc. (1937).
  • K. Fukumizu et al., Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, J. Mach. Learn. Res. (2004).
  • K. Fukumizu et al., Gradient-based kernel dimension reduction for regression, J. Am. Stat. Assoc. (2014).
  • T. Jordan et al., End-to-end training of deep probabilistic CCA for joint modeling of paired biomedical observations, Third Workshop on Bayesian Deep Learning (NeurIPS 2018), Montréal, Canada (2018).
  • A. Gretton et al., Measuring statistical dependence with Hilbert-Schmidt norms, International Conference on Algorithmic Learning Theory (2005).
  • N. Halko et al., Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev. (2011).
  • M. Harandi et al., Dimensionality reduction on SPD manifolds: the emergence of geometry-aware methods, IEEE Trans. Pattern Anal. Mach. Intell. (2018).
  • H. Hotelling, Relations between two sets of variates, Biometrika (1936).
  • R.L. Iman et al., Approximations of the critical region of the Friedman statistic, Commun. Stat.-Theory Methods (1980).
  • I. Jolliffe, Principal Component Analysis (1986).
  • J. Josse et al., Measuring multivariate association and beyond, Statist. Surv. (2016).

Lida Abdi received her B.Sc. degree in Computer Engineering from Shiraz Payamnoor University, Iran, in 2010, and her M.Sc. degree in Artificial Intelligence from Shiraz University, Iran, in 2013. Her research interests include machine learning, dimensionality reduction, transfer learning, and kernel learning.

Ali Ghodsi received his B.S. degree in Computer Engineering from Shiraz University, Shiraz, Iran, in 1992 and his Ph.D. degree in Computer Science from the University of Waterloo, Waterloo, Canada, in 2005. Currently, he is a professor at the University of Waterloo. His general research interests are in the areas of machine learning, dimensionality reduction, and visualization.
