Discriminant component analysis via distance correlation maximization
Introduction
With the rapid pace of developments in technology and science, and the vast amounts of data now available, the need for robust and efficient learning algorithms has never been greater. Among these data, high-dimensional data sets are prevalent and unavoidable.
In high-dimensional feature spaces, conventional learning algorithms do not produce satisfactory results because of the curse of dimensionality: in many approaches, maintaining a given sample density requires a number of training instances, and a target-function complexity, that grow exponentially with the data dimension. The situation becomes even worse when the dimension significantly exceeds the number of data points, or when the required training data are expensive or difficult to collect. For example, DNA microarray data consist of thousands of gene-expression features while the number of examples is relatively small.
Reducing data dimensionality has become very popular over the years, and many effective methods have been proposed in the literature [15], [20], [38], [48]. Among these techniques, linear dimension reduction methods, which are prevalent in the literature, learn a low-dimensional subspace onto which the high-dimensional data are projected.
Principal Component Analysis (PCA) [23], an unsupervised linear dimensionality reduction approach, is a useful statistical technique that has been used in many applications [53]. PCA models the data as approximately lying in a low-dimensional subspace; by fitting this subspace, it preserves as much of the variability in the original data as possible.
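The subspace PCA fits can be written down directly: it is spanned by the top eigenvectors of the sample covariance matrix. The following is a minimal illustrative NumPy sketch of this classical construction (not tied to any particular implementation discussed in the paper):

```python
import numpy as np

def pca(X, k):
    """Project an N x p data matrix X onto its top-k principal components:
    center the data, eigendecompose the sample covariance, and keep the
    directions of largest variance."""
    Xc = X - X.mean(axis=0)            # center each feature
    cov = Xc.T @ Xc / (len(X) - 1)     # p x p sample covariance
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k]           # top-k eigenvectors
    return Xc @ W                      # N x k projected data

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Z = pca(X, 2)
print(Z.shape)  # (200, 2)
```

Because the projection directions are eigenvectors of the covariance, the projected components are mutually uncorrelated.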
On the other hand, supervised learning techniques [5], [13], [36] predict an l-dimensional response variable Y from a given explanatory random variable X, with the aim of improving the accuracy of classification or regression tasks.
Significant improvements in learning algorithms often result directly from reducing the dimensionality of the input data before training; dimensionality reduction methods are thus effective tools for overcoming the curse of dimensionality. The performance of supervised learning tasks can be further enhanced by exploiting the discriminative information in the response variable while projecting the data into a lower-dimensional space. This can be achieved by projecting the data along the direction that maximizes the dependency between the explanatory and response variables. It is also desirable for dimensionality reduction to preserve the structural information of the data and to maximally associate the embedded data with the available side information (e.g., labels). However, most such algorithms suffer from very high time and memory costs, or require solving a complicated optimization problem [14], [41], [44].
In this paper, we propose a supervised dimensionality reduction technique called dCor-based Dimensionality Reduction (dDR). The technique finds a direction along which to project the data so that the projection is highly related to the response variable Y. In other words, it projects the samples into a lower-dimensional space while maximizing the dependency between the explanatory random variable X and the target variable Y.
The main contributions of this paper are as follows:
- 1.
Our proposed technique has an efficient closed-form solution and does not suffer from high computational complexity or a complicated optimization problem. dDR not only improves the performance of learning algorithms, but also avoids the heavy computational load of state-of-the-art approaches. Our method is based on distance correlation, which is more general, and more powerful, than the Pearson correlation coefficient.
- 2.
The characteristics of dDR allow us to derive a kernel version of the algorithm (KdDR), making the method applicable to non-linear learning problems as well. Both dDR and KdDR reduce to a simple optimization problem that can be solved by eigenvalue decomposition.
- 3.
To demonstrate the effectiveness of the proposed technique, well-known and state-of-the-art dimensionality reduction methods are implemented and compared with our algorithm. Comprehensive analyses and experiments, including time-complexity analyses, are conducted across a wide variety of synthetic, UCI [11], and high-dimensional biological data sets, on both classification and regression problems. The results indicate the effectiveness and efficiency of our approach in processing high-dimensional and complex non-linear data structures.
The remainder of the paper is structured as follows: Section 2 reviews research progress in supervised dimensionality reduction. Section 3 describes the distance correlation measure (dCor) on which our method relies. Section 4 explains our method in detail. Experimental settings and analyses of the results are discussed in Section 5. Section 6 concludes the paper and outlines future work.
Section snippets
Related work
This section reviews several approaches proposed in the area of supervised dimensionality reduction.
Fisher Discriminant Analysis (FDA) [10] is an old yet popular approach in the literature. It maximizes the between-class scatter while minimizing the within-class scatter, and projects the data into a (c − 1)-dimensional space, where c is the number of classes. However, FDA has difficulties when the classes overlap, and it fails when the class means are equal.
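FDA's two scatter criteria can be combined into a single generalized eigenproblem, S_b w = λ S_w w. The following NumPy sketch illustrates the classical construction (the small ridge added to S_w is our own safeguard against a singular within-class scatter, not part of the original formulation):

```python
import numpy as np

def fda(X, y, k):
    """Fisher Discriminant Analysis sketch: X is N x p data, y holds
    integer class labels; returns a p x k projection (k <= c - 1)."""
    mu = X.mean(axis=0)
    p = X.shape[1]
    Sw = np.zeros((p, p))  # within-class scatter
    Sb = np.zeros((p, p))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mu)[:, None]
        Sb += len(Xc) * d @ d.T
    # generalized eigenproblem Sb w = lambda Sw w, via Sw^{-1} Sb
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw + 1e-8 * np.eye(p), Sb))
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:k]]

# two well-separated Gaussian classes in 3 dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
W = fda(X, y, 1)
Z = X @ W  # 1-D projection separating the two classes
```

Note the failure mode mentioned above is visible here: if the class means coincide, S_b vanishes and the eigenproblem carries no discriminative information.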
To tackle FDA’s
dCor: distance correlation dependency measure
For the sake of clarity, this section explains the distance correlation measure in detail and shows that our optimization problem is derived from this measure.
Distance correlation (dCor for short), proposed by Székely et al. [45], is a method for testing multivariate independence between two random variables of arbitrary dimensions. It is defined for all distributions with finite first moments. The dCor of two normal univariate random
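As a concrete illustration of the measure, the empirical distance correlation can be computed directly from double-centered pairwise distance matrices. The sketch below follows the standard definition of Székely et al.; the function name and example data are our own:

```python
import numpy as np

def dcor(X, Y):
    """Empirical distance correlation between samples X (N x p) and
    Y (N x q), computed from double-centered distance matrices."""
    def centered_dist(A):
        D = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
        return D - D.mean(0) - D.mean(1)[:, None] + D.mean()
    A, B = centered_dist(X), centered_dist(Y)
    dcov2 = (A * B).mean()          # squared sample distance covariance
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

rng = np.random.default_rng(2)
x = rng.normal(size=(300, 1))
print(round(dcor(x, x), 2))  # 1.0: a variable is fully dependent on itself
```

Unlike the Pearson coefficient, dCor(x, x**2) is large even though the Pearson correlation of x and x**2 is near zero, which is what makes the measure attractive for detecting non-linear dependence.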
Supervised dimensionality reduction
This section provides a detailed explanation of our proposed algorithms, which are based on distance correlation. Suppose we have N p-dimensional data samples stored in a p × N matrix X, and let Y be the l × N matrix of response variables. We are interested in finding a subspace UᵀX such that the dependency between the target variable Y and the projected data UᵀX is maximized; UᵀX is the representation of the data in the lower-dimensional (projected) space.
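The exact dCor-based objective is developed in the remainder of this section. As a hedged illustration of a closed-form, eigendecomposition-style solution of this general shape, the sketch below solves a related trace criterion, max tr(UᵀXHLHXᵀU) subject to UᵀU = I, as used by supervised-PCA-style methods; the function name and the linear response kernel L = YᵀY are our assumptions, not the authors' formulation:

```python
import numpy as np

def supervised_projection(X, Y, k):
    """Hedged sketch (NOT the paper's exact dCor objective): maximize
    tr(U^T X H L H X^T U) over orthonormal U, where H is the centering
    matrix and L = Y^T Y a linear kernel on the responses; solved by the
    top-k eigenvectors of a symmetric p x p matrix."""
    N = X.shape[1]                       # X: p x N, Y: l x N (column-sample convention)
    H = np.eye(N) - np.ones((N, N)) / N  # centering matrix
    L = Y.T @ Y                          # N x N response kernel
    M = X @ H @ L @ H @ X.T              # symmetric PSD, p x p
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, ::-1][:, :k]          # top-k eigenvectors; project via U.T @ X

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 100))                    # p = 5 features, N = 100 samples
Y = X[:2] + 0.1 * rng.normal(size=(2, 100))      # responses tied to the first two features
U = supervised_projection(X, Y, 2)
print(U.shape)  # (5, 2)
```

Because the solution consists of eigenvectors of a symmetric matrix, U is orthonormal by construction, matching the closed-form, eigenvalue-decomposition character claimed for dDR.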
We
Experimental settings
In this section, the performance of the compared techniques is assessed on a number of visualization, classification, and regression tasks. A diverse collection of 32 synthetic, UCI, and biological data sets is considered for the visualization and classification parts; detailed information on these data sets is summarized in Table 1. The smallest UCI data set is Fertility with 100 instances and the largest is Abalone with 4139 samples. Among the biological data sets, the highest dimensional
Conclusion and future work
This paper proposes a new supervised linear dimensionality reduction technique based on distance correlation, a dependency measure with high statistical power. dDR projects the data samples along the direction that maximizes the dependency between the explanatory and target variables. dDR has a closed-form solution and is computationally very efficient. Moreover, a kernelized version (KdDR) is derived in order to extend our method to
Acknowledgments
The authors would like to express their deepest gratitude to Mr. Farhad Abdi for his constructive advice and assistance in editing this paper.
References (54)
- Supervised principal component analysis: visualization, classification and regression on subspaces and submanifolds, Pattern Recognit. (2011)
- A nonlinear dimensionality reduction framework using smooth geodesics, Pattern Recognit. (2019)
- Properties of Euclidean and non-Euclidean distance matrices, Linear Algebra Appl. (1985)
- Nonlinear supervised dimensionality reduction via smooth regular embeddings, Pattern Recognit. (2019)
- Face sketch aging via aging oriented principal component analysis, Pattern Recognit. Lett. (2018)
- Regularized discriminant entropy analysis, Pattern Recognit. (2014)
- Local Representation Theory: Modular Representations as an Introduction to the Local Representation Theory of Finite Groups (1993)
- A robust and efficient parallel SVD solver based on restarted Lanczos bidiagonalization, Electron. Trans. Numer. Anal. (2008)
- Deep canonical correlation analysis, International Conference on Machine Learning (2013)
- Prediction by supervised principal components, J. Am. Stat. Assoc. (2012)
- A Comparison of Correlation Measures, Center for Social Research
- Fast linear algebra is stable, Numer. Math.
- Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res.
- Multiple comparisons among means, J. Am. Stat. Assoc.
- The use of multiple measurements in taxonomic problems, Ann. Eugen.
- The use of ranks to avoid the assumption of normality implicit in the analysis of variance, J. Am. Stat. Assoc.
- Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, J. Mach. Learn. Res.
- Gradient-based kernel dimension reduction for regression, J. Am. Stat. Assoc.
- End-to-end training of deep probabilistic CCA for joint modeling of paired biomedical observations, Third Workshop on Bayesian Deep Learning (NeurIPS 2018), Montréal, Canada
- Measuring statistical dependence with Hilbert-Schmidt norms, International Conference on Algorithmic Learning Theory
- Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIAM Rev.
- Dimensionality reduction on SPD manifolds: the emergence of geometry-aware methods, IEEE Trans. Pattern Anal. Mach. Intell.
- Relations between two sets of variates, Biometrika
- Approximations of the critical region of the Friedman statistic, Commun. Stat. Theory Methods
- Principal Component Analysis
- Measuring multivariate association and beyond, Statist. Surv.
Cited by (8)
- L1-norm discriminant analysis via Bhattacharyya error bounds under Laplace distributions, Pattern Recognition (2023)
- Memetic micro-genetic algorithms for cancer data classification, Intelligent Systems with Applications (2023)
- A class-driven approach to dimension embedding, Expert Systems with Applications (2022). Citation excerpt: "The dCor-based Dimensionality Reduction (dDR) technique is based on distance correlation and purposes to maximize the correlation between inputs and the outcome. There also exists the kernel version of the dDR to apply it to non-linear data sets (Abdi & Ghodsi, 2020)."
- Binary domain adaptation with independence maximization, International Journal of Machine Learning and Cybernetics (2021)
Lida Abdi received her B.Sc. degree in Computer Engineering from Shiraz Payamnoor University, Iran, in 2010, and her M.Sc. degree in Artificial Intelligence from Shiraz University, Iran, in 2013. Her research interests include machine learning, dimensionality reduction, transfer learning, and kernel learning.
Ali Ghodsi received his B.S. degree in Computer Engineering from Shiraz University, Shiraz, Iran in 1992 and Ph.D. degree in Computer Science from University of Waterloo, Waterloo, Canada, in 2005. Currently, he is a professor at the University of Waterloo. His general research interests are in the areas of machine learning, dimensionality reduction, and visualization.