Singular value decomposition in additive, multiplicative, and logistic forms
Introduction
Singular value decomposition (SVD) was introduced by Eckart and Young [1] and has become one of the most widely used techniques of computational algebra and multivariate statistical analysis, applied for data approximation, reduction, and visualization. The SVD, also known as the matrix spectral decomposition, is closely related to principal components and to the Moore–Penrose generalized matrix inverse. SVD represents a rectangular matrix via a low-rank additive combination of the outer products of dual right and left eigenvectors [2], [3], [4], [5]. Sequential sums of these outer products yield a matrix approximation whose precision is defined by the cumulative share of the eigenvalues in the matrix squared Euclidean norm. SVD is applied to various problems in pattern recognition [6], [7], [8], [9], [10], [11], [12], multidimensional scaling and cluster analysis [13], [14], [15], [16], [17], and perceptual mapping [18], [19]. It is the main tool in correspondence analysis, or dual scaling, for categorical data [20], [21], [22], [23], [24], [25], [26], [27]. Numerous SVD applications are known in practical data visualization [28], [29], [30], [31], [32], [33] and in priority evaluations [34], [35], [36], [37], [38], [39].
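As a brief illustration (a minimal sketch in Python with NumPy; the code and the random test matrix are ours, not from the paper), the squared singular values sum to the squared Euclidean (Frobenius) norm of the matrix, so their cumulative share measures the precision of each truncated approximation:

```python
# Cumulative share of the squared singular values in ||X||_F^2;
# share[k-1] is the precision of the rank-k approximation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((8, 5))                     # illustrative data matrix

s = np.linalg.svd(X, compute_uv=False)     # singular values, descending
share = np.cumsum(s**2) / np.sum(s**2)     # np.sum(s**2) equals ||X||_F^2

print(share)                               # ends at 1.0 for the full rank
```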
Although the SVD is extremely useful in various applications, it can produce inadequate results for some data. In scene recognition and reconstruction problems, positive pixel data matrices are used. Perceptual maps are often constructed from counts, proportions, or positive share values. Correspondence analysis uses the second and third pairs of dual vectors for data plotting, so a matrix approximation of the third rank is implicitly used. In all these problems, if we reconstruct the original data by the first several items of the matrix spectral decomposition, we can easily obtain an approximated matrix with irrelevant negative values (for instance, for pixel data). In the case of proportions data, the decomposition by singular vectors can yield a lower rank approximation with reconstructed elements beyond the needed interval (outside the 0–100 range for percent data).
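A small sketch of this issue (Python with NumPy assumed; the random positive matrix is illustrative, not from the paper): truncating the SVD of a strictly positive matrix at a low rank frequently produces negative reconstructed entries:

```python
# Low-rank SVD reconstruction of an all-positive matrix can go negative.
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((6, 6)) + 0.05                # strictly positive "pixel" data

U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = 2                                        # truncation rank
X_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]  # rank-2 reconstruction

print(X_r.min())  # frequently negative, even though every x_ij > 0
```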
In this work we suggest a convenient modification of SVD that produces a lower rank approximation of the data with the desired properties. To obtain an always positive matrix approximation of any rank, we apply the SVD to the logarithmically transformed elements of the original data matrix. This approach corresponds to minimizing the multiplicative relative deviations of the vectors' outer products from the original data, and it yields a multiplicative decomposition of the matrix into a product of exponents powered with the singular values and dual vectors. In another approach, applying SVD to logistically transformed proportion data and minimizing the deviations, we obtain a lower rank approximation with all matrix elements positive and less than one. This technique is based on minimizing the multiplicative relative deviations from the odds of the empirical proportions. We also consider an SVD with additive components that corresponds to centering the data matrix in both directions, and an SVD for data structured as in regression analysis, where besides the independent variables there is a dependent variable. The transformation of a positive matrix to logarithms, or of a proportion matrix to logarithms of the odds, is performed element by element. The transformation from the singular value decomposition of the transformed data back to the matrix approximation of the original data is likewise performed elementwise. This procedure engages a straightforward transformation of the values of each element, not of the matrix as a whole, so no computational problems (such as those in taking an exponent of a matrix) occur.
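The following sketch reflects our reading of the elementwise transform-decompose-invert pipeline described above; the helper names lowrank, multiplicative_svd, and logistic_svd are ours, chosen for illustration, and the paper's minimization criteria are not reproduced here:

```python
import numpy as np

def lowrank(M, r):
    """Regular rank-r SVD approximation of M."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

def multiplicative_svd(X, r):
    """Rank-r approximation of a positive matrix X via SVD of log(X);
    exponentiating elementwise keeps every reconstructed entry positive."""
    return np.exp(lowrank(np.log(X), r))

def logistic_svd(P, r):
    """Rank-r approximation of a proportions matrix P (0 < p_ij < 1) via
    SVD of the elementwise log-odds; the inverse logit keeps every
    reconstructed entry inside (0, 1)."""
    Z = np.log(P / (1.0 - P))           # elementwise logit (log of the odds)
    return 1.0 / (1.0 + np.exp(-lowrank(Z, r)))

# Usage: the reconstructions respect the bounds by construction.
rng = np.random.default_rng(2)
P = rng.uniform(0.05, 0.95, size=(7, 5))
Q = logistic_svd(P, 2)
print(Q.min(), Q.max())                 # always strictly inside (0, 1)
```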
This paper is organized as follows. Section 2 describes the regular SVD technique and suggests its additive, multiplicative, and logistic extensions. Section 3 considers numerical examples, and Section 4 summarizes.
Section snippets
Matrix decomposition in additive, multiplicative, and logistic forms
Let us briefly describe the regular SVD, or matrix approximation by a cumulative sum of the outer products of eigenvectors (see, for instance, [1], [2], [3], [4], [5]). Let $X$ denote a data matrix of order $N \times n$, with elements $x_{ij}$ of the $i$th observations ($i = 1, \dots, N$) by the $j$th variables ($j = 1, \dots, n$). A matrix approximation by $r$ outer products of the vectors is
$$x_{ij} \approx \sum_{k=1}^{r} \mu_k \, a_{ik} b_{jk},$$
where $a_{ik}$ and $b_{jk}$ are elements of the $k$th pair of vectors $a_k$ and $b_k$ (of $N$th and $n$th order, respectively), and $\mu_k$ are the singular values.
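A minimal numerical check of this formula (Python with NumPy; not from the paper) accumulates the singular-value-weighted outer products and compares the result with the usual truncated matrix product:

```python
# Rank-r approximation as the cumulative sum of outer products
# sum_{k=1}^{r} mu_k a_k b_k^T; names mu_k, a_k, b_k follow the text.
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((10, 4))                       # N x n data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)

r = 2
X_r = sum(s[k] * np.outer(U[:, k], Vt[k, :]) for k in range(r))

# Same result as the truncated product U_r diag(s_r) V_r^T:
assert np.allclose(X_r, U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :])
```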
Numerical examples
The data for numerical examples are taken from a marketing research project on a pharmaceutical product evaluated by 280 medical practitioners. The product's brands are denoted as Z, ZS, YD, YS, Y, X, and XD (where X, Y, and Z are actual brands, S denotes a syrup version, and D denotes an additional ingredient). The attributes are: a—quick relief, b—safe to use, c—safe to use with other diseases, d—listed on most formularies, e—few side effects, f—one dose for all day, g—long lasting relief,
Summary
We considered the singular value decomposition technique adjusted to obtain matrices with specific features at any step of approximation. It is shown that a positive matrix can be more adequately approximated in the multiplicative SVD by the product of powered spectral decomposition items obtained from the logarithms of the original matrix elements. Such an approximation is guaranteed to preserve the positive entries of the original matrix in the entries of any approximating matrix.
Acknowledgements
The authors wish to thank a referee whose valuable comments and suggestions improved and clarified the paper.
References (45)
- et al., Do singular values contain adequate information for face recognition, Pattern Recognition (2003)
- et al., Singular value decomposition in AHP, Eur. J. Oper. Res. (2004)
- et al., Robust estimation of priorities in the AHP, Eur. J. Oper. Res. (2002)
- et al., Linear methods in multimode data analysis for decision making, Comput. Oper. Res. (1994)
- C. Eckart, G. Young, The approximation of one matrix by another of lower rank, Psychometrika (1936)
- et al., Matrix Computations (1983)
- Multivariate Observations (1984)
- Elements of Statistical Computing: Numerical Computation (1988)
- A singular value decomposition: the SVD of a matrix, Coll. Math. J. (1996)
- et al., Outer product expansions and their uses in digital image processing, Am. Math. Mon. (1975)
- The singular value decomposition: its computation and some applications, IEEE Trans. Automat. Control
- Introduction to Statistical Pattern Recognition
- Neural Networks for Pattern Recognition
- Pattern Recognition and Neural Networks
- Pattern Classification
- A statistical model which combines features of factor analysis and analysis of variance techniques, Psychometrika
- Introduction to Multidimensional Scaling
- Graphical Methods for Data Analysis
- Bayesian multidimensional scaling and choice of dimension, J. Am. Statist. Assoc.
- Clustering large graphs via the singular value decomposition, Mach. Learn.
- The biplot graphical display of matrices with applications to principal component analysis, Biometrika
- Biplots in biomedical research, Statist. Med.
Cited by (17)

Unsupervised feature selection guided by orthogonal representation of feature space
2023, Neurocomputing
Citation excerpt: Matrix factorization has also been used widely in dimensionality reduction problems along with sparsity regularization. SVD [18], NMF [19], and PCA [20] are typical dimensionality reduction methods; these methods are categorized as feature extraction techniques [21–24]. Here, we would like to point out two applications of feature selection methods based on matrix factorization in the real world.

Supervised feature selection via matrix factorization based on singular value decomposition
2019, Chemometrics and Intelligent Laboratory Systems

Unsupervised feature selection by regularized matrix factorization
2018, Neurocomputing
Citation excerpt: Besides the sparsity regularization, matrix factorization has also been widely used in dimensionality reduction problem. The typical matrix factorization based dimensionality reduction methods include SVD [21], PCA [6] and NMF [7]. However, these methods and their related extensions [22–25] are all designed for feature extraction rather than feature selection.

Subspace learning for unsupervised feature selection via matrix factorization
2015, Pattern Recognition
Citation excerpt: In the recent years, matrix factorization techniques [3,4] for machine learning and data mining have been attracting more and more attention. As its special cases, nonnegative matrix factorization (NMF) [5], principal component analysis (PCA) [6,7] and singular value decomposition (SVD) [8,9] come with many efficient algorithms for feature extraction. However, there are a few works on matrix factorization methods considered for feature selection.

PCA and SVD with nonnegative loadings
2009, Pattern Recognition
Citation excerpt: The modern SVD techniques include tensor decomposition for multidimensional matrices [11,12,23], clustering within decomposition and nonnegative matrix factorization [23,32–35]. For positive data (for instance, characteristics measured by positive scales, or counts, percents, pixels, etc.) it is possible to use the multiplicative, or logistic SVD, and to obtain any lower rank matrix approximation with all the matrix elements positive, and also less than one [36]. In applied problems of dimensionality reduction and extraction of the main features, it is convenient to compose in each aggregate only the variables which have the same direction of their influence on the aggregate.

Dual PLS analysis
2012, International Journal of Information Technology and Decision Making