Random matrix theory in statistics: A review
Introduction
Statistics has entered an age in which an increasingly large volume of ever more complex data is being generated, often through automated measuring devices, in a wide array of disciplines such as genomics, atmospheric sciences, communications, biomedical imaging, economics and many others. The representation of such data in any nominal coordinate system often leads to so-called high-dimensional data, which are frequently associated with phenomena that transcend the boundary of classical multivariate statistical analysis. The continued growth of these new data sources has led to the incorporation of diverse mathematical tools, including convex analysis, Riemannian geometry and combinatorics, into statistical analysis. Random matrix theory has emerged as a particularly useful framework for posing many theoretical questions associated with the analysis of high-dimensional multivariate data.
In this paper, we mainly focus on several application areas of random matrix theory (RMT) in statistics. These include problems in dimension reduction, hypothesis testing, clustering, regression analysis and covariance estimation. We also briefly describe the important role played by RMT in enabling certain theoretical analyses in wireless communications and econometrics. Different themes emerging from these problems have in turn led to further investigation of some classical RMT phenomena. Among these, the notion of universality has profound implications for high-dimensional data analysis, because it bears on the applicability of many statistical techniques beyond the classical framework built upon the multivariate Gaussian distribution. With these perspectives, the treatment of the topics will focus on those aspects of RMT relevant to the statistical questions. Thus, the topics in RMT that receive most attention in this paper are those related to the behavior of the bulk spectrum, i.e., the empirical spectral distribution, and the behavior of the edge of the spectrum, i.e., the extreme eigenvalues, of random matrices. Also, since the sample covariance matrix is a dominant object of study in most multivariate analyses, much of the paper is devoted to the study of its spectral behavior. For more detailed accounts of these topics, the reader may refer to Anderson et al. (2009), Bai and Silverstein (2009) and Pastur and Shcherbina (2011).
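The distinction between the bulk and the edge of the spectrum can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes i.i.d. standard Gaussian entries, a sample covariance matrix of the form S = XᵀX/n, and the classical Marchenko–Pastur support edges (1 ± √γ)² with γ = p/n. The function names, dimensions and seed are hypothetical choices made for illustration.

```python
import numpy as np


def sample_covariance_eigenvalues(n, p, seed=0):
    """Eigenvalues (ascending) of S = X^T X / n for an n x p matrix X
    with i.i.d. N(0, 1) entries (population covariance = identity)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    S = X.T @ X / n
    return np.linalg.eigvalsh(S)


def marchenko_pastur_edges(gamma):
    """Lower and upper edges of the Marchenko-Pastur support for
    aspect ratio gamma = p / n in (0, 1] and unit variance."""
    root = np.sqrt(gamma)
    return (1.0 - root) ** 2, (1.0 + root) ** 2


# Illustrative dimensions (assumed): p / n = 0.25.
n, p = 2000, 500
eigs = sample_covariance_eigenvalues(n, p)
lower_edge, upper_edge = marchenko_pastur_edges(p / n)

# Bulk: the histogram of `eigs` approximates the limiting spectral density.
# Edge: the extreme eigenvalues lie close to the support edges.
print("smallest eigenvalue:", eigs[0], "lower edge:", lower_edge)
print("largest eigenvalue: ", eigs[-1], "upper edge:", upper_edge)
```

Although every population eigenvalue equals one here, the sample eigenvalues spread over an interval whose width depends on p/n. This spreading is the kind of high-dimensional effect that the bulk and edge results quantify.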
More complete and self-contained treatments of a number of “core” topics of RMT not covered in this paper, including the Riemann–Hilbert approach to the asymptotics of orthogonal polynomials and random matrices, the distribution of spacings and correlation functions of eigenvalues and their connections with determinantal point processes, and the role of the eigenvalue statistics in physics, can be found in the monographs (Akemann et al., 2011, Anderson et al., 2009, Deift, 2000, Deift and Gioev, 2009, Forrester, 2010, Guionnet, 2009, Mehta, 2004, Tao, 2012), and the survey articles (Diaconis, 2003, Soshnikov, 2000). The connection between RMT and free probability theory, another significant topic not discussed here, is explored in detail in Anderson et al. (2009), Hiai and Petz (2000), Mingo and Speicher (2006), Nica and Speicher (2006) and Edelman and Rao (2005). Finally, wireless communications and finance are two areas beyond physics and statistics where tools and concepts from RMT have been successfully applied, and we therefore give a brief overview of these topics in Section 4. Applications of RMT in wireless communications are the focus of Couillet and Debbah (2011) and Tulino and Verdú (2004), while for a detailed look at applications in finance one may refer to Bouchaud et al. (2003) and Bouchaud and Potters (2009). A survey of some of the statistical topics covered in this review can be found in Johnstone (2007).
We now give a brief outline of this paper. There are two different ways in which RMT has impacted modern statistical procedures. On one hand, most of the mathematical treatments of RMT have focused on matrices with a high degree of independence in the entries, which one may refer to as “unstructured” random matrices. The results from the corresponding theory have been used primarily in the context of hypothesis testing where the null hypothesis corresponds to the absence of any directionality, or signal component, in the data. On the other hand, in high-dimensional statistics, we are primarily interested in problems where there are lower dimensional structures buried under random noise. An effective treatment of the latter problem often requires going beyond the classical RMT framework and into the domain of statistical regularization schemes. Keeping these perspectives in mind, we devote Sections 2 and 3 to the motivations and theoretical developments in RMT, and focus on the statistical applications of RMT in Section 4. Finally, in Section 5, we focus on modern statistical regularization schemes based on various forms of sparse structures for dealing with high-dimensional statistical problems. RMT does not play any direct role in this context, except possibly in the theoretical analysis of some estimation schemes, but it provides guidance on the potential implications of violating the structural assumptions underlying the inferential procedures.
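The contrast between an “unstructured” null and a low-dimensional signal buried in noise can be sketched with a single-spike covariance model. The example below is illustrative only and is not the authors' procedure. It assumes Gaussian data whose population covariance is diag(ℓ, 1, …, 1), and it uses the known limit ℓ(1 + γ/(ℓ − 1)) for the largest sample eigenvalue when ℓ > 1 + √γ, with γ = p/n. The function name, dimensions, spike size and seed are hypothetical.

```python
import numpy as np


def top_sample_eigenvalue(n, p, spike=1.0, seed=0):
    """Largest eigenvalue of S = X^T X / n when the population covariance
    is diag(spike, 1, ..., 1); spike = 1.0 is the pure-noise null."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(spike)  # inflate the variance of one coordinate
    S = X.T @ X / n
    return np.linalg.eigvalsh(S)[-1]  # eigvalsh returns ascending order


# Illustrative setting (assumed): gamma = p / n = 0.25, spike above 1 + sqrt(gamma).
n, p = 2000, 500
gamma = p / n
bulk_edge = (1.0 + np.sqrt(gamma)) ** 2          # upper edge under the null
spike = 4.0
predicted = spike * (1.0 + gamma / (spike - 1))  # limit of the top eigenvalue

null_top = top_sample_eigenvalue(n, p, spike=1.0)
spiked_top = top_sample_eigenvalue(n, p, spike=spike)

print("bulk edge:", bulk_edge)
print("top eigenvalue, null:  ", null_top)
print("top eigenvalue, spiked:", spiked_top, "predicted limit:", predicted)
```

Under the null, the largest eigenvalue stays near the bulk edge, while a sufficiently strong spike pushes it well outside. That separation is what makes the largest eigenvalue usable as a test statistic for the presence of a signal component. Note also that the sample eigenvalue overshoots the population value ℓ, an instance of the bias of classical estimators in high dimensions.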
Background and motivation
Random matrices play a central role in statistics in the context of analysis of multivariate data. There are numerous books on classical multivariate analysis, most notably Anderson (1984), Mardia et al. (1980), and Muirhead (1982), that describe the major problems addressed through the use of analysis of random matrices. Most of these problems are naturally formulated in terms of the eigen-decomposition of certain Hermitian or symmetric matrices. These problems can be broadly categorized into
Large random matrices
In this section, we deal with two kinds of random matrices that have been central to most of the developments in RMT – (i) the sample covariance matrix, often referred to as the Wishart matrix, and (ii) the Wigner matrix. Both being symmetric or Hermitian matrices, depending on the entries of the matrix being real- or complex-valued, there are similarities in the type of results derived about their spectra in the RMT literature, although there are interesting differences in their asymptotic
Applications
In this section we discuss applications of RMT to statistics and allied fields. The focus is more on the practical implications than on further theoretical insight. The latter may be obtained from the multitude of references cited.
Sparse PCA, CCA, LDA and covariance estimation
The lack of consistency of classical inferential procedures for dealing with problems such as PCA, CCA, LDA and covariance estimation, as outlined by results from RMT, induced a flurry of activity in the statistical community to design regularized estimation schemes that can be effectively utilized in high-dimensional settings where additional structural information on the parameters describing the statistical models is available. This approach benefited from increasingly sophisticated uses of
Future directions
There are plenty of multivariate statistical techniques which require certain modifications to be effective in dealing with moderate to high-dimensional data. Here, we briefly discuss some areas where the enhancement of RMT may be beneficial:
- One striking aspect of typical economic/financial problems is that the data are dependent on time, while much of the theory in this field is under the setting of i.i.d. observations. Thus, a thorough investigation of the potential for extending the current
Acknowledgements
We thank Boaz Nadler and two anonymous referees for their helpful suggestions. The research is supported by the National Science Foundation grants DMR-1035468, DMS-1106690, DMS-1209226 and DMS-1305858.
References (303)
On the asymptotic distribution of the eigenvalues of random matrices
Journal of Mathematical Analysis and Applications
(1967)
A note on the largest eigenvalue of a large dimensional sample covariance matrix
Journal of Multivariate Analysis
(1988)
Eigenvalues of large sample covariance matrices of spiked population models
Journal of Multivariate Analysis
(2006)
Strong convergence of ESD for the generalized sample covariance matrices when p/n→0
Statistics and Probability Letters
(2012)
The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices
Advances in Mathematics
(2011)
On a model selection problem from high-dimensional sample covariance matrices
Journal of Multivariate Analysis
(2011)
Analysis of the limiting spectral distribution of large dimensional information-plus-noise type matrices
Journal of Multivariate Analysis
(2007)
On the empirical distribution of eigenvalues of large dimensional information-plus-noise type matrices
Journal of Multivariate Analysis
(2007)
The Oxford Handbook of Random Matrix Theory
(2011)
High-dimensional analysis of semidefinite relaxations for sparse principal components
Annals of Statistics
(2008)
An Introduction to Random Matrices
A CLT for a band matrix model
Probability Theory and Related Fields
An Introduction to Multivariate Statistical Analysis
On Wigner's semicircle law for the eigenvalues of random matrices
Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete
Poisson convergence for the largest eigenvalues of heavy tailed random matrices
Annales de l'Institut Henri Poincaré, Probabilités et Statistiques
Random matrix model for superconductors in a magnetic field
Physical Review Letters
Inferential theory for factor models of large dimensions
Econometrica
Determining the number of factors in approximate factor models
Econometrica
Determining the number of primitive shocks in factor models
Journal of Business and Economic Statistics
Convergence rate of expected spectral distributions of large random matrices. Part I. Wigner matrices
Annals of Probability
Convergence rate of expected spectral distributions of large random matrices. Part II. Sample covariance matrices
Annals of Probability
Methodologies in spectral analysis of large dimensional random matrices, a review
Statistica Sinica
On estimation of the population spectral distribution from a high-dimensional sample covariance matrix
Australian and New Zealand Journal of Statistics
Corrections to LRT on large-dimensional covariance matrix by RMT
Annals of Statistics
On the Markowitz mean–variance analysis of self-financing portfolios
Risk and Decision Analysis
Asymptotic properties of eigenmatrices of a large sample covariance matrix
Annals of Applied Probability
On asymptotics of eigenvectors of large sample covariance matrix
Annals of Statistics
A note on the convergence rate of the spectral distributions of large dimensional random matrices
Statistics and Probability Letters
Remarks on the convergence rate of the spectral distributions of Wigner matrices
Journal of Theoretical Probability
Convergence rates of the spectral distributions of large Wigner matrices
International Journal of Mathematics
Convergence rates of spectral distributions of large sample covariance matrices
SIAM Journal on Matrix Analysis and Applications
Effect of high dimension, by an example of a two sample problem
Statistica Sinica
No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices
Annals of Probability
Exact separation of eigenvalues of large dimensional sample covariance matrices
Annals of Probability
CLT for linear spectral statistics of large dimensional sample covariance matrix
Annals of Probability
On the signal-to-interference ratio of CDMA systems in wireless communications
Annals of Applied Probability
Spectral Analysis of Large Dimensional Random Matrices
No eigenvalues outside the support of the limiting spectral distribution of information-plus-noise type matrices
Random Matrices: Theory and Applications
CLT for linear spectral statistics of Wigner matrices
Electronic Journal of Probability
Functional CLT for sample covariance matrices
Bernoulli
On the convergence of the spectral empirical process of Wigner matrices
Bernoulli
On sample eigenvalues in a generalized spiked population model
Journal of Multivariate Analysis
Convergence to the semicircle law
Annals of Probability
Necessary and sufficient conditions for the almost sure convergence of the largest eigenvalue of Wigner matrices
Annals of Probability
Limit of the smallest eigenvalue of large dimensional covariance matrix
Annals of Probability
Semicircle law for Hadamard products
SIAM Journal on Matrix Analysis and Applications
Large sample covariance matrices without independence structures in columns
Statistica Sinica
Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices
Annals of Probability