Review
Random matrix theory in statistics: A review

https://doi.org/10.1016/j.jspi.2013.09.005

Highlights

  • Classical problems in multivariate statistical analysis and their connections to random matrices.

  • Main objects of study in the random matrix theory literature, with emphasis on the objects most relevant to the statistical analysis of high-dimensional data.

  • Applications of theoretical results from random matrix theory to various problems in statistics, economics, wireless communications and other fields.

  • Development of statistical regularization techniques for dealing with high-dimensional problems.

  • Potential future directions of research.

Abstract

We give an overview of random matrix theory (RMT) with the objective of highlighting the results and concepts that have a growing impact in the formulation and inference of statistical models and methodologies. This paper focuses on a number of application areas especially within the field of high-dimensional statistics and describes how the development of the theory and practice in high-dimensional statistical inference has been influenced by the corresponding developments in the field of RMT.

Introduction

Statistics has entered an age in which ever larger volumes of increasingly complex data are being generated, often through automated measuring devices, in a wide array of disciplines such as genomics, atmospheric sciences, communications, biomedical imaging, economics and many others. Representing such data in any nominal coordinate system often leads to so-called high-dimensional data, which are frequently associated with phenomena that transcend the boundaries of classical multivariate statistical analysis. The continued growth of these new data sources has led to the incorporation of various mathematical tools, including convex analysis, Riemannian geometry and combinatorics, into statistical analysis. Random matrix theory has emerged as a particularly useful framework for posing many theoretical questions associated with the analysis of high-dimensional multivariate data.

In this paper, we mainly focus on several application areas of random matrix theory (RMT) in statistics. These include problems in dimension reduction, hypothesis testing, clustering, regression analysis and covariance estimation. We also briefly describe the important role played by RMT in enabling certain theoretical analyses in wireless communications and econometrics. Different themes emerging from these problems have in turn led to further investigation of some classical RMT phenomena. Among these, the notion of universality has profound implications for high-dimensional data analysis, as it bears on the applicability of many statistical techniques beyond the classical framework built upon the multivariate Gaussian distribution. With these perspectives, the treatment of the topics focuses on those aspects of RMT relevant to the statistical questions. Thus, the topics in RMT that receive the most attention in this paper are the behavior of the bulk of the spectrum, i.e., the empirical spectral distribution, and the behavior of the edge of the spectrum, i.e., the extreme eigenvalues, of random matrices. Also, since the sample covariance matrix is the dominant object of study in most multivariate analyses, much of the paper is devoted to its spectral behavior. For more detailed accounts of these topics, the reader may refer to Anderson et al. (2009), Bai and Silverstein (2009) and Pastur and Shcherbina (2011).
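The distinction between the bulk and the edge of the spectrum can be seen in a small simulation. The sketch below is our own illustration, not code from the review; it assumes an n × p data matrix with i.i.d. mean-zero, unit-variance entries and identity population covariance, for which the standard limit of the empirical spectral distribution of the sample covariance matrix is the Marchenko–Pastur law with ratio γ = p/n ≤ 1. The function names are hypothetical. Repeating the experiment with Rademacher (±1) entries alongside Gaussian ones gives a rough, non-rigorous look at universality.

```python
import numpy as np


def marchenko_pastur_edges(gamma: float) -> tuple[float, float]:
    """Support edges of the Marchenko-Pastur law (unit variance, gamma = p/n <= 1)."""
    return (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2


def sample_covariance_eigenvalues(n: int, p: int, entries: str, seed: int = 0) -> np.ndarray:
    """Eigenvalues of S = X^T X / n for an n x p matrix X with i.i.d. mean-0, variance-1 entries."""
    rng = np.random.default_rng(seed)
    if entries == "gaussian":
        X = rng.standard_normal((n, p))
    elif entries == "rademacher":  # non-Gaussian entries, to illustrate universality
        X = rng.choice([-1.0, 1.0], size=(n, p))
    else:
        raise ValueError(f"unknown entry distribution: {entries}")
    S = X.T @ X / n  # p x p sample covariance; the population covariance is the identity
    return np.linalg.eigvalsh(S)  # eigenvalues in ascending order


def summarize(n: int = 2000, p: int = 500) -> dict:
    """Compare bulk moments and extreme eigenvalues with Marchenko-Pastur predictions."""
    gamma = p / n
    lo, hi = marchenko_pastur_edges(gamma)
    out = {"gamma": gamma, "mp_edges": (lo, hi)}
    for entries in ("gaussian", "rademacher"):
        ev = sample_covariance_eigenvalues(n, p, entries)
        out[entries] = {
            "smallest": ev[0],                # edge: compare with (1 - sqrt(gamma))^2
            "largest": ev[-1],                # edge: compare with (1 + sqrt(gamma))^2
            "mean": ev.mean(),                # bulk: first MP moment equals 1
            "second_moment": np.mean(ev**2),  # bulk: second MP moment equals 1 + gamma
        }
    return out


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

With the defaults (n = 2000, p = 500, so γ = 0.25), every population eigenvalue equals 1, yet the sample eigenvalues should spread over roughly [0.25, 2.25] for both entry distributions; the exact numbers depend on the random seed.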
More complete and self-contained treatments of a number of “core” topics of RMT not covered in this paper, including the Riemann–Hilbert approach to the asymptotics of orthogonal polynomials and random matrices, the distribution of spacings and correlation functions of eigenvalues and their connections with determinantal point processes, and the role of eigenvalue statistics in physics, can be found in the monographs (Akemann et al., 2011, Anderson et al., 2009, Deift, 2000, Deift and Gioev, 2009, Forrester, 2010, Guionnet, 2009, Mehta, 2004, Tao, 2012) and the survey articles (Diaconis, 2003, Soshnikov, 2000). The connection between RMT and free probability theory, another significant topic not discussed here, is explored in detail in Anderson et al. (2009), Hiai and Petz (2000), Mingo and Speicher (2006), Nica and Speicher (2006) and Edelman and Rao (2005). Finally, wireless communications and finance are two areas beyond physics and statistics where tools and concepts from RMT have been successfully applied; we therefore give a brief overview of these topics in Section 4. Applications of RMT in wireless communications are the focus of Couillet and Debbah (2011) and Tulino and Verdú (2004), while for a detailed look at applications in finance one may refer to Bouchaud et al. (2003) and Bouchaud and Potters (2009). A survey of some of the statistical topics covered in this review can be found in Johnstone (2007).

We now give a brief outline of this paper. RMT has impacted modern statistical procedures in two different ways. On the one hand, most of the mathematical treatment of RMT has focused on matrices with a high degree of independence among the entries, which one may refer to as “unstructured” random matrices. Results from the corresponding theory have been used primarily in hypothesis testing, where the null hypothesis corresponds to the absence of any directionality, or signal component, in the data. On the other hand, in high-dimensional statistics, we are primarily interested in problems where lower-dimensional structures are buried under random noise. An effective treatment of the latter problems often requires going beyond the classical RMT framework and into the domain of statistical regularization schemes. Keeping these perspectives in mind, we devote Sections 2 (Background and motivation) and 3 (Large random matrices) to the motivations and theoretical developments in RMT, and focus on the statistical applications of RMT in Section 4. Finally, in Section 5, we discuss modern statistical regularization schemes based on various forms of sparse structure for dealing with high-dimensional statistical problems. RMT does not play a direct role in this context, except possibly in the theoretical analysis of some estimation schemes, but it provides guidance on the potential implications of violating the structural assumptions underlying the inferential procedures.
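The contrast between "no signal" and "low-dimensional structure buried under noise" can be made concrete with the spiked covariance model, in the spirit of the phase-transition result of Baik et al. (2005) listed in the references. The sketch below is our own illustration under stated assumptions: real Gaussian data, population covariance diag(ℓ, 1, …, 1) with a single hypothetical spike ℓ, and the standard first-order limit of the top sample eigenvalue, namely ℓ(1 + γ/(ℓ − 1)) when ℓ > 1 + √γ and the bulk edge (1 + √γ)² otherwise. The function names are hypothetical.

```python
import numpy as np


def spiked_top_eigenvalue(n: int, p: int, spike: float, seed: int = 0) -> float:
    """Largest sample-covariance eigenvalue when Sigma = diag(spike, 1, ..., 1) (real Gaussian data)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(spike)  # the first coordinate carries the "signal" variance
    S = X.T @ X / n
    return float(np.linalg.eigvalsh(S)[-1])


def predicted_top_eigenvalue(gamma: float, spike: float) -> float:
    """First-order limit of the top sample eigenvalue in the spiked model, gamma = p/n."""
    if spike > 1 + np.sqrt(gamma):
        return spike * (1 + gamma / (spike - 1))  # the spike separates from the bulk
    return float((1 + np.sqrt(gamma)) ** 2)       # the spike is absorbed at the bulk edge


if __name__ == "__main__":
    n, p = 2000, 500  # gamma = 0.25, so the detection threshold is 1 + sqrt(gamma) = 1.5
    for spike in (1.0, 1.2, 3.0, 6.0):
        observed = spiked_top_eigenvalue(n, p, spike)
        predicted = predicted_top_eigenvalue(p / n, spike)
        print(f"spike={spike}: observed={observed:.3f}, predicted={predicted:.3f}")
```

With γ = 0.25, spikes of 1.0 and 1.2 should leave the top eigenvalue near the bulk edge 2.25, while spikes of 3.0 and 6.0 should give values near 3.375 and 6.3, up to sampling fluctuations of order 1/√n. Below the threshold the largest eigenvalue carries essentially no first-order information about the spike, which suggests why largest-eigenvalue tests struggle to detect weak signals when p/n is not small.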


Background and motivation

Random matrices play a central role in statistics in the context of analysis of multivariate data. There are numerous books on classical multivariate analysis, most notably Anderson (1984), Mardia et al. (1980), and Muirhead (1982), that describe the major problems addressed through the use of analysis of random matrices. Most of these problems are naturally formulated in terms of the eigen-decomposition of certain Hermitian or symmetric matrices. These problems can be broadly categorized into

Large random matrices

In this section, we deal with two kinds of random matrices that have been central to most of the developments in RMT: (i) the sample covariance matrix, often referred to as the Wishart matrix, and (ii) the Wigner matrix. Since both are symmetric or Hermitian, depending on whether their entries are real- or complex-valued, results of a similar type have been derived about their spectra in the RMT literature, although there are interesting differences in their asymptotic

Applications

In this section we discuss applications of RMT to statistics and allied fields. The focus is more on the practical implications than on further theoretical insight. The latter may be obtained from the multitude of references cited.

Sparse PCA, CCA, LDA and covariance estimation

The lack of consistency of classical inferential procedures for dealing with problems such as PCA, CCA, LDA and covariance estimation, as outlined by results from RMT, induced a flurry of activity in the statistical community to design regularized estimation schemes that can be effectively utilized in high-dimensional settings where additional structural information on the parameters describing the statistical models is available. This approach benefited from increasingly sophisticated uses of

Future directions

Many multivariate statistical techniques require certain modifications to be effective in dealing with moderate- to high-dimensional data. Here, we briefly discuss some areas where further development of RMT may be beneficial:

  • One striking aspect of typical economic and financial problems is that the data are time-dependent, while much of the theory in this field has been developed under the assumption of i.i.d. observations. Thus, a thorough investigation of the potential for extending the current

Acknowledgements

We thank Boaz Nadler and two anonymous referees for their helpful suggestions. The research is supported by the National Science Foundation grants DMR-1035468, DMS-1106690, DMS-1209226 and DMS-1305858.

References (303)

  • G. Anderson et al.

    An Introduction to Random Matrices

    (2009)
  • G. Anderson et al.

    A CLT for a band matrix model

    Probability Theory and Related Fields

    (2006)
  • T.W. Anderson

    An Introduction to Multivariate Statistical Analysis

    (1984)
  • L. Arnold

    On Wigner's semicircle law for the eigenvalues of random matrices

    Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete

    (1971)
  • A. Auffinger et al.

    Poisson convergence for the largest eigenvalues of heavy tailed random matrices

    Annales de l'Institut Henri Poincaré, Probabilités et Statistiques

    (2009)
  • S.R. Bahcall

    Random matrix model for superconductors in a magnetic field

    Physical Review Letters

    (1996)
  • J. Bai

    Inferential theory for factor models for large dimensions

    Econometrica

    (2003)
  • J. Bai et al.

    Determining the number of factors in approximate factor models

    Econometrica

    (2002)
  • J. Bai et al.

    Determining the number of primitive shocks in factor models

    Journal of Business and Economic Statistics

    (2007)
  • Z.D. Bai

    Convergence rate of expected spectral distributions of large random matrices. Part I. Wigner matrices

    Annals of Probability

    (1993)
  • Z.D. Bai

    Convergence rate of expected spectral distributions of large random matrices. Part II. Sample covariance matrices

    Annals of Probability

    (1993)
  • Z.D. Bai

    Methodologies in spectral analysis of large dimensional random matrices, a review

    Statistica Sinica

    (1999)
  • Z.D. Bai et al.

    On estimation of the population spectral distribution from a high-dimensional sample covariance matrix

    Australian and New Zealand Journal of Statistics

    (2010)
  • Z.D. Bai et al.

    Convergence of the Empirical Spectral Distribution Function of Beta...

    (2012)
  • Z.D. Bai et al.

    Corrections to LRT on large-dimensional covariance matrix by RMT

    Annals of Statistics

    (2009)
  • Z.D. Bai et al.

    Testing Linear Hypothesis in High-Dimensional Regression

    Technical...

    (2012)
  • Z.D. Bai et al.

    On the Markowitz mean–variance analysis of self-financing portfolios

    Risk and Decision Analysis

    (2009)
  • Z.D. Bai et al.

    Asymptotic properties of eigenmatrices of a large sample covariance matrix

    Annals of Applied Probability

    (2011)
  • Z.D. Bai et al.

    On asymptotics of eigenvectors of large sample covariance matrix

    Annals of Statistics

    (2007)
  • Z.D. Bai et al.

    A note on the convergence rate of the spectral distributions of large dimensional random matrices

    Statistics and Probability Letters

    (1997)
  • Z.D. Bai et al.

    Remarks on the convergence rate of the spectral distributions of Wigner matrices

    Journal of Theoretical Probability

    (1999)
  • Z.D. Bai et al.

    Convergence rates of the spectral distributions of large Wigner matrices

    International Journal of Mathematics

    (2002)
  • Z.D. Bai et al.

    Convergence rates of spectral distributions of large sample covariance matrices

    SIAM Journal on Matrix Analysis and Applications

    (2003)
  • Z.D. Bai et al.

    Effect of high dimension, by an example of a two sample problem

    Statistica Sinica

    (1996)
  • Z.D. Bai et al.

    No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices

    Annals of Probability

    (1998)
  • Z.D. Bai et al.

    Exact separation of eigenvalues of large dimensional sample covariance matrices

    Annals of Probability

    (1999)
  • Z.D. Bai et al.

    CLT for linear spectral statistics of large dimensional sample covariance matrix

    Annals of Probability

    (2004)
  • Z.D. Bai et al.

    On the signal-to-interference ratio of CDMA systems in wireless communications

    Annals of Applied Probability

    (2007)
  • Z.D. Bai et al.

    Spectral Analysis of Large Dimensional Random Matrices

    (2009)
  • Z.D. Bai et al.

    No eigenvalues outside the support of the limiting spectral distribution of information-plus-noise type matrices

    Random Matrices: Theory and Applications

    (2012)
  • Z.D. Bai et al.

    CLT for linear spectral statistics of Wigner matrices

    Electronic Journal of Probability

    (2009)
  • Z.D. Bai et al.

    Functional CLT for sample covariance matrices

    Bernoulli

    (2010)
  • Z.D. Bai et al.

    On the convergence of the spectral empirical process of Wigner matrices

    Bernoulli

    (2005)
  • Z.D. Bai et al.

    On sample eigenvalues in a generalized spiked population model

    Journal of Multivariate Analysis

    (2011)
  • Z.D. Bai et al.

    Convergence to the semicircle law

    Annals of Probability

    (1988)
  • Z.D. Bai et al.

    Necessary and sufficient conditions for the almost sure convergence of the largest eigenvalue of Wigner matrices

    Annals of Probability

    (1988)
  • Z.D. Bai et al.

    Limit of the smallest eigenvalue of large dimensional covariance matrix

    Annals of Probability

    (1993)
  • Z.D. Bai et al.

    Semicircle law for Hadamard products

    SIAM Journal on Matrix Analysis and Applications

    (2006)
  • Z.D. Bai et al.

    Large sample covariance matrices without independence structures in columns

    Statistica Sinica

    (2008)
  • J. Baik et al.

    Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices

    Annals of Probability

    (2005)