Incremental singular value decomposition for some numerical aspects of multiblock redundancy analysis

  • Original Paper
Computational Statistics

Abstract

Simultaneously processing several large blocks of streaming data is a computationally expensive problem. Based on the incremental singular value decomposition algorithm, we propose a new procedure for calculating the factorization of the multiblock redundancy matrix \({{\textbf {M}}}\), which makes the multiblock method faster and more efficient when analyzing large streaming data and high-dimensional dense matrices. The procedure transforms a big data problem into a small one by processing small high-dimensional matrices where variables are in rows. Numerical experiments illustrate the accuracy and performance of the incremental solution for analyzing streaming multiblock redundancy data. The experiments show that the incremental algorithm can decompose a large matrix with a 75% reduction in execution time. It is more efficient to first partition the matrix \({{\textbf {M}}}\) and then decompose it with the incremental algorithm than to decompose the entire matrix \({{\textbf {M}}}\) using the standard singular value decomposition algorithm.
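The idea of partitioning a matrix and decomposing it block by block can be illustrated with a generic column-append SVD update. The sketch below is not the procedure proposed in the paper. It is a minimal textbook-style update, and the function names `svd_append_columns` and `incremental_svd`, the block size, and the random test matrix are all assumptions made for illustration. It also assumes a tall matrix (more rows than columns in total) with full-rank blocks, so no rank-deficiency handling or truncation is included.

```python
import numpy as np


def svd_append_columns(U, s, Vt, C):
    """Update a thin SVD  M = U @ diag(s) @ Vt  to a thin SVD of [M, C].

    Assumes the columns of C are not already contained in span(U),
    so that [U, Q] below has orthonormal columns.
    """
    r, c = s.size, C.shape[1]
    L = U.T @ C                      # part of C inside span(U)
    H = C - U @ L                    # part of C orthogonal to span(U)
    Q, R = np.linalg.qr(H)           # reduced QR of the residual
    # Small (r + c) x (r + c) core matrix whose SVD yields the update.
    K = np.block([[np.diag(s), L],
                  [np.zeros((R.shape[0], r)), R]])
    Uk, sk, Vkt = np.linalg.svd(K, full_matrices=False)
    U_new = np.hstack([U, Q]) @ Uk
    # Right factor: blockdiag(V, I_c) rotated by the core right vectors.
    W = np.block([[Vt.T, np.zeros((Vt.shape[1], c))],
                  [np.zeros((c, r)), np.eye(c)]])
    Vt_new = (W @ Vkt.T).T
    return U_new, sk, Vt_new


def incremental_svd(M, block_size):
    """Decompose M by processing its columns in consecutive blocks."""
    U, s, Vt = np.linalg.svd(M[:, :block_size], full_matrices=False)
    for start in range(block_size, M.shape[1], block_size):
        U, s, Vt = svd_append_columns(U, s, Vt, M[:, start:start + block_size])
    return U, s, Vt


# Hypothetical test data: a tall 200 x 60 matrix processed in blocks of 20 columns.
rng = np.random.default_rng(0)
M = rng.standard_normal((200, 60))
U, s, Vt = incremental_svd(M, block_size=20)
s_ref = np.linalg.svd(M, compute_uv=False)   # standard SVD for comparison
print(np.allclose(s, s_ref), np.allclose(U @ np.diag(s) @ Vt, M))
```

Each update only requires the SVD of the small core matrix `K`, which is where a block-wise approach can save work compared with decomposing the full matrix at once. Whether it is faster in practice depends on the matrix sizes and the block size.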

Notes

  1. \(\Vert {{\textbf {t}}}_k \Vert = 1\) because, writing \({{\textbf {b}}}_k = {{\textbf {w}}}_k ({{\textbf {X}}}_k {{\textbf {X}}}_k')^{1/2}\), we have \({{\textbf {t}}}_k {{\textbf {t}}}_k' = {{\textbf {w}}}_k {{\textbf {X}}}_k {{\textbf {X}}}_k' {{\textbf {w}}}_k' = {{\textbf {w}}}_k ({{\textbf {X}}}_k {{\textbf {X}}}_k')^{1/2} (({{\textbf {X}}}_k {{\textbf {X}}}_k')^{1/2})' {{\textbf {w}}}_k'\) \(= ({{\textbf {w}}}_k ({{\textbf {X}}}_k {{\textbf {X}}}_k')^{1/2}) ({{\textbf {w}}}_k ({{\textbf {X}}}_k {{\textbf {X}}}_k')^{1/2})' = {{\textbf {b}}}_k {{\textbf {b}}}_k' = \Vert {{\textbf {b}}}_k \Vert ^2 = 1\).

  2. \({{\textbf {A}}}_k\) is square of order q and symmetric. We can write \({{\textbf {A}}}_k = {\textbf {Y P}}_{X_k} {{\textbf {Y}}}'\), where \({{\textbf {P}}}_{X_k} = {{\textbf {X}}}_k' ({{\textbf {X}}}_k {{\textbf {X}}}_k')^{-1} {{\textbf {X}}}_k\) is the projection operator onto the subspace spanned by the rows of \({{\textbf {X}}}_k\) (the columns of \({{\textbf {X}}}_k'\)). \({{\textbf {P}}}_{X_k}\) is symmetric and idempotent. Then, with \({{\textbf {B}}} = {\textbf {Y P}}_{X_k}\), \({{\textbf {A}}}_k = {\textbf {Y P}}_{X_k} {{\textbf {P}}}_{X_k}' {{\textbf {Y}}}' = ({\textbf {Y P}}_{X_k}) ({\textbf {Y P}}_{X_k})' = {\textbf {B B}}'\). Moreover, \({{\textbf {A}}}_k\) is positive semidefinite, since \({\textbf {v A}}_k {{\textbf {v}}}' = ({\textbf {v B}}) ({\textbf {v B}})' = \Vert {\textbf {v B}} \Vert ^2 \ge 0\) for all \({{\textbf {v}}}\).
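The properties stated in note 2 can be checked numerically. The following is a minimal sketch on small random matrices with variables in rows. The sizes `p`, `q`, `n` and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 5, 3, 40                      # hypothetical sizes: p predictors, q responses, n observations
X = rng.standard_normal((p, n))         # block X_k, variables in rows
Y = rng.standard_normal((q, n))         # block Y, variables in rows

# P = X' (X X')^{-1} X, computed with a linear solve instead of an explicit inverse.
P = X.T @ np.linalg.solve(X @ X.T, X)   # n x n projector
B = Y @ P
A = Y @ P @ Y.T                         # q x q matrix A_k

symmetric = np.allclose(P, P.T)         # P is symmetric
idempotent = np.allclose(P @ P, P)      # P is idempotent
factorized = np.allclose(A, B @ B.T)    # A_k = B B'
# Smallest eigenvalue of the symmetrized A_k; nonnegative up to rounding error.
min_eig = np.linalg.eigvalsh((A + A.T) / 2).min()
print(symmetric, idempotent, factorized, min_eig >= -1e-10)
```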

Acknowledgements

We sincerely thank the guest editors and the anonymous reviewers for their careful reading of the paper and for their helpful comments and suggestions, which greatly improved the article.

Author information

Corresponding author

Correspondence to Alba Martinez-Ruiz.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Appendix 1: Comparison of elapsed times for \(({{\textbf {X}}} {{\textbf {X}}}')^{-1}\)

Figure 12 reports the CPU times of computing \({{\textbf {Y}}} {{\textbf {X}}}' ({{\textbf {X}}} {{\textbf {X}}}')^{-1} {{\textbf {X}}} {{\textbf {Y}}}'\) when \(({{\textbf {X}}} {{\textbf {X}}}')^{-1}\) is calculated through QR decomposition, LU decomposition, solving the system \({{\textbf {A}}} {{\textbf {x}}} = {{\textbf {b}}}\), and the spectral decomposition and subsequent modification of the resulting eigenvalues carried out by the R function mpower (Schafer et al. 2017). These times were obtained for random data generated from a normal distribution. The multiblock setup included five blocks of variables \({{\textbf {X}}}\) and one endogenous block of variables \({{\textbf {Y}}}\), each with 10,000 observations. Each block contained 10, 50, 100, 250, 500, or 750 variables, so the experiments examined multiblock configurations with 60, 300, 600, 1500, 3000, and 4500 variables in total, respectively.

Fig. 12 CPU times (s) of computing \({{\textbf {Y}}} {{\textbf {X}}}' ({{\textbf {X}}} {{\textbf {X}}}')^{-1} {{\textbf {X}}} {{\textbf {Y}}}'\) when the inverse of \({{\textbf {X}}} {{\textbf {X}}}'\) is calculated with different methods
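The comparison in this appendix can be sketched in Python with NumPy. This is an illustrative analogue, not the R code behind Fig. 12: the sizes, the function names, and the timing loop are assumptions, and the matrices are far smaller than those in the experiments. NumPy has no standalone LU routine, so the LU-based route is represented by `np.linalg.inv` and `np.linalg.solve`, both of which rely on an LU factorization internally. The spectral route mimics inverting the eigenvalues of \({{\textbf {X}}} {{\textbf {X}}}'\).

```python
import time
import numpy as np

rng = np.random.default_rng(2)
p, q, n = 50, 10, 2000                  # hypothetical sizes, much smaller than in the paper
X = rng.standard_normal((p, n))         # variables in rows
Y = rng.standard_normal((q, n))
G = X @ X.T                             # p x p Gram matrix X X'
XY = X @ Y.T                            # p x q, so that Y X' = XY.T


def via_inverse():
    # Explicit inverse of X X' (LU-based in LAPACK).
    return XY.T @ np.linalg.inv(G) @ XY


def via_solve():
    # Solve (X X') Z = X Y' without forming the inverse.
    return XY.T @ np.linalg.solve(G, XY)


def via_qr():
    # X X' = Q R, so (X X')^{-1} X Y' = R^{-1} Q' X Y'.
    # A dedicated triangular solver would normally be used for R.
    Q, R = np.linalg.qr(G)
    return XY.T @ np.linalg.solve(R, Q.T @ XY)


def via_spectral():
    # X X' = V diag(lam) V', so its inverse is V diag(1 / lam) V'.
    lam, V = np.linalg.eigh(G)
    return XY.T @ (V / lam) @ V.T @ XY


methods = {"inverse": via_inverse, "solve": via_solve,
           "qr": via_qr, "spectral": via_spectral}
results, timings = {}, {}
for name, fn in methods.items():
    t0 = time.perf_counter()
    results[name] = fn()
    timings[name] = time.perf_counter() - t0

for name in methods:
    agrees = np.allclose(results[name], results["solve"])
    print(f"{name:9s} {timings[name]:.5f} s  agrees with solve: {agrees}")
```

At these sizes the timings are dominated by overhead, so they say little about the relative cost of the methods. The sketch only shows that the four routes yield the same \(q \times q\) matrix.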

About this article

Cite this article

Martinez-Ruiz, A., Lauro, N.C. Incremental singular value decomposition for some numerical aspects of multiblock redundancy analysis. Comput Stat (2023). https://doi.org/10.1007/s00180-023-01418-5

  • DOI: https://doi.org/10.1007/s00180-023-01418-5
