Incremental singular value decomposition for some numerical aspects of multiblock redundancy analysis

Martinez-Ruiz, Alba; Lauro, Natale Carlo

doi:10.1007/s00180-023-01418-5


Original Paper
Published: 24 October 2023

Computational Statistics (2023)

Abstract

Simultaneously processing several large blocks of streaming data is a computationally expensive problem. Based on the incremental singular value decomposition algorithm, we propose a new procedure for calculating the factorization of the multiblock redundancy matrix \({{\textbf {M}}}\), which makes the multiblock method faster and more efficient when analyzing large streaming data and high-dimensional dense matrices. The procedure transforms a big data problem into a small one by processing small high-dimensional matrices where variables are in rows. Numerical experiments illustrate the accuracy and performance of the incremental solution for analyzing streaming multiblock redundancy data. The experiments demonstrate that the incremental algorithm may decompose a large matrix with a 75% reduction in execution time. It is more efficient to first partition the matrix \({{\textbf {M}}}\) and then decompose it with the incremental algorithm than to decompose the entire matrix \({{\textbf {M}}}\) using the standard singular value decomposition algorithm.
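The idea of partitioning a matrix and decomposing it piece by piece can be illustrated with a generic block column update of the kind discussed in the incremental SVD literature. The sketch below is not the authors' procedure: the function names `isvd_update` and `incremental_svd`, the chunk size, the target rank `k`, and the random low-rank test matrix are all assumptions made for illustration, and NumPy is used only for convenience.

```python
import numpy as np

def isvd_update(U, s, Vt, C, k):
    """Append the columns of C to a rank-k SVD  U @ diag(s) @ Vt."""
    r, n_old = Vt.shape
    c = C.shape[1]
    L = U.T @ C                      # component of C inside span(U)
    H = C - U @ L                    # component of C orthogonal to span(U)
    J, K = np.linalg.qr(H)           # orthonormal basis for the new directions
    # Small core matrix: [M_old, C] = [U, J] @ Q @ blockdiag(Vt, I)
    Q = np.block([[np.diag(s), L],
                  [np.zeros((K.shape[0], r)), K]])
    U2, s2, Vt2 = np.linalg.svd(Q, full_matrices=False)
    k = min(k, s2.size)
    # Rotate the enlarged bases and truncate back to rank k.
    U_new = np.hstack([U, J]) @ U2[:, :k]
    V_ext = np.zeros((r + c, n_old + c))
    V_ext[:r, :n_old] = Vt
    V_ext[r:, n_old:] = np.eye(c)
    Vt_new = Vt2[:k] @ V_ext
    return U_new, s2[:k], Vt_new

def incremental_svd(M, chunk, k):
    """Decompose M by feeding it in column chunks (hypothetical driver)."""
    U, s, Vt = np.linalg.svd(M[:, :chunk], full_matrices=False)
    U, s, Vt = U[:, :k], s[:k], Vt[:k]
    for start in range(chunk, M.shape[1], chunk):
        U, s, Vt = isvd_update(U, s, Vt, M[:, start:start + chunk], k)
    return U, s, Vt

# Illustrative data: a 50 x 200 matrix of exact rank 5 (an assumption).
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 200))
U, s, Vt = incremental_svd(M, chunk=40, k=5)
s_batch = np.linalg.svd(M, compute_uv=False)[:5]
print(np.allclose(s, s_batch), np.allclose(U @ np.diag(s) @ Vt, M))
```

Each update only factorizes a small core matrix of order roughly `k + chunk`, which is where the savings over a single full decomposition would come from. When the data are not exactly low rank, truncating to `k` at every step makes the result an approximation.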


Notes

\(\Vert {{\textbf {t}}}_k \Vert = 1\): \({{\textbf {t}}}_k {{\textbf {t}}}_k' = {{\textbf {w}}}_k {{\textbf {X}}}_k {{\textbf {X}}}_k' {{\textbf {w}}}_k' = {{\textbf {w}}}_k ({{\textbf {X}}}_k {{\textbf {X}}}_k')^{1/2} (({{\textbf {X}}}_k {{\textbf {X}}}_k')^{1/2})' {{\textbf {w}}}_k'\) \(= {{\textbf {w}}}_k ({{\textbf {X}}}_k {{\textbf {X}}}_k')^{1/2} ({{\textbf {w}}}_k ({{\textbf {X}}}_k {{\textbf {X}}}_k')^{1/2})' = {{\textbf {b}}}_k {{\textbf {b}}}_k' = \Vert {{\textbf {b}}}_k \Vert^2 = 1\), where \({{\textbf {b}}}_k = {{\textbf {w}}}_k ({{\textbf {X}}}_k {{\textbf {X}}}_k')^{1/2}\).
\({{\textbf {A}}}_k\) is square of order q and symmetric. We can write \({{\textbf {A}}}_k = {\textbf {Y P}}_{X_k} {{\textbf {Y}}}'\) where \({{\textbf {P}}}_{X_k} = {{\textbf {X}}}_k' ({{\textbf {X}}}_k {{\textbf {X}}}_k')^{-1} {{\textbf {X}}}_k\) is the projection operator onto the subspace spanned by the rows of \({{\textbf {X}}}_k\) (equivalently, the columns of \({{\textbf {X}}}_k'\)). \({{\textbf {P}}}_{X_k}\) is symmetric and idempotent. Then, \({{\textbf {A}}}_k = {\textbf {Y P}}_{X_k} {{\textbf {P}}}_{X_k}' {{\textbf {Y}}}' = ({\textbf {Y P}}_{X_k}) ({\textbf {Y P}}_{X_k})' = {\textbf {B B}}'\) with \({{\textbf {B}}} = {\textbf {Y P}}_{X_k}\). Moreover, \({{\textbf {A}}}_k\) is positive semidefinite if \({\textbf {v A}}_k {{\textbf {v}}}' \ge 0\) for all nonzero \({{\textbf {v}}}\), which holds here because \({\textbf {v A}}_k {{\textbf {v}}}' = ({\textbf {v B}}) ({\textbf {v B}})' = \Vert {\textbf {v B}} \Vert^2 \ge 0\).
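The properties stated in this note can be checked numerically. The following is a minimal sketch on small random matrices, assuming variables in rows; the sizes `n`, `p_k`, `q` and the random data are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p_k, q = 30, 4, 3                       # illustrative sizes (assumptions)
X_k = rng.standard_normal((p_k, n))        # one explanatory block, variables in rows
Y = rng.standard_normal((q, n))            # response block, variables in rows

# P = X_k' (X_k X_k')^{-1} X_k, computed with a linear solve instead of an inverse
P = X_k.T @ np.linalg.solve(X_k @ X_k.T, X_k)
B = Y @ P
A_k = Y @ P @ Y.T

sym = np.allclose(P, P.T)                  # P is symmetric
idem = np.allclose(P @ P, P)               # P is idempotent
factored = np.allclose(A_k, B @ B.T)       # A_k = B B'
min_eig = np.linalg.eigvalsh((A_k + A_k.T) / 2).min()
print(sym, idem, factored, min_eig >= -1e-10)
```

The smallest eigenvalue is compared against a small negative tolerance rather than zero because rounding error can push a theoretically nonnegative eigenvalue slightly below zero.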

References

Baker CG, Gallivan KA, Van Dooren P (2012) Low-rank incremental methods for computing dominant singular subspaces. Linear Algebra Appl 436(8):2866–2888. https://doi.org/10.1016/j.laa.2011.07.018
Bougeard S, Hanafi M, Qannari EM (2007) ACPVI multibloc application en épidémiologie animale. J Soc Fr Stat 148(4):77–94
Bougeard S, Qannari EM, Lupo C, Hanafi M (2011a) From multiblock partial least squares to multiblock redundancy analysis: a continuum approach. Informatica 22(1):11–26. https://doi.org/10.15388/Informatica.2011.311
Bougeard S, Qannari EM, Rose N (2011b) Multiblock redundancy analysis: interpretation tools and application in epidemiology. J Chemom 25:467–475. https://doi.org/10.1002/cem.1392
Cardot H, Degras D (2018) Online principal component analysis in high dimension: which algorithm to choose? Int Stat Rev 86:29–50. https://doi.org/10.1111/insr.12220
Carroll JD (1968) Generalization of canonical correlation analysis to three or more sets of variables. In: Proceedings of the 76th annual convention APA, pp 227–228
Chan TF (1982) An improved algorithm for computing the singular value decomposition. ACM Trans Math Softw 8(1):72–83. https://doi.org/10.1145/355984.355991
D’Ambra L, Lauro C (1984) Principal components analysis onto reference subspaces. Rapporti di Ricerca NL/84 n.1, pp 1-22, Centre International de Mathematiques Pures et Appliquees
D’Ambra L, Lauro C (1992) Non symmetrical exploratory data analysis. Stat Appl 4:511–529
de Leeuw J, Young FW, Takane Y (1976) Additive structure in qualitative data: an alternating least squares method with optimal scaling features. Psychometrika 41(4):471–503. https://doi.org/10.1007/BF02296971
Degras D, Cardot H (2016) onlinePCA: online principal component analysis. R package version 1.3.1. https://cran.r-project.org/package=onlinePCA
D’Enza AI, Markos A (2015) Low-dimensional tracking of association structures in categorical data. Stat Comput 25:1009–1022. https://doi.org/10.1007/s11222-014-9470-4
D’Enza AI, Markos A, Buttarazzi D (2018) The idm package: incremental decomposition methods in R. J Stat Softw 86 Code Snippet 4. https://doi.org/10.18637/jss.v086.c04
Dongarra JJ, Demmel JW, Ostrouchov S (1992) LAPACK: a linear algebra library for high-performance computers. In: Dodge Y, Whittaker J (eds) Computational statistics. Springer, Heidelberg. https://doi.org/10.1007/978-3-662-26811-7_3
Ge Z (2017) Review on data-driven modeling and monitoring for plant-wide industrial processes. Chemom Intell Lab 171:16–25. https://doi.org/10.1016/j.chemolab.2017.09.021
Golub GH, Reinsch C (1970) Singular value decomposition and least squares solutions. Numer Math 14:403–420. https://doi.org/10.1007/BF02163027
Golub GH, Van Loan CF (1996) Matrix computations, 3rd edn. Johns Hopkins University Press, Baltimore. https://doi.org/10.1137/1028073
Hall P, Marshall D, Martin R (2002) Adding and subtracting eigenspaces with eigenvalue decomposition and singular value decomposition. Image Vis Comput 20:1009–1016. https://doi.org/10.1016/S0262-8856(02)00114-2
Horst P (1961) Relations among m sets of variables. Psychometrika 26(2):129–149. https://doi.org/10.1007/BF02289710
Hotelling H (1936) Relations between two sets of variables. Biometrika 28(3/4):321–377. https://doi.org/10.1093/biomet/28.3-4.321
Izenman AJ (1975) Reduced-rank regression for the multivariate linear model. J Multivar Anal 5:248–264. https://doi.org/10.1016/0047-259X(75)90042-1
Johansson JK (1981) An extension of Wollenberg’s redundancy analysis. Psychometrika 46(1):93–103. https://doi.org/10.1007/BF02293921
Johnson RA, Wichern DW (2007) Applied multivariate statistical analysis. Pearson Prentice Hall, Upper Saddle River
Kettenring JR (1971) Canonical analysis of several sets of variables. Biometrika 58(3):433–451. https://doi.org/10.1093/biomet/58.3.433
Legendre P, Anderson MJ (1999) Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecol Monogr 69(1):1–24. https://doi.org/10.1890/0012-9615(1999)069[0001:DBRATM]2.0.CO;2
Legendre P, Oksanen J, ter Braak CJF (2011) Testing the significance of canonical axes in redundancy analysis. Methods Ecol Evol 2:269–277. https://doi.org/10.1111/j.2041-210X.2010.00078.x
Levy A, Lindenbaum M (2000) Sequential Karhunen–Loeve basis extraction and its applications to images. IEEE Trans Image Process 9(8):1371–1374. https://doi.org/10.1109/83.855432
Markos A, D’Enza AI (2016) Incremental generalized canonical correlation analysis. In: Wilhelm A, Kestler H (eds) Analysis of large and complex data, studies in classification, data analysis, and knowledge organization. Springer, Cham, pp 185–194. https://doi.org/10.1007/978-3-319-25226-1_16
Martinez-Ruiz A, Montañola-Sales C (2019) Big data in multi-block data analysis: an approach to parallelizing partial least squares mode B algorithm. Heliyon 5(4):e01451. https://doi.org/10.1016/j.heliyon.2019.e01451
McArdle BH, Anderson MJ (2001) Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82(1):290–297. https://doi.org/10.1890/0012-9658(2001)082[0290:FMMTCD]2.0.CO;2
Obadia J (1978) L’analyse en composantes explicatives. Rev Stat Appl 26(4):5–28
Oja E, Karhunen J (1985) On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. J Math Anal Appl 106:69–84. https://doi.org/10.1016/0022-247X(85)90131-3
Qin SJ (2003) Statistical process monitoring: basics and beyond. J Chemom 17(8–9):480–502. https://doi.org/10.1002/cem.800
R Core Team (2022) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Ramos JA, Verriest E (1984) A unifying tool for comparing stochastic realization algorithms and model reduction techniques. In: 1984 American control conference, San Diego, CA, USA, pp 150–155. https://doi.org/10.23919/ACC.1984.4788368
Rao CR (1964) The use and interpretation of principal component analysis in applied research. Sankhya Ser A 26(4):329–358
Robert P, Escoufier Y (1976) A unifying tool for linear multivariate statistical methods: the RV-coefficient. J R Stat Soc C Appl 25(3):257–265. https://doi.org/10.2307/2347233
Ross DA, Lim J, Lin RS, Yang MH (2008) Incremental learning for robust visual tracking. Int J Comput Vis 77:125–141. https://doi.org/10.1007/s11263-007-0075-7
Schafer J, Opgen-Rhein R, Zuber V, Ahdesmaki M, Duarte-Silva AP, Strimmer K (2017) corpcor: efficient estimation of covariance and (partial) correlation. R package version 1.6.9. https://cran.r-project.org/web/packages/corpcor/index.html
Smilde AK, Naes T, Liland KH (2022) Multiblock data fusion in statistics and machine learning. Applications in the natural and life sciences. Wiley, Hoboken. https://doi.org/10.1002/9781119600978
Smith B, Boyle J, Dongarra J, Garbow B, Ikebe Y, Klema V, Moler C (1976) Matrix eigensystem routines, EISPACK guide. Lecture notes in computer science, vol 6. Springer, Berlin. https://doi.org/10.1007/3-540-07546-1
Stewart GW (1993) On the early history of the singular value decomposition. SIAM Rev 35(4):551–566. https://doi.org/10.1137/1035134
Stewart D, Love W (1968) A general canonical correlation index. Psychol Bull 70(3):160–163. https://doi.org/10.1037/h0026143
Takane Y, Hwang H (2005) An extended redundancy analysis and its applications to two practical examples. Comput Stat Data Anal 49(3):785–808. https://doi.org/10.1016/j.csda.2004.06.004
Tenenhaus M (1998) La régression PLS: Théorie et pratique. Technip, Paris
Van den Wollenberg AL (1977) Redundancy analysis an alternative for canonical correlation analysis. Psychometrika 42(2):207–219. https://doi.org/10.1007/BF02294050
Wangen LE, Kowalski BR (1989) A multiblock partial least squares algorithm for investigating complex chemical systems. J Chemom 3(1):3–20. https://doi.org/10.1002/cem.1180030104
Weng J, Zhang Y, Hwang WS (2003) Candid covariance-free incremental principal component analysis. IEEE Trans Pattern Anal 25(8):1034–1040. https://doi.org/10.1109/TPAMI.2003.1217609
Young FW (1972) A model for polynomial conjoint analysis algorithms. In: Shepard RN, Romney AK, Nerlove S (eds) Multidimensional scaling: theory and applications in the behavioral sciences. Academic Press, New York


Acknowledgements

We sincerely thank the guest editors and the anonymous reviewers for their careful reading of the paper and for their helpful comments and suggestions, which greatly improved the article.

Author information

Authors and Affiliations

Universidad Diego Portales, Santiago, Chile
Alba Martinez-Ruiz
Università degli Studi di Napoli Federico II, Naples, Italy
Natale Carlo Lauro


Corresponding author

Correspondence to Alba Martinez-Ruiz.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1: Comparison of elapsed times for \(({{\textbf {X}}} {{\textbf {X}}}')^{-1}\)

Figure 12 reports the CPU times of computing \({{\textbf {Y}}} {{\textbf {X}}}' ({{\textbf {X}}} {{\textbf {X}}}')^{-1} {{\textbf {X}}} {{\textbf {Y}}}'\) when \(({{\textbf {X}}} {{\textbf {X}}}')^{-1}\) is calculated in four ways: through QR decomposition, through LU decomposition, by solving the system \({{\textbf {A}}} {{\textbf {x}}} = {{\textbf {b}}}\), and through the spectral decomposition with subsequent modification of the resulting eigenvalues, as carried out by the R function mpower (Schafer et al. 2017). These times were obtained for data generated from a normal distribution. The multiblock setup included five blocks of variables \({{\textbf {X}}}\) and one endogenous block of variables \({{\textbf {Y}}}\), each with 10,000 observations. We processed blocks with 10, 50, 100, 250, 500, and 750 variables each, so the six-block configurations examined in the experiments comprised 60, 300, 600, 1500, 3000, and 4500 variables in total, respectively.
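To make the comparison concrete, the sketch below computes \({{\textbf {Y}}} {{\textbf {X}}}' ({{\textbf {X}}} {{\textbf {X}}}')^{-1} {{\textbf {X}}} {{\textbf {Y}}}'\) in several ways and times each one. It is a NumPy analogue written for illustration, not the authors' R code: the function names, the single block \({{\textbf {X}}}\), and the sizes (much smaller than those in the appendix) are assumptions. `np.linalg.solve` stands in for the LU-based linear solve, and the eigendecomposition route mimics the idea of inverting through modified eigenvalues.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 2000, 50, 10               # illustrative sizes (assumptions)
X = rng.standard_normal((p, n))      # explanatory variables in rows
Y = rng.standard_normal((q, n))      # response variables in rows

def via_inverse():
    G, XYt = X @ X.T, X @ Y.T
    return XYt.T @ np.linalg.inv(G) @ XYt        # explicit inverse of X X'

def via_solve():
    G, XYt = X @ X.T, X @ Y.T
    return XYt.T @ np.linalg.solve(G, XYt)       # LU-based solve of G Z = X Y'

def via_qr():
    Qx, _ = np.linalg.qr(X.T)                    # X' = Qx R, so P_X = Qx Qx'
    B = Y @ Qx
    return B @ B.T

def via_eigen():
    G, XYt = X @ X.T, X @ Y.T
    lam, V = np.linalg.eigh(G)                   # G = V diag(lam) V'
    W = XYt.T @ V
    return (W / lam) @ W.T                       # W diag(1/lam) W'

results = {}
for f in (via_inverse, via_solve, via_qr, via_eigen):
    t0 = time.perf_counter()
    results[f.__name__] = f()
    print(f"{f.__name__}: {time.perf_counter() - t0:.4f} s")
```

All four routes should agree up to rounding error when \({{\textbf {X}}} {{\textbf {X}}}'\) is well conditioned. The timings printed by this sketch depend on the machine and the linear algebra backend, and say nothing about the results reported in Figure 12.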

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Martinez-Ruiz, A., Lauro, N.C. Incremental singular value decomposition for some numerical aspects of multiblock redundancy analysis. Comput Stat (2023). https://doi.org/10.1007/s00180-023-01418-5


Received: 06 August 2021
Accepted: 15 September 2023
Published: 24 October 2023
DOI: https://doi.org/10.1007/s00180-023-01418-5
