Abstract
NIPALS and SIMPLS are the most commonly used algorithms for partial least squares (PLS) analysis. When the number of objects, N, is much larger than the number of explanatory variables, K, and/or response variables, M, the NIPALS algorithm can be time consuming. Although SIMPLS is less time consuming than NIPALS and may be preferred over it, kernel algorithms have been developed especially for the case where N is much larger than the number of variables. In this study, NIPALS, SIMPLS and several kernel algorithms are used to build partial least squares regression models, and their performances are compared in terms of the total CPU time spent on the calculation of latent variables, leave-one-out cross-validation and bootstrap methods. According to the numerical results, one of the kernel algorithms suggested by Dayal and MacGregor (J Chemom 11:73–85, 1997) is the fastest.
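To make the comparison concrete, the following is a minimal NumPy sketch of a kernel PLS algorithm in the spirit of Dayal and MacGregor (1997): the variant that forms the cross-product matrices X′X and X′Y once and thereafter deflates only X′Y, so the N × K data matrix is never touched again after the initial pass. Variable names follow the notation section of the paper (R, P, C, B_PLS); the function name `kernel_pls` and the small example data are illustrative, not from the source.

```python
import numpy as np

def kernel_pls(X, Y, A):
    """Sketch of a Dayal-and-MacGregor-style kernel PLS algorithm:
    cross-products X'X and X'Y are formed once, and only X'Y is
    deflated. Assumes X (N x K) and Y (N x M) are already autoscaled."""
    K = X.shape[1]
    M = Y.shape[1]
    XtX = X.T @ X          # K x K kernel matrix, computed once
    XtY = X.T @ Y          # K x M covariance matrix, deflated each round
    R = np.zeros((K, A))   # weights of the original X on the scores T
    P = np.zeros((K, A))   # X loadings
    C = np.zeros((M, A))   # Y weights
    for a in range(A):
        if M == 1:
            w = XtY[:, 0].copy()
        else:
            # dominant eigenvector of (X'Y)(X'Y)' gives the weight vector
            _, eigvecs = np.linalg.eigh(XtY @ XtY.T)
            w = eigvecs[:, -1]
        w = w / np.linalg.norm(w)
        r = w.copy()
        for j in range(a):          # weights w.r.t. the *original* X
            r -= (P[:, j] @ w) * R[:, j]
        tt = r @ XtX @ r            # t't, squared norm of the score vector
        P[:, a] = (XtX @ r) / tt
        C[:, a] = (XtY.T @ r) / tt
        XtY = XtY - tt * np.outer(P[:, a], C[:, a])   # deflate X'Y only
        R[:, a] = r
    return R @ C.T                  # B_PLS: K x M regression coefficients

# Example: with A = K latent variables, PLS reproduces the OLS solution
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 3))
Y = rng.standard_normal((30, 1))
B = kernel_pls(X, Y, A=3)
```

Because each of the A iterations works only with K × K and K × M arrays, the per-component cost is independent of N, which is why this family of algorithms wins when N greatly exceeds the number of variables.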
Abbreviations
- X : N × K matrix of explanatory variables
- Y : N × M matrix of response variables
- F : N × M matrix of residuals
- B_PLS : K × M matrix of PLS regression coefficients
- T : N × A matrix of PLS latent variables for X
- U : N × A matrix of PLS latent variables for Y
- W : K × A matrix of weights of the deflated X matrix on the latent variables T
- R : K × A matrix of weights of the original X matrix on the latent variables T
- C : M × A matrix of weights of Y on the latent variables U
- P : K × A matrix of loadings for X
- t_a : a column vector of T
- u_a : a column vector of U
- w_a : a column vector of W
- r_a : a column vector of R
- c_a : a column vector of C
- p_a : a column vector of P

Uppercase bold variables represent matrices and lowercase bold variables represent column vectors throughout the paper. The transpose of a matrix is denoted by “′”. N, K, M and A are the number of objects, the number of explanatory variables, the number of response variables and the number of latent variables, respectively. It is assumed that the columns of X and Y are mean-centered and scaled prior to PLS model estimation to have mean zero and standard deviation one.
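The preprocessing described above (mean-centering and unit-variance scaling of each column, often called autoscaling) can be sketched as follows; the function name `autoscale` and the toy data are illustrative, not from the source.

```python
import numpy as np

def autoscale(X):
    """Mean-center each column and scale it to unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation
    return (X - mean) / std, mean, std

# Example: a small 4 x 2 matrix of explanatory variables
X = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [4.0, 13.0]])
Xs, mean, std = autoscale(X)
# Each column of Xs now has mean 0 and sample standard deviation 1
```

The stored `mean` and `std` would be reused to place new objects on the same scale before applying the fitted PLS model.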
References
Boos DD (2003) Introduction to the bootstrap world. Stat Sci 18: 168–174
Dayal BS, MacGregor JF (1997) Improved PLS algorithms. J Chemom 11: 73–85
De Jong S (1993) SIMPLS: an alternative approach to partial least squares regression. Chemom Intell Lab Syst 18: 251–263
De Jong S, Ter Braak CJF (1994) Short communication: comments on the PLS kernel algorithm. J Chemom 8: 169–174
Efron B, Tibshirani R (1986) Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat Sci 1: 54–77
Lindgren F, Rännar S (1998) Alternative partial least squares (PLS) algorithms. Perspectives Drug Discov Des 12/13/14: 105–113
Lindgren F, Geladi P, Wold S (1993) The kernel algorithm for PLS. J Chemom 7: 45–59
Picard RR, Cook RD (1984) Cross-validation of regression models. J Am Stat Assoc 79: 575–583
Rännar S, Lindgren F, Geladi P, Wold S (1994) A PLS kernel algorithm for data sets with many variables and fewer objects. Part 1: theory and algorithm. J Chemom 8: 111–125
Shao J (1993) Linear model selection by cross-validation. J Am Stat Assoc 88: 486–494
Wold S, Sjöström M, Eriksson L (2001) PLS-regression: a basic tool of chemometrics. Chemom Intell Lab Syst 58: 109–130
Cite this article
Alin, A. Comparison of PLS algorithms when number of objects is much larger than number of variables. Stat Papers 50, 711–720 (2009). https://doi.org/10.1007/s00362-009-0251-7