Abstract
Wavelength selection has become a critical step in the analysis of near-infrared (NIR) spectra, which are characterized by high collinearity and a large number of spectral variables. In this study, a novel wavelength interval selection method based on split regularized regression and partial least squares (SplitReg-PLS) is developed. SplitReg-PLS is a two-step approach that combines the advantages of the SplitReg and PLS methods. SplitReg splits the variables into groups and pools the regularized estimates of the regression coefficients across the groups. PLS regression, one of the most popular methods for multivariate calibration, is then performed on the variables selected by SplitReg. The SplitReg-PLS method can automatically select successive, strongly correlated, and interpretable spectral variables related to the response, which provides a flexible framework for variable selection. The performance of the proposed procedure is evaluated on three real NIR datasets. The results indicate that SplitReg-PLS is a good wavelength interval selection strategy.
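The two-step structure described above (sparse selection first, PLS on the selected variables second) can be illustrated with a minimal sketch. It is not the authors' implementation: a plain lasso fitted by coordinate descent stands in for SplitReg, the PLS step is a basic NIPALS PLS1, and the function names (`lasso_cd`, `pls1_fit`, `select_then_pls`), the penalty `lam`, the number of components, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Plain lasso by cyclic coordinate descent (a stand-in for SplitReg)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (column j added back).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * (new - beta[j])
            beta[j] = new
    return beta

def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS on centered data; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                 # scores
        tt = t @ t
        p = Xk.T @ t / tt          # X loadings
        qk = yk @ t / tt           # y loading
        Xk = Xk - np.outer(t, p)   # deflate X
        yk = yk - qk * t           # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef

def select_then_pls(X, y, lam, n_components):
    """Step 1: sparse selection. Step 2: PLS on the selected columns only."""
    beta = lasso_cd(X - X.mean(axis=0), y - y.mean(), lam)
    selected = np.flatnonzero(beta)
    if selected.size == 0:
        raise ValueError("no variables selected; try a smaller lam")
    k = min(n_components, selected.size)
    coef, intercept = pls1_fit(X[:, selected], y, k)
    return selected, coef, intercept

# Synthetic data (not NIR spectra): the response depends on columns 10..14 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
y = X[:, 10:15].sum(axis=1) + 0.05 * rng.normal(size=60)

selected, coef, intercept = select_then_pls(X, y, lam=0.1, n_components=5)
pred = X[:, selected] @ coef + intercept
rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
print("selected columns:", selected.tolist())
print("in-sample RMSE:", rmse)
```

The sketch only mirrors the select-then-calibrate workflow. Unlike SplitReg, a single lasso fit does not split the variables into groups or pool several regularized models, so it tends to keep only one of several highly correlated wavelengths rather than a contiguous interval.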
Acknowledgements
This study was financially supported by the Hunan Provincial Department of Education Foundation of China (Grant No. 20A086). The study was approved by the university’s review board. We are grateful to all employees of this institute for their encouragement and support of this research.
Ethics declarations
Conflict of interest
The authors declare they have no conflict of interest.
About this article
Cite this article
Huang, X., Xia, L. A novel wavelength interval selection based on split regularized regression for spectroscopic data. J Math Chem 61, 877–892 (2023). https://doi.org/10.1007/s10910-022-01444-6
DOI: https://doi.org/10.1007/s10910-022-01444-6