Abstract
High-dimensional linear regression problems are often fitted using Lasso approaches. Although the Lasso objective function is convex, it is not differentiable everywhere, so minimization via gradient descent methods is not straightforward. To avoid this technical issue, we apply Nesterov smoothing to the original (unsmoothed) Lasso objective function. We introduce a closed-form smoothed Lasso which preserves the convexity of the Lasso function, is uniformly close to the unsmoothed Lasso, and yields closed-form derivatives everywhere for efficient and fast minimization via gradient descent. Our simulation studies are focused on polygenic risk scores using genetic data from a genome-wide association study (GWAS) for chronic obstructive pulmonary disease (COPD). We compare accuracy and runtime of our approach to the current gold standard in the literature, the FISTA algorithm. Our results suggest that the proposed methodology provides estimates with equal or higher accuracy than the FISTA algorithm while having the same asymptotic runtime scaling. The proposed methodology is implemented in the R-package smoothedLasso, available on the Comprehensive R Archive Network (CRAN).
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2(1), 183–202 (2009)
Chi, E., Goldstein, T., Studer, C., Baraniuk, R.: fasta: fast adaptive shrinkage/thresholding algorithm. R-package version 1 (2018)
Daubechies, I., Defrise, M., Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)
Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)
Hahn, G., Banerjee, M., Sen, B.: Parameter estimation and inference in a continuous piecewise linear regression model (2017). http://www.cantab.net/users/ghahn/preprints/PhaseRegMultiDim.pdf. Accessed 21 Mar 2017
Hahn, G., Lutz, S.M., Laha, N., Lange, C.: smoothedLasso: smoothed LASSO regression via Nesterov smoothing. R-package version 1.3 (2020). https://cran.r-project.org/package=smoothedLasso. Accessed 21 Mar 2017
Hastie, T., Efron, B.: lars: least angle regression, lasso and forward stagewise. R-package version 1.2 (2013)
Khera, A.V., Chaffin, M., Aragam, K.G., Haas, M.E., Roselli, C., Choi, S.H., Natarajan, P., Lander, E.S., Lubitz, S.A., Ellinor, P.T., Kathiresan, S.: Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018)
Mak, T., Porsch, R., Choi, S., Zhou, X., Sham, P.: Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 41(6), 469–480 (2016)
Michelot, C.: A finite algorithm for finding the projection of a point onto the canonical simplex of \(\mathbb{R}^n\). J. Optim. Theory App. 50(1), 195–200 (1986)
Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983)
Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. Ser. A 103, 127–152 (2005)
NHLBI TOPMed: Boston Early-Onset COPD Study in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) Program (2018). https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000946.v3.p1. Accessed 18 Oct 2016
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Stat Comp, Vienna, Austria (2014). http://www.R-project.org/. Accessed 2 Sept 2019
Regan, E., Hokanson, J., Murphy, J., Make, B., Lynch, D., Beaty, T., Curran-Everett, D., Silverman, E., Crapo, J.: Genetic epidemiology of COPD (COPDGene) study design. COPD 7, 32–43 (2010)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B Methodol. 58(1), 267–288 (1996)
Tibshirani, R.: Model selection and validation 1: cross-validation (2013). https://www.stat.cmu.edu/~ryantibs/datamining/lectures/18-val1.pdf. Accessed 2 Sept 2019
Wu, T., Chen, Y., Hastie, T., Sobel, E., Lange, K.: Genome-wide association analysis by Lasso penalized logistic regression. Bioinformatics 25(6), 714–721 (2009)
Appendices
Selection of the Lasso regularization parameter via cross validation
We aim to select the Lasso regularization parameter \(\uplambda \) using cross validation. To this end, for the simulation scenario described in Sect. 3.1 (in particular, for the chosen noise level of \(\sigma =0.5\) and the sparsity level of \(20\%\), as well as \(n=1000\) and \(p=100\)), we perform 10-fold cross validation as described in Tibshirani (2013).
To be precise, we first fix a grid of admissible values of \(\uplambda \) from which we would like to choose the regularization parameter (here, \(\uplambda \in \{0,0.05,0.1,0.15,\ldots ,1\}\)). We then randomly divide the n data points into \(K=10\) disjoint sets (folds) \(I_1,\ldots ,I_K\) such that \(\bigcup _{j=1}^K I_j = \{1,\ldots ,n\}\). For each \(j \in \{1,\ldots ,K\}\), we withhold the indices in \(I_j\) and fit a linear model \(y_{-I_j}=X_{-I_j,\cdot } \beta \) using the FISTA algorithm. After obtaining an estimate \(\hat{\beta }\), we use the withheld rows of X indexed by \(I_j\) to predict the withheld entries of y, that is, we compute \(X_{I_j,\cdot } \hat{\beta }\). We evaluate the accuracy of the prediction with the \(L_2\) norm, that is, we compute \(\Vert X_{I_j,\cdot } \hat{\beta } - y_{I_j} \Vert _2\). Repeating this computation for all \(j \in \{1,\ldots ,K\}\) allows us to compute an average \(L_2\) error over the K folds (the cross-validation error), which we plot as a function of \(\uplambda \).
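As an illustration, the cross-validation procedure just described can be sketched in a few lines of code. The snippet below is a minimal stand-in, not the implementation used here: fista_lasso is a textbook FISTA solver for the Lasso objective, and cv_lambda performs the K-fold split and error averaging over a candidate grid.

```python
import numpy as np

def fista_lasso(X, y, lam, n_iter=500):
    """Textbook FISTA for the Lasso objective 0.5*||X b - y||_2^2 + lam*||b||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part
    beta, z, t = np.zeros(p), np.zeros(p), 1.0
    for _ in range(n_iter):
        w = z - X.T @ (X @ z - y) / L      # gradient step at the momentum point
        beta_new = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft threshold
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)
        beta, t = beta_new, t_new
    return beta

def cv_lambda(X, y, lambdas, K=10, seed=0):
    """Select lambda by K-fold cross validation; returns (best lambda, CV errors)."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), K)
    errors = []
    for lam in lambdas:
        err = 0.0
        for idx in folds:
            train = np.setdiff1d(np.arange(n), idx)     # withhold fold idx
            beta_hat = fista_lasso(X[train], y[train], lam)
            err += np.linalg.norm(X[idx] @ beta_hat - y[idx])  # held-out L2 error
        errors.append(err / K)
    return lambdas[int(np.argmin(errors))], errors
```

On the grid above one would call, e.g., `cv_lambda(X, y, np.arange(0, 1.05, 0.05))` and plot the returned errors against \(\uplambda \).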
The result is shown in Fig. 5. We observe that for the simulation scenario we consider in Sect. 3.1, the choice \(\uplambda =0.3\) is sensible.
Sensitivity analysis
In the linear regression model \(y=X\beta +\epsilon \) under consideration in this work (see Sect. 1), it is easy to see that the larger the noise/error \(\epsilon \), the harder it will be to obtain accurate estimates of \(\beta \).
To quantify this statement, Fig. 6 presents a sensitivity analysis of the recovery accuracy of the parameter estimate \(\beta \) (measured as the \(L_2\) norm between the truth and the fitted parameter estimate returned by the unsmoothed Lasso, the FISTA algorithm, and the smoothed Lasso, respectively) as a function of the standard deviation \(\sigma \). The setup of the simulation is identical to the one of Sect. 3.1, though now \(n=100\) and \(p=200\) are fixed. The entries of the noise vector \(\epsilon \in \mathbb {R}^n\) in the model \(y = X\beta + \epsilon \) are generated independently from a normal distribution with mean zero and a varying standard deviation \(\sigma \in [0,10]\).
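As a sketch of this setup, the following function generates one data set from the model \(y = X\beta + \epsilon \) at a given noise level \(\sigma \) and returns the \(L_2\) recovery error. It is an illustration only: a plain ISTA (proximal gradient) solver serves as a stand-in for the methods compared in the paper, and the defaults for the design and regularization parameter are the ones used in this appendix.

```python
import numpy as np

def recovery_error(sigma, n=100, p=200, sparsity=0.2, lam=0.3, n_iter=500, seed=0):
    """L2 distance between a true sparse beta and its Lasso estimate,
    for noise standard deviation sigma (plain ISTA solver as a stand-in)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    k = int(sparsity * p)                   # number of nonzero coefficients
    beta[:k] = rng.standard_normal(k)
    y = X @ beta + sigma * rng.standard_normal(n)
    # proximal gradient (ISTA) on 0.5*||X b - y||_2^2 + lam*||b||_1
    L = np.linalg.norm(X, 2) ** 2           # Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        w = b - X.T @ (X @ b - y) / L
        b = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft threshold
    return float(np.linalg.norm(b - beta))
```

Evaluating this function over a grid of \(\sigma \in [0,10]\) reproduces the qualitative behavior of Fig. 6 (left): the recovery error grows with the noise level.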
Figure 6 (left) shows that, as expected, the accuracy of the recovered estimate of \(\beta \) decreases for all methods as \(\sigma \) increases. However, this deterioration is rather slow. The runtime as a function of \(\sigma \), depicted in Fig. 6 (right), stays roughly constant for all methods, as expected.
Proof of Proposition 1
Proof
The bounds on \(L_e^\mu \) and \(L_s^\mu \) follow from Eqs. (8) and (10) after a direct calculation. In particular, for the entropy prox function,
\[
\left| L_e^\mu (\beta ) - L(\beta ) \right| = \uplambda \left| \sum _{i=1}^p \left( f_e^\mu (\beta _i) - f(\beta _i) \right) \right| \le \uplambda \sum _{i=1}^p \left| f_e^\mu (\beta _i) - f(\beta _i) \right| \le \uplambda p \mu \log 2,
\]
where f is as defined in Sect. 2.2 and where it was used that \(\uplambda \ge 0\). The result for the squared error prox smoothed \(L_s^\mu \) is proven analogously.
Since both \(f_e^\mu \) and \(f_s^\mu \) are convex according to Nesterov (2005, Theorem 1), and since the least squares term \(\frac{1}{2} \Vert X\beta - y \Vert _2^2\) is convex, it follows that both \(L_e^\mu \) and \(L_s^\mu \) remain convex as the sum of two convex functions.
To be precise, strict convexity holds true. Observe that the second derivative of the entropy smoothed absolute value of Sect. 2.2.1 is given by
\[
\frac{\mathrm {d}^2}{\mathrm {d} z^2} f_e^\mu (z) = \frac{4}{\mu \left( e^{z/\mu } + e^{-z/\mu } \right) ^2},
\]
which is always positive, thus making \(f_e^\mu \) strictly convex. Therefore, \(L_e^\mu \) is strictly convex as the sum of a convex function and a strictly convex function. Similar arguments show that \(L_s^\mu \) is strictly convex. \(\square \)
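The uniform closeness underlying this proof can also be checked numerically. The snippet below assumes the standard entropy-prox smoothing of the absolute value, \(f_e^\mu (z) = \mu \log \big ( (e^{z/\mu } + e^{-z/\mu })/2 \big )\), and verifies on a grid that its gap to \(|z|\) never exceeds \(\mu \log 2\):

```python
import numpy as np

mu = 0.1                                   # smoothing parameter
z = np.linspace(-5.0, 5.0, 100001)

# entropy-prox smoothed absolute value, computed via logaddexp for stability
f_e = mu * (np.logaddexp(z / mu, -z / mu) - np.log(2.0))

gap = np.abs(np.abs(z) - f_e)
assert gap.max() <= mu * np.log(2.0) + 1e-12   # uniform bound, up to rounding
```

The gap vanishes at \(z = 0\) and approaches \(\mu \log 2\) for large \(|z|\), matching the bound used in the proof.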
Hahn, G., Lutz, S.M., Laha, N. et al. A fast and efficient smoothing approach to Lasso regression and an application in statistical genetics: polygenic risk scores for chronic obstructive pulmonary disease (COPD). Stat Comput 31, 35 (2021). https://doi.org/10.1007/s11222-021-10010-0