
A fast and efficient smoothing approach to Lasso regression and an application in statistical genetics: polygenic risk scores for chronic obstructive pulmonary disease (COPD)

Statistics and Computing

Abstract

High dimensional linear regression problems are often fitted using Lasso approaches. Although the Lasso objective function is convex, it is not differentiable everywhere, making the use of gradient descent methods for minimization not straightforward. To avoid this technical issue, we apply Nesterov smoothing to the original (unsmoothed) Lasso objective function. We introduce a closed-form smoothed Lasso which preserves the convexity of the Lasso function, is uniformly close to the unsmoothed Lasso, and allows us to obtain closed-form derivatives everywhere for efficient and fast minimization via gradient descent. Our simulation studies are focused on polygenic risk scores using genetic data from a genome-wide association study (GWAS) for chronic obstructive pulmonary disease (COPD). We compare accuracy and runtime of our approach to the current gold standard in the literature, the FISTA algorithm. Our results suggest that the proposed methodology provides estimates with equal or higher accuracy than the FISTA algorithm while having the same asymptotic runtime scaling. The proposed methodology is implemented in the R-package smoothedLasso, available on the Comprehensive R Archive Network (CRAN).
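As a minimal illustration of this idea (a sketch only, not the interface of the R-package smoothedLasso: the entropy prox smoothed absolute value \(\mu \log \cosh (z/\mu )\), the simulated data, and the choices of \(\uplambda \) and \(\mu \) below are assumptions for demonstration), the smoothed Lasso objective can be minimized with any off-the-shelf gradient-based optimizer in R:

## Sketch: Nesterov-smoothed Lasso objective with entropy prox smoothing,
## minimized by a gradient-based method (BFGS via optim). All values illustrative.
set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(rep(1, 5), rep(0, p - 5))
y <- as.vector(X %*% beta_true) + rnorm(n, sd = 0.5)
lambda <- 0.3; mu <- 0.1

smooth_abs      <- function(z) mu * log(cosh(z / mu))   # approximates |z| up to mu*log(2)
smooth_abs_grad <- function(z) tanh(z / mu)

obj  <- function(b) 0.5 * sum((X %*% b - y)^2) + lambda * sum(smooth_abs(b))
grad <- function(b) as.vector(crossprod(X, X %*% b - y)) + lambda * smooth_abs_grad(b)

fit      <- optim(par = rep(0, p), fn = obj, gr = grad, method = "BFGS")
beta_hat <- fit$par   # smoothed Lasso estimate

Since the smoothed objective is differentiable everywhere, the analytic gradient passed to the optimizer is exact, which is what permits fast minimization via gradient descent.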


References

  • Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci. 2(1), 183–202 (2009)

  • Chi, E., Goldstein, T., Studer, C., Baraniuk, R.: fasta: fast adaptive shrinkage/thresholding algorithm. R-package version 1 (2018)

  • Daubechies, I., Defrise, M., Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)

  • Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)

  • Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001)

  • Friedman, J., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33(1), 1–22 (2010)

  • Hahn, G., Banerjee, M., Sen, B.: Parameter estimation and inference in a continuous piecewise linear regression model (2017). http://www.cantab.net/users/ghahn/preprints/PhaseRegMultiDim.pdf. Accessed 21 Mar 2017

  • Hahn, G., Lutz, S.M., Laha, N., Lange, C.: smoothedLasso: smoothed LASSO regression via Nesterov smoothing. R-package version 1.3 (2020). https://cran.r-project.org/package=smoothedLasso. Accessed 21 Mar 2017

  • Hastie, T., Efron, B.: lars: least angle regression, lasso and forward stagewise. R-package version 1.2 (2013)

  • Khera, A.V., Chaffin, M., Aragam, K.G., Haas, M.E., Roselli, C., Choi, S.H., Natarajan, P., Lander, E.S., Lubitz, S.A., Ellinor, P.T., Kathiresan, S.: Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50, 1219–1224 (2018)

  • Mak, T., Porsch, R., Choi, S., Zhou, X., Sham, P.: Polygenic scores via penalized regression on summary statistics. Genet. Epidemiol. 41(6), 469–480 (2016)

  • Michelot, C.: A finite algorithm for finding the projection of a point onto the canonical simplex of \(\mathbb{R}^n\). J. Optim. Theory App. 50(1), 195–200 (1986)

  • Nesterov, Y.: A method of solving a convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR 269(3), 543–547 (1983)

  • Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program. Ser. A 103, 127–152 (2005)

  • NHLBI TOPMed: Boston Early-Onset COPD Study in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) Program (2018). https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000946.v3.p1. Accessed 18 Oct 2016

  • R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2014). http://www.R-project.org/. Accessed 2 Sept 2019

  • Regan, E., Hokanson, J., Murphy, J., Make, B., Lynch, D., Beaty, T., Curran-Everett, D., Silverman, E., Crapo, J.: Genetic epidemiology of COPD (COPDGene) study design. COPD 7, 32–43 (2010)

  • Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B Methodol. 58(1), 267–288 (1996)

  • Tibshirani, R.: Model selection and validation 1: cross-validation (2013). https://www.stat.cmu.edu/~ryantibs/datamining/lectures/18-val1.pdf. Accessed 2 Sept 2019

  • Wu, T., Chen, Y., Hastie, T., Sobel, E., Lange, K.: Genome-wide association analysis by Lasso penalized logistic regression. Bioinformatics 25(6), 714–721 (2009)


Author information

Correspondence to Georg Hahn.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Selection of the Lasso regularization parameter via cross validation

We aim to select the Lasso regularization parameter \(\uplambda \) using cross validation. To this end, for the simulation scenario described in Sect. 3.1 (in particular, for the chosen noise level of \(\sigma =0.5\) and the sparsity level of \(20\%\), as well as \(n=1000\) and \(p=100\)), we perform 10-fold cross validation as described in Tibshirani (2013).

To be precise, we first fix a grid of admissible values of \(\uplambda \) from which we would like to choose the regularization parameter (here, \(\uplambda \in \{0,0.05,0.1,0.15,\ldots ,1\}\)). We then randomly divide the n data points into \(K=10\) disjoint sets (folds) \(I_1,\ldots ,I_K\) such that \(\bigcup _{j=1}^K I_j = \{1,\ldots ,n\}\). For each \(j \in \{1,\ldots ,K\}\), we withhold the indices in \(I_j\) and fit a linear model \(y_{-I_j}=X_{-I_j,\cdot } \beta \) using the FISTA algorithm. After obtaining an estimate \(\hat{\beta }\), we use the withheld rows of X indexed by \(I_j\) to predict the withheld entries of y, that is, we compute \(X_{I_j,\cdot } \hat{\beta }\). We evaluate the accuracy of the prediction with the \(L_2\) norm, that is, we compute \(\Vert X_{I_j,\cdot } \hat{\beta } - y_{I_j} \Vert _2\). Repeating this computation for all \(j \in \{1,\ldots ,K\}\) allows us to compute an average \(L_2\) error over the K folds (the cross validation error), which we plot as a function of \(\uplambda \).
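The procedure above can be sketched in a few lines of R. Below, fit_lasso is a hypothetical placeholder for the FISTA fit (or any Lasso solver returning a coefficient vector for given X, y and \(\uplambda \)); the fold construction and the \(L_2\) error follow the description above:

## Sketch of K-fold cross validation over a grid of lambda values.
## fit_lasso(X, y, lambda) is a hypothetical stand-in for the FISTA solver.
cv_lambda <- function(X, y, lambda_grid, K = 10, fit_lasso) {
  n      <- nrow(X)
  folds  <- split(sample(seq_len(n)), rep(1:K, length.out = n))  # random disjoint folds
  cv_err <- sapply(lambda_grid, function(lambda) {
    fold_err <- sapply(folds, function(idx) {
      beta_hat <- fit_lasso(X[-idx, , drop = FALSE], y[-idx], lambda)  # fit without fold idx
      sqrt(sum((X[idx, , drop = FALSE] %*% beta_hat - y[idx])^2))      # L2 prediction error
    })
    mean(fold_err)                   # cross validation error for this lambda
  })
  lambda_grid[which.min(cv_err)]     # grid value with smallest cross validation error
}
## e.g. cv_lambda(X, y, seq(0, 1, by = 0.05), fit_lasso = my_fista)  # my_fista hypothetical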

The result is shown in Fig. 5. We observe that for the simulation scenario we consider in Sect. 3.1, the choice \(\uplambda =0.3\) is sensible.

Fig. 5

Cross validation error as a function of the Lasso regularization parameter \(\uplambda \) for the FISTA algorithm. Simulated data with \(n=1000\) and \(p=100\)

Fig. 6

Sensitivity analysis on simulated data: \(L_2\) distance between the parameter estimate and the truth (left) and runtime in seconds (right) as a function of the standard deviation \(\sigma \) while \(n=1000\) and \(p=2000\). Logarithmic scale on the y-axis of the right plot

Sensitivity analysis

In the linear regression model \(y=X\beta +\epsilon \) under consideration in this work (see Sect. 1), it is easy to see that the larger the noise/error \(\epsilon \), the harder it will be to obtain accurate estimates of \(\beta \).

To quantify this statement, Fig. 6 presents a sensitivity analysis of the recovery accuracy of the parameter estimate \(\beta \) (measured as the \(L_2\) distance of the fitted parameter estimate returned by the unsmoothed Lasso, the FISTA algorithm, and the smoothed Lasso, respectively, to the truth) as a function of the standard deviation \(\sigma \). The setup of the simulation is identical to the one of Sect. 3.1, though now \(n=100\) and \(p=200\) are fixed. The entries of the noise vector \(\epsilon \in \mathbb {R}^n\) in the model \(y = X\beta + \epsilon \) are generated independently from a Normal distribution with mean zero and a varying standard deviation \(\sigma \in [0,10]\).
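A minimal sketch of this setup (fit_lasso again being a hypothetical stand-in for any of the three solvers, and the design matrix, true coefficient vector and \(\uplambda \) assumed given) is:

## Sketch of the sensitivity analysis: for each noise level sigma, simulate
## y = X beta + eps and record the L2 distance of the estimate to the truth.
run_sensitivity <- function(X, beta_true, sigmas, lambda, fit_lasso) {
  sapply(sigmas, function(sigma) {
    y        <- as.vector(X %*% beta_true) + rnorm(nrow(X), sd = sigma)  # add noise
    beta_hat <- fit_lasso(X, y, lambda)                                  # refit the model
    sqrt(sum((beta_hat - beta_true)^2))                                  # L2 recovery error
  })
}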

Figure 6 (left) shows that, as expected, the accuracy of the recovered estimate of \(\beta \) decreases for all methods as \(\sigma \) increases. However, this deterioration is rather slow. The runtime as a function of \(\sigma \), depicted in Fig. 6 (right), stays roughly constant for all methods, as expected.

Proof of Proposition 1

Proof

The bounds on \(L_e^\mu \) and \(L_s^\mu \) follow from Eqs. (8) and (10) after a direct calculation. In particular, for the entropy prox function,

$$\begin{aligned} \sup _{\beta \in \mathbb {R}^p} \left| L_e^\mu (\beta ) - L(\beta ) \right|&= \sup _{\beta \in \mathbb {R}^p} \left| \uplambda \sum _{i=1}^p f_e^\mu (\beta _i) - \uplambda \sum _{i=1}^p |\beta _i| \right| \\&\le \sup _{\beta \in \mathbb {R}^p} \uplambda \sum _{i=1}^p \Big | f_e^\mu (\beta _i) - f(\beta _i) \Big |\\&\le \sup _{\beta \in \mathbb {R}^p} \uplambda \sum _{i=1}^p \mu \log (2)\\&= \uplambda p \mu \log (2), \end{aligned}$$

where f is as defined in Sect. 2.2 and where we used that \(\uplambda \ge 0\). The result for the squared error prox smoothed Lasso \(L_s^\mu \) is proven analogously.
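For completeness, the per-coordinate bound \(\mu \log (2)\) used in the display above follows directly if the entropy prox smoothed absolute value takes the standard Nesterov form (stated here as an assumption, since the definition of Sect. 2.2.1 is not reproduced in this appendix),

$$\begin{aligned} f_e^\mu (z) = \mu \log \left( \frac{e^{z/\mu } + e^{-z/\mu }}{2} \right) , \end{aligned}$$

because \(e^{|z|/\mu } \le e^{z/\mu } + e^{-z/\mu } \le 2 e^{|z|/\mu }\) implies \(|z| - \mu \log (2) \le f_e^\mu (z) \le |z|\), and hence \(| f_e^\mu (z) - |z| | \le \mu \log (2)\) for all \(z \in \mathbb {R}\).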

Since both \(f_e^\mu \) and \(f_s^\mu \) are convex according to Nesterov (2005, Theorem 1), and since the least squares term \(\frac{1}{2} \Vert X\beta - y \Vert _2^2\) is convex, it follows that both \(L_e^\mu \) and \(L_s^\mu \) remain convex as the sum of two convex functions.

To be precise, strict convexity holds true. Observe that the second derivative of the entropy smoothed absolute value of Sect. 2.2.1 is given by

$$\begin{aligned} \frac{\partial ^2}{\partial z^2} f_e^\mu (z) = \frac{4 e^{2z/\mu }}{\mu \left( e^{2z/\mu } +1 \right) ^2}, \end{aligned}$$

which is always positive, thus making \(f_e^\mu \) strictly convex. Therefore, \(L_e^\mu \) is strictly convex as the sum of a convex function and a strictly convex function. Similar arguments show that \(L_s^\mu \) is strictly convex. \(\square \)
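Under the same assumed form of \(f_e^\mu \) as above, the displayed second derivative can be verified directly:

$$\begin{aligned} \frac{\partial }{\partial z} f_e^\mu (z) = \tanh (z/\mu ), \qquad \frac{\partial ^2}{\partial z^2} f_e^\mu (z) = \frac{1}{\mu \cosh ^2(z/\mu )} = \frac{4 e^{2z/\mu }}{\mu \left( e^{2z/\mu } +1 \right) ^2} > 0. \end{aligned}$$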


About this article

Cite this article

Hahn, G., Lutz, S.M., Laha, N. et al. A fast and efficient smoothing approach to Lasso regression and an application in statistical genetics: polygenic risk scores for chronic obstructive pulmonary disease (COPD). Stat Comput 31, 35 (2021). https://doi.org/10.1007/s11222-021-10010-0
