Penalized full likelihood approach to variable selection for Cox’s regression model under nested case–control sampling

Lifetime Data Analysis

Abstract

Assuming Cox’s regression model, we consider a penalized full likelihood approach to variable selection under nested case–control (NCC) sampling. The penalized non-parametric maximum likelihood estimates (PNPMLEs) are characterized by self-consistency equations derived from the score functions. The tuning parameter within a family of penalty functions is chosen by a cross-validation method based on the profile likelihood. Simulation studies indicate that, under NCC sampling, the (P)NPMLE outperforms the weighted partial likelihood estimator both in estimating the log-relative risk and in identifying the covariates and the model. LASSO performs best when the cohort size is small; SCAD performs best when the cohort size is large and may eventually perform as well as the oracle estimator. Using the SCAD penalty, we establish the consistency, asymptotic normality, and oracle properties of the PNPMLE, as well as the sparsity property of the penalty. We also propose a consistent estimate of the asymptotic variance based on the observed profile likelihood. We illustrate the method by analyzing the diagnosis of liver cancer among patients in a type 2 diabetes mellitus dataset who were treated with thiazolidinediones in Taiwan.
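
For readers unfamiliar with the two penalties compared in the abstract, the following is a minimal sketch of the LASSO and SCAD penalty functions for a single coefficient. It assumes the standard forms from the literature (SCAD as defined by Fan and Li, with the conventional default a = 3.7); the abstract does not state the exact parameterization the authors use, and the function names `lasso_penalty` and `scad_penalty` are illustrative only.

```python
def lasso_penalty(theta: float, lam: float) -> float:
    """LASSO (L1) penalty for one coefficient: lam * |theta|."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return lam * abs(theta)


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty for one coefficient (standard form, assumed here).

    Piecewise in t = |theta|:
      t <= lam          : lam * t                      (same as LASSO)
      lam < t <= a*lam  : (2*a*lam*t - t^2 - lam^2) / (2*(a - 1))
      t > a*lam         : (a + 1) * lam^2 / 2          (constant)
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if a <= 2:
        raise ValueError("a must be greater than 2")
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2
```

Near zero the two penalties coincide, so both can shrink small coefficients to exactly zero. Beyond a·λ the SCAD penalty is flat, so large coefficients are not shrunk further, which is the behavior usually associated with the oracle property mentioned above. In a penalized likelihood, a penalty term of this kind is summed over the regression coefficients, with λ chosen by cross-validation.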

Acknowledgements

Funding was provided by the Ministry of Science and Technology, Taiwan (Grant No. MOST-105-2319-B400-002). We are very grateful to the associate editor and referees, whose valuable comments and suggestions led to significant improvement of this paper.

Author information

Corresponding author

Correspondence to I-Shou Chang.

Cite this article

Wang, JH., Pan, CH., Chang, IS. et al. Penalized full likelihood approach to variable selection for Cox’s regression model under nested case–control sampling. Lifetime Data Anal 26, 292–314 (2020). https://doi.org/10.1007/s10985-019-09475-z

  • DOI: https://doi.org/10.1007/s10985-019-09475-z
