
Variable selection strategies in survival models with multiple imputations

Lifetime Data Analysis

Abstract

In this paper, variable selection strategies (criteria) are discussed in detail, and their use in various survival models is investigated. For a class of variable selection strategies that includes the AIC and all criteria equivalent to it, the asymptotic efficiency property in the sense of Shibata (Ann Stat 8: 147–164, 1980) is established under minimal conditions for a general class of survival models, such as parametric frailty or transformation models and accelerated failure time models. Furthermore, a multiple imputation method is proposed that is found to handle censored observations successfully and is a competitive alternative to existing methods in the literature. Several real and simulated data sets are used for illustration.
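To make the AIC-type criteria concrete, here is a minimal sketch. It is not the authors' procedure. It illustrates AIC = −2ℓ + 2k, with ℓ the maximized log-likelihood and k the number of parameters, applied to every covariate subset of an exponential regression model with right-censored data. The exponential model with hazard exp(x'β) is the simplest parametric model that is both a proportional hazards and an accelerated failure time model. The simulated data, the helper names (`simulate`, `fit_exponential`, `aic_select`), the exhaustive subset search and the Newton-Raphson fitting are all hypothetical choices made for this illustration. The paper's multiple imputation step for censored observations is not reproduced here.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def loglik(beta, X, t, d):
    """Censored exponential log-likelihood: sum_i d_i*eta_i - t_i*exp(eta_i), eta_i = x_i'beta."""
    total = 0.0
    for xi, ti, di in zip(X, t, d):
        eta = sum(b * x for b, x in zip(beta, xi))
        total += di * eta - ti * math.exp(eta)
    return total


def fit_exponential(X, t, d, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE of beta for hazard exp(x'beta); each row of X starts with 1 (intercept)."""
    p = len(X[0])
    beta = [0.0] * p
    ll = loglik(beta, X, t, d)
    for _ in range(max_iter):
        score = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for xi, ti, di in zip(X, t, d):
            mu = ti * math.exp(sum(b * x for b, x in zip(beta, xi)))  # cumulative hazard at t_i
            for j in range(p):
                score[j] += xi[j] * (di - mu)
                for k in range(p):
                    info[j][k] += mu * xi[j] * xi[k]
        step = solve(info, score)
        scale = 1.0
        while True:  # step halving: only accept steps that do not decrease the log-likelihood
            new = [b + scale * s for b, s in zip(beta, step)]
            new_ll = loglik(new, X, t, d)
            if new_ll >= ll or scale < 1e-8:
                break
            scale /= 2.0
        converged = abs(new_ll - ll) < tol
        beta, ll = new, new_ll
        if converged:
            break
    return beta, ll


def aic_select(Z, t, d):
    """Fit every covariate subset (intercept always kept) and rank by AIC = -2*loglik + 2*k."""
    results = {}
    for r in range(len(Z[0]) + 1):
        for subset in itertools.combinations(range(len(Z[0])), r):
            X = [[1.0] + [z[j] for j in subset] for z in Z]
            _, ll = fit_exponential(X, t, d)
            results[subset] = (ll, -2.0 * ll + 2.0 * (1 + len(subset)))
    best = min(results, key=lambda s: results[s][1])
    return best, results


def simulate(n, beta, censor_rate=0.3, seed=1):
    """Hypothetical data: exponential event times with hazard exp(beta0 + z'beta), exponential censoring."""
    rng = random.Random(seed)
    Z, t, d = [], [], []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in beta[1:]]
        event = rng.expovariate(math.exp(beta[0] + sum(b * x for b, x in zip(beta[1:], z))))
        cens = rng.expovariate(censor_rate)
        Z.append(z)
        t.append(min(event, cens))
        d.append(1 if event <= cens else 0)
    return Z, t, d


if __name__ == "__main__":
    # Hypothetical true model: covariates 0 and 2 matter, covariate 1 is pure noise.
    Z, t, d = simulate(400, [-0.5, 1.0, 0.0, -0.7])
    best, results = aic_select(Z, t, d)
    for subset, (ll, aic) in sorted(results.items(), key=lambda kv: kv[1][1]):
        print(subset, round(ll, 2), round(aic, 2))
    print("AIC-selected covariates:", best)
```

The exhaustive search is only practical for a handful of candidate covariates (2^q fits). Because the AIC penalty is fixed at 2 per parameter, it can occasionally retain a noise covariate. This behaviour is consistent with AIC-type criteria targeting predictive efficiency rather than consistent recovery of the true subset.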

References

  • Akaike H (1969). Fitting autoregressive models for prediction. Ann Inst Stat Math 21: 243–247
  • Akaike H (1973). Information theory and an extension of the maximum likelihood principle. In: Petrov BN and Csáki F (eds) Proceedings of the 2nd international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281
  • Bagdonavicius V and Nikulin M (2002). Accelerated life models. Chapman and Hall/CRC, Boca Raton
  • Bennett S (1983). Analysis of survival data by the proportional odds model. Stat Med 2: 273–277
  • Bhansali RJ (1986). Asymptotically efficient selection of the order by the criterion autoregressive transfer function. Ann Stat 14: 315–325
  • Cheng SC, Wei LJ and Ying Z (1995). Analysis of transformation models with censored data. Biometrika 82: 835–845
  • Claeskens G and Hjort NL (2003). The focused information criterion (with discussion). J Am Stat Assoc 98: 900–916
  • Claeskens G, Croux C and Van Kerckhoven J (2006). Variable selection for logistic regression using a prediction-focused information criterion. Biometrics 62: 972–979
  • Clayton DG and Cuzick J (1986). The semi-parametric Pareto model for regression analysis of survival times. In: Papers on semiparametric models MS-R8614. Centrum voor Wiskunde en Informatica, Amsterdam, pp 19–31
  • Cox DR (1972). Regression models and life-tables (with discussion). J Roy Stat Soc B 34: 187–220
  • Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456): 1348–1360
  • Fan J and Li R (2002). Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30(1): 74–99
  • Fan J and Peng H (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann Stat 32(3): 928–961
  • Fleming TR and Harrington DP (1991). Counting processes and survival analysis. Wiley
  • Hannan EJ and Quinn BG (1979). The determination of the order of an autoregression. J Roy Stat Soc B 41: 190–195
  • Hocking RR (1976). The analysis and selection of variables in linear regression. Biometrics 32: 1–49
  • Hougaard P (1986). Survival models for heterogeneous populations derived from stable distributions. Biometrika 73: 387–396
  • Hsu CH, Taylor JM, Murray S and Commenges D (2006). Survival analysis using auxiliary variables via non-parametric multiple imputation. Stat Med 25(20): 3503–3517
  • Hsu CH, Taylor JM, Murray S and Commenges D (2007). Multiple imputation for interval censored data with auxiliary variables. Stat Med 26(4): 769–781
  • Hurvich CM and Tsai CL (1989). Regression and time series model selection in small samples. Biometrika 76: 297–307
  • Ishiguro M, Sakamoto Y and Kitagawa G (1997). Bootstrapping log likelihood and EIC, an extension of AIC. Ann Inst Stat Math 49: 411–434
  • Kalbfleisch JD and Prentice RL (1980). The statistical analysis of failure time data. Wiley, New York
  • Karagrigoriou A (1997). Asymptotic efficiency of the order selection of a nongaussian AR process. Stat Sinica 7: 407–423
  • Lee S and Karagrigoriou A (2001). An asymptotically optimal selection of the order of a linear process. Sankhya A 63(1): 93–106
  • Liquet B, Sakarovitch C and Commenges D (2003). Bootstrap choice of estimators in parametric and semiparametric families: an extension of EIC. Biometrics 59: 172–178
  • Little RJA and Rubin DB (1987). Statistical analysis with missing data. John Wiley and Sons, New York
  • Mallows CL (1973). Some comments on C_p. Technometrics 15: 661–676
  • Pan W (2001). A multiple imputation approach to regression analysis for doubly censored data with application to AIDS studies. Biometrics 57: 1245–1250
  • Rissanen J (1986). Stochastic complexity and modeling. Ann Stat 14: 1080–1100
  • Rubin DB and Schenker N (1986). Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J Am Stat Assoc 81(394): 366–374
  • Schwarz G (1978). Estimating the dimension of a model. Ann Stat 6: 461–464
  • Shibata R (1980). Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Ann Stat 8: 147–164
  • Shibata R (1981). An optimal selection of regression variables. Biometrika 68: 45–54
  • Tableman M and Kim JS (2004). Survival analysis using S: analysis of time-to-event data. Chapman and Hall/CRC
  • Tsiatis AA, Davidian M and McNeney B (2002). Multiple imputation methods for testing treatment differences in survival distributions with missing cause of failure. Biometrika 89: 238–244
  • van Buuren S, Boshuizen HC and Knook DL (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 18(6): 681–694
  • Vaupel JW, Manton KG and Stallard E (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16: 439–454
  • Vonta F (1996). Efficient estimation in a non-proportional hazards model in survival analysis. Scand J Stat 23: 49–61
  • Wei CZ (1992). On predictive least squares principle. Ann Stat 20: 1–42

Author information

Correspondence to Filia Vonta.

About this article

Cite this article

Vonta, F., Karagrigoriou, A. Variable selection strategies in survival models with multiple imputations. Lifetime Data Anal 13, 295–315 (2007). https://doi.org/10.1007/s10985-007-9050-4
