Abstract
This paper discusses variable selection strategies (criteria) in detail and investigates their use in various survival models. Asymptotic efficiency, in the sense of Shibata (Ann Stat 8: 147–164, 1980), is established for a class of variable selection strategies that includes the AIC and all criteria equivalent to it. The result holds, under minimal conditions, for a general class of survival models, including parametric frailty or transformation models and accelerated failure time models. Furthermore, a multiple imputation method is proposed that is found to handle censored observations successfully and to compete with existing methods in the literature. Several real and simulated data sets are used for illustration.
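As a rough illustration of AIC-based variable selection in a parametric survival model with right censoring, the sketch below compares an exponential model with and without a single binary covariate. It is not the procedure of the paper: the toy data, the choice of the exponential model, and all function names are assumptions made only for this example. It uses the standard definition AIC = -2 log L + 2k, where k is the number of estimated parameters.

```python
import math

# Hypothetical toy data: (follow-up time, event indicator, binary covariate).
# event = 1 means the failure was observed; event = 0 means right-censored.
DATA = [
    (2.0, 1, 0), (3.5, 1, 0), (1.2, 1, 0), (4.0, 0, 0), (2.8, 1, 0),
    (9.0, 1, 1), (12.5, 0, 1), (7.3, 1, 1), (15.0, 0, 1), (10.1, 1, 1),
]


def exp_loglik(records):
    """Maximized exponential log-likelihood under right censoring.

    The log-likelihood is d*log(rate) - rate*total, where d is the number
    of observed failures and total is the summed follow-up time. Its
    maximizer has the closed form rate = d / total (requires d > 0).
    """
    d = sum(event for _, event, _ in records)
    total = sum(time for time, _, _ in records)
    rate = d / total
    return d * math.log(rate) - rate * total


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * n_params


# Candidate 1: covariate excluded, one common hazard rate (k = 1).
aic_null = aic(exp_loglik(DATA), 1)

# Candidate 2: covariate included, one hazard rate per covariate level (k = 2).
loglik_full = sum(
    exp_loglik([r for r in DATA if r[2] == level]) for level in (0, 1)
)
aic_full = aic(loglik_full, 2)

# The candidate with the smaller AIC is selected.
selected = "with covariate" if aic_full < aic_null else "without covariate"
print(f"AIC without covariate: {aic_null:.2f}")
print(f"AIC with covariate:    {aic_full:.2f}")
print(f"Selected model: {selected}")
```

On this toy data the model that includes the covariate attains the smaller AIC. With more candidate covariates the same comparison would be repeated over each candidate subset.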
References
Akaike H (1969). Fitting autoregressive models for prediction. Ann Inst Stat Math 21: 243–247
Akaike H (1973). Information theory and an extension of the maximum likelihood principle. In: Petrov BN and Csaki F (eds) Proceedings of the 2nd international symposium on information theory. Akadémiai Kiadó, Budapest, pp 267–281
Bagdonavicius V and Nikulin M (2002). Accelerated life models. Chapman and Hall/CRC, Boca Raton
Bennett S (1983). Analysis of survival data by the proportional odds model. Stat Med 2: 273–277
Bhansali RJ (1986). Asymptotically efficient selection of the order by the criterion autoregressive transfer function. Ann Stat 14: 315–325
Cheng SC, Wei LJ and Ying Z (1995). Analysis of transformation models with censored data. Biometrika 82: 835–845
Claeskens G and Hjort NL (2003). The focused information criterion (with discussion). J Am Stat Assoc 98: 900–916
Claeskens G, Croux C and Van Kerckhoven J (2006). Variable selection for logistic regression using a prediction-focused information criterion. Biometrics 62: 972–979
Clayton DG and Cuzick J (1986). The semi-parametric Pareto model for regression analysis of survival times. In: Papers on semiparametric models MS-R8614. Centrum voor Wiskunde en Informatica, Amsterdam, pp 19–31
Cox DR (1972). Regression models with life tables (with discussion). J Roy Stat Soc B 34: 187–220
Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456): 1348–1360
Fan J and Li R (2002). Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30(1): 74–99
Fan J and Peng H (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann Stat 32(3): 928–961
Fleming TR and Harrington DP (1991). Counting processes and survival analysis. Wiley
Hannan EJ and Quinn BG (1979). The determination of the order of an autoregression. J Roy Stat Soc B 41: 190–195
Hocking RR (1976). The analysis and selection of variables in linear regression. Biometrics 32: 1–49
Hougaard P (1986). Survival models for heterogeneous populations derived from stable distributions. Biometrika 73: 387–396
Hsu CH, Taylor JM, Murray S and Commenges D (2006). Survival analysis using auxiliary variables via non-parametric multiple imputation. Stat Med 25(20): 3503–3517
Hsu CH, Taylor JM, Murray S and Commenges D (2007). Multiple imputation for interval censored data with auxiliary variables. Stat Med 26(4): 769–781
Hurvich CM and Tsai CL (1989). Regression and time series model selection in small samples. Biometrika 76: 297–307
Ishiguro M, Sakamoto Y and Kitagawa G (1997). Bootstrapping log likelihood and EIC, an extension of AIC. Ann Inst Stat Math 49: 411–434
Kalbfleisch JD and Prentice RL (1980). The statistical analysis of failure time data. Wiley, New York
Karagrigoriou A (1997). Asymptotic efficiency of the order selection of a nongaussian AR process. Stat Sinica 7: 407–423
Lee S and Karagrigoriou A (2001). An asymptotically optimal selection of the order of a linear process. Sankhya A 63(1): 93–106
Liquet B, Sakarovitch C and Commenges D (2003). Bootstrap choice of estimators in parametric and semiparametric families: an extension of EIC. Biometrics 59: 172–178
Little RJA and Rubin DB (1987). Statistical analysis with missing data. John Wiley and Sons, New York
Mallows CL (1973). Some comments on C_p. Technometrics 15: 661–676
Pan W (2001). A multiple imputation approach to regression analysis for doubly censored data with application to AIDS studies. Biometrics 57: 1245–1250
Rissanen J (1986). Stochastic complexity and modeling. Ann Stat 14: 1080–1100
Rubin DB and Schenker N (1986). Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J Am Stat Assoc 81(394): 366–374
Schwarz G (1978). Estimating the dimension of a model. Ann Stat 6: 461–464
Shibata R (1980). Asymptotically efficient selection of the order of the model for estimating parameters of linear process. Ann Stat 8: 147–164
Shibata R (1981). An optimal selection of regression variables. Biometrika 68: 45–54
Tableman M and Kim JS (2004). Survival analysis using S: analysis of time-to-event data. Chapman and Hall/CRC
Tsiatis AA, Davidian M and McNeney B (2002). Multiple imputation methods for testing treatment differences in survival distributions with missing cause of failure. Biometrika 89: 238–244
van Buuren S, Boshuizen HC and Knook DL (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 18(6): 681–694
Vaupel JW, Manton KG and Stallard E (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16: 439–454
Vonta F (1996). Efficient estimation in a non-proportional hazards model in survival analysis. Scand J Stat 23: 49–61
Wei CZ (1992). On predictive least squares principle. Ann Stat 20: 1–42
Cite this article
Vonta, F., Karagrigoriou, A. Variable selection strategies in survival models with multiple imputations. Lifetime Data Anal 13, 295–315 (2007). https://doi.org/10.1007/s10985-007-9050-4
DOI: https://doi.org/10.1007/s10985-007-9050-4