Variable selection strategies in survival models with multiple imputations

Vonta, Filia; Karagrigoriou, Alex

doi:10.1007/s10985-007-9050-4

Published: 31 August 2007

Lifetime Data Analysis, Volume 13, pages 295–315 (2007)

Abstract

In this paper, variable selection strategies (criteria) are discussed in detail and their use in various survival models is investigated. The asymptotic efficiency property, in the sense of Shibata (Ann Stat 8: 147–164, 1980), of a class of variable selection strategies that includes the AIC and all criteria equivalent to it is established, under minimal conditions, for a general class of survival models, such as parametric frailty or transformation models and accelerated failure time models. Furthermore, a multiple imputations method is proposed that is found to handle censored observations successfully and constitutes a competitor to existing methods in the literature. A number of real and simulated data sets are used for illustrative purposes.
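As a rough illustration of AIC-based variable selection in a parametric survival model with right-censored data, the sketch below compares two exponential models: one with a common hazard rate and one with a separate rate for each level of a binary covariate. This is not the authors' procedure. The exponential model, the function names (`exp_loglik`, `aic`, `select_by_aic`) and the toy data are all assumptions chosen because the maximum likelihood estimate has a closed form.

```python
import math

def exp_loglik(times, events):
    """Maximised log-likelihood of an exponential model under right censoring.

    times  : observed times (failure or censoring time)
    events : 1 if the failure was observed, 0 if the time is censored
    """
    d = sum(events)           # number of observed failures
    total = sum(times)        # total time at risk
    if d == 0:
        return 0.0            # rate estimate is 0, log-likelihood tends to 0
    rate = d / total          # closed-form MLE of the hazard rate
    return d * math.log(rate) - rate * total

def aic(loglik, n_params):
    """Akaike information criterion: -2 log-likelihood + 2 * parameters."""
    return -2.0 * loglik + 2.0 * n_params

def select_by_aic(times, events, group):
    """Compare a common-rate model with one rate per level of a 0/1 covariate."""
    ll_null = exp_loglik(times, events)
    ll_full = 0.0
    for g in (0, 1):
        t_g = [t for t, x in zip(times, group) if x == g]
        e_g = [e for e, x in zip(events, group) if x == g]
        ll_full += exp_loglik(t_g, e_g)
    scores = {"null": aic(ll_null, 1), "with_covariate": aic(ll_full, 2)}
    return min(scores, key=scores.get), scores

# Hypothetical toy data: group 1 has much longer survival times than group 0.
times = [1.0, 2.0, 1.5, 0.5, 1.2, 0.8, 10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
events = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
group = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
best, scores = select_by_aic(times, events, group)
print(best, scores)
```

The model with the smaller AIC is retained. When the covariate carries no information, the extra parameter costs exactly 2 AIC units and the simpler model is preferred.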

References

Akaike H (1969). Fitting autoregressive models for prediction. Ann Inst Statist Math 21: 243–247
Akaike H (1973). Information theory and an extension of the maximum likelihood principle. In: Petrov BN, Csaki F (eds) Proceedings of the 2nd international symposium on information theory. Akademia Kiado, Budapest, pp 267–281
Bagdonavicius V and Nikulin M (2002). Accelerated life models. Chapman and Hall/CRC, Boca Raton
Bennett S (1983). Analysis of survival data by the proportional odds model. Stat Med 2: 273–277
Bhansali RJ (1986). Asymptotically efficient selection of the order by the criterion autoregressive transfer function. Ann Stat 14: 315–325
Cheng SC, Wei LJ and Ying Z (1995). Analysis of transformation models with censored data. Biometrika 82: 835–845
Claeskens G and Hjort NL (2003). The focused information criterion (with discussion). J Am Stat Assoc 98: 900–916
Claeskens G, Croux C, Kerckhoven J van (2006). Variable selection for logistic regression using a prediction-focused information criterion. Biometrics 62: 972–979
Clayton DG, Cuzick J (1986). The semi-parametric Pareto model for regression analysis of survival times. In: Papers on semiparametric models MS-R8614. Centrum voor Wiskunde en Informatica, Amsterdam, pp 19–31
Cox DR (1972). Regression models and life-tables (with discussion). J Roy Stat Soc B 34: 187–220
Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456): 1348–1360
Fan J and Li R (2002). Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30(1): 74–99
Fan J and Peng H (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann Stat 32(3): 928–961
Fleming TR, Harrington DP (1991). Counting processes and survival analysis. Wiley
Hannan EJ and Quinn BG (1979). The determination of the order of an autoregression. J Roy Stat Soc B 41: 190–195
Hocking RR (1976). The analysis and selection of variables in linear regression. Biometrics 32: 1–49
Hougaard P (1986). Survival models for heterogeneous populations derived from stable distributions. Biometrika 73: 387–396
Hsu CH, Taylor JM, Murray S and Commenges D (2006). Survival analysis using auxiliary variables via non-parametric multiple imputation. Stat Med 25(20): 3503–3517
Hsu CH, Taylor JM, Murray S and Commenges D (2007). Multiple imputation for interval censored data with auxiliary variables. Stat Med 26(4): 769–781
Hurvich CM and Tsai CL (1989). Regression and time series model selection in small samples. Biometrika 76: 297–307
Ishiguro M, Sakamoto Y and Kitagawa G (1997). Bootstrapping log likelihood and EIC, an extension of AIC. Ann Inst Stat Math 49: 411–434
Kalbfleisch JD and Prentice RL (1980). The statistical analysis of failure time data. Wiley, New York
Karagrigoriou A (1997). Asymptotic efficiency of the order selection of a nongaussian AR process. Stat Sinica 7: 407–423
Lee S and Karagrigoriou A (2001). An asymptotically optimal selection of the order of a linear process. Sankhya A 63(1): 93–106
Liquet B, Sakarovitch C and Commenges D (2003). Bootstrap choice of estimators in parametric and semiparametric families: an extension of EIC. Biometrics 59: 172–178
Little RJA and Rubin DB (1987). Statistical analysis with missing data. John Wiley and Sons, New York
Mallows CL (1973). Some comments on C_p. Technometrics 15: 661–676
Pan W (2001). A multiple imputation approach to regression analysis for doubly censored data with application to AIDS studies. Biometrics 57: 1245–1250
Rissanen J (1986). Stochastic complexity and modeling. Ann Stat 14: 1080–1100
Rubin DB and Schenker N (1986). Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J Am Stat Assoc 81(394): 366–374
Schwarz G (1978). Estimating the dimension of a model. Ann Stat 6: 461–464
Shibata R (1980). Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. Ann Stat 8: 147–164
Shibata R (1981). An optimal selection of regression variables. Biometrika 68: 45–54
Tableman M, Kim JS (2004). Survival analysis using S. Analysis of time-to-event data. Chapman and Hall/CRC
Tsiatis AA, Davidian M and McNeney B (2002). Multiple imputation methods for testing treatment differences in survival distributions with missing cause of failure. Biometrika 89: 238–244
van Buuren S, Boshuizen HC, Knook DL (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 18(6): 681–694
Vaupel JW, Manton KG and Stallard E (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16: 439–454
Vonta F (1996). Efficient estimation in a non-proportional hazards model in survival analysis. Scand J Stat 23: 49–61
Wei CZ (1992). On predictive least squares principle. Ann Stat 20: 1–42

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Cyprus, New University Campus, Nicosia, 1618, Cyprus
Filia Vonta & Alex Karagrigoriou

Corresponding author

Correspondence to Filia Vonta.

About this article

Cite this article

Vonta, F., Karagrigoriou, A. Variable selection strategies in survival models with multiple imputations. Lifetime Data Anal 13, 295–315 (2007). https://doi.org/10.1007/s10985-007-9050-4

Received: 19 October 2006
Accepted: 25 July 2007
Published: 31 August 2007
Issue Date: September 2007
DOI: https://doi.org/10.1007/s10985-007-9050-4
