Abstract
Lifetime risk measures the cumulative risk of developing a disease over one’s lifespan. Modeling the lifetime risk must account for left truncation, the competing risk of death, and inference at a fixed age. In addition, statistical methods to predict the lifetime risk should account for covariate-outcome associations that change with age. In this paper, we review and compare statistical methods to predict the lifetime risk. We first consider a generalized linear model for the lifetime risk using pseudo-observations of the Aalen-Johansen estimator at a fixed age, allowing for left truncation. We also consider modeling the subdistribution hazard with Fine-Gray and Royston-Parmar flexible parametric models in left-truncated data with time-covariate interactions, and using these models to predict lifetime risk. In simulation studies, we found the pseudo-observation approach had the least bias, particularly in settings with crossing or converging cumulative incidence curves. We illustrate our method by modeling the lifetime risk of atrial fibrillation in the Framingham Heart Study. We provide technical guidance to replicate all analyses in R.
Availability of data and material
To protect the confidentiality of the Framingham Heart Study participants, the data from our illustrative examples are not on our GitHub page. Participant level data from the Framingham Heart Study are available at the database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/) and BioLINCC (https://biolincc.nhlbi.nih.gov/home/).
References
Karmali KN, Lloyd-Jones DM. Adding a life-course perspective to cardiovascular-risk communication. Nat Rev Cardiol. 2013;10(2):111.
Seshadri S, Wolf PA. Lifetime risk of stroke and dementia: current concepts, and estimates from the Framingham study. Lancet Neurol. 2007;6(12):1106–14.
Beiser A, D’Agostino RB Sr, Seshadri S, et al. Computing estimates of incidence, including lifetime risk: Alzheimer’s disease in the Framingham Study. The Practical Incidence Estimators (PIE) macro. Stat Med. 2000;19(11–12):1495–522.
Gaynor JJ, Feuer EJ, Tan CC, et al. On the use of cause-specific failure and conditional failure probabilities: examples from clinical oncology data. J Am Stat Assoc. 1993;88(422):400–9.
Brookmeyer R, Abdalla N. Multistate models and lifetime risk estimation: application to Alzheimer’s disease. Stat Med. 2019;38(9):1558–65.
Dinse GE, Larson MG. A note on semi-Markov models for partially censored data. Biometrika. 1986;73(2):379–86.
Carone M, Asgharian M, Jewell NP. Estimating the lifetime risk of dementia in the Canadian elderly population using cross-sectional cohort survival data. J Am Stat Assoc. 2014;109(505):24–35.
Alonso A, Krijthe BP, Aspelund T, Stepas KA, Pencina MJ, Moser CB, Sinner MF, Sotoodehnia N, Fontes JD, Janssens AC, Kronmal RA. Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: the CHARGE-AF consortium. J Am Heart Assoc. 2013;2(2):e000102.
Staerk L, Wang B, Preis SR, et al. Lifetime risk of atrial fibrillation according to optimal, borderline, or elevated levels of risk factors: cohort study based on longitudinal data from the Framingham Heart Study. BMJ. 2018;361:k1453.
Grand MK, Putter H, Allignol A, et al. A note on pseudo-observations and left-truncation. Biom J. 2019;61(2):290–8.
Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94(446):496–509.
Geskus RB. Cause-specific cumulative incidence estimation and the Fine and Gray model under both left truncation and right censoring. Biometrics. 2011;67(1):39–49.
Lambert PC, Wilkes SR, Crowther MJ. Flexible parametric modelling of the cause-specific cumulative incidence function. Stat Med. 2017;36(9):1429–46.
Royston P, Parmar MK. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21(15):2175–97.
Jeong JH, Fine J. A note on cause-specific residual life. Biometrika. 2009;96(1):237–42.
Du Y. Measuring Effects of Risk Factors on Cumulative Incidence and Remaining Lifetime Risk in the Presence of Competing Risks. PhD Thesis, Boston University, 2010.
Andersen P, Borgan O, Gill R, et al. Statistical models based on counting processes. New York: Springer-Verlag; 1993.
Allignol A, Schumacher M, Beyersmann J. A note on variance estimation of the Aalen-Johansen estimator of the cumulative incidence function in competing risks, with a view towards left-truncated data. Biom J. 2010;52(1):126–37.
Aalen OO, Johansen S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand J Stat. 1978;5:141–50.
Klein JP, Andersen PK. Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function. Biometrics. 2005;61(1):223–9.
Graw F, Gerds TA, Schumacher M. On pseudo-values for regression analysis in competing risks models. Lifetime Data Anal. 2009;15(2):241–55.
Andersen PK, Klein JP, Rosthøj S. Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika. 2003;90(1):15–27.
Andersen PK, Pohar Perme M. Pseudo-observations in survival analysis. Stat Methods Med Res. 2010;19(1):71–99.
Klein JP, Logan B, Harhoff M, et al. Analyzing survival curves at a fixed point in time. Stat Med. 2007;26(24):4505–19.
Chen J, Hou Y, Chen Z. Statistical inference methods for cumulative incidence function curves at a fixed point in time. Commun Stat Simul Comput. 2018; pp 1–16.
Overgaard M, Parner ET, Pedersen J, et al. Asymptotic theory of generalized estimating equations based on jack-knife pseudo-observations. Ann Stat. 2017;45(5):1988–2015.
de Wreede LC, Fiocco M, Putter H, et al. mstate: an R package for the analysis of competing risks and multi-state models. J Stat Softw. 2011;38(7):1–30.
Beyersmann J, Allignol A, Schumacher M. Competing risks and multistate models with R. Berlin: Springer Science & Business Media; 2011.
Latouche A, Boisson V, Chevret S, et al. Misspecified regression model for the subdistribution hazard of a competing risk. Stat Med. 2007;26(5):965–74.
Beyersmann J, Schumacher M. Misspecified regression model for the subdistribution hazard of a competing risk. Stat Med. 2007;26(7):1649.
Muñoz A, Abraham AG, Matheson M, et al. Non-proportionality of hazards in the competing risks framework. In: Risk Assessment and Evaluation of Predictions. Springer; 2013. pp 3–22.
Thomas L, Reyes EM. Tutorial: survival estimation for Cox regression models with time-varying coefficients using SAS and R. J Stat Softw. 2014;61(c1):1–23.
Beyersmann J, Latouche A, Buchholz A, et al. Simulating competing risks data in survival analysis. Stat Med. 2009;28(6):956–71.
Zhou B, Fine J, Laird G. Goodness-of-fit test for proportional subdistribution hazards model. Stat Med. 2013;32(22):3804–11.
Li J, Scheike TH, Zhang MJ. Checking Fine and Gray subdistribution hazards model with cumulative sums of residuals. Lifetime Data Anal. 2015;21(2):197–217.
Halekoh U, Højsgaard S, Yan J, et al. The R package geepack for generalized estimating equations. J Stat Softw. 2006;15(2):1–11.
Therneau TM, Lumley T. Package ‘survival’. R Top Doc. 2015;128(10):28–33.
Liu XR, Pawitan Y, Clements MS. Generalized survival models for correlated time-to-event data. Stat Med. 2017;36(29):4743–62.
Pan W. Akaike’s information criterion in generalized estimating equations. Biometrics. 2001;57(1):120–5.
Rücker G, Schwarzer G. Presenting simulation results in a nested loop plot. BMC Med Res Methodol. 2014;14(1):129.
Kannel WB, Dawber TR, Kagan A, et al. Factors of risk in the development of coronary heart disease–six-year follow-up experience: the Framingham Study. Ann Intern Med. 1961;55(1):33–50.
Feinleib M, Kannel WB, Garrison RJ, et al. The Framingham Offspring Study. Design and preliminary data. Prev Med. 1975;4(4):518–25.
Sauerbrei W, Royston P, Look M. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J. 2007;49(3):453–73.
Austin PC, Latouche A, Fine JP. A review of the use of time-varying covariates in the Fine-Gray subdistribution hazard competing risk regression model. Stat Med. 2020;39(2):103–13.
Austin PC, Lee DS, Fine JP. Introduction to the analysis of survival data in the presence of competing risks. Circulation. 2016;133(6):601–9.
Hinchliffe SR, Lambert PC. Flexible parametric modelling of cause-specific hazards to estimate cumulative incidence functions. BMC Med Res Methodol. 2013;13(1):13.
Mozumder SI, Rutherford M, Lambert P. Direct likelihood inference on the cause-specific cumulative incidence function: a flexible parametric regression modelling approach. Stat Med. 2018;37(1):82–97.
Jacobsen M, Martinussen T. A note on the large sample properties of estimators based on generalized linear models for correlated pseudo-observations. Scand J Stat. 2016;43(3):845–62.
Overgaard M, Parner ET, Pedersen J. Estimating the variance in a pseudo-observation scheme with competing risks. Scand J Stat. 2018;45(4):923–40.
Stegherr R, Allignol A, Meister R, et al. Estimating cumulative incidence functions in competing risks data with dependent left-truncation. Stat Med. 2020;39(4):481–93.
Pencina MJ, Larson MG, D’Agostino RB. Choice of time scale and its effect on significance of predictors in longitudinal studies. Stat Med. 2007;26(6):1343–59.
Latouche A, Allignol A, Beyersmann J, Labopin M, Fine JP. A competing risks analysis should report results on all cause-specific hazards and cumulative incidence functions. J Clin Epidemiol. 2013;66(6):648–53.
Binder N, Gerds TA, Andersen PK. Pseudo-observations for competing risks with covariate dependent censoring. Lifetime Data Anal. 2014;20(2):303–15.
Bower H, Crowther MJ, Rutherford MJ, Andersson TM, Clements M, Liu XR, Dickman PW, Lambert PC. Capturing simple and complex time-dependent effects using flexible parametric survival models: a simulation study. Commun Stat Simul Comput. 2019; pp 1–7.
Rutherford MJ, Crowther MJ, Lambert PC. The use of restricted cubic splines to approximate complex hazard functions in the analysis of time-to-event data: a simulation study. J Stat Comput Simul. 2015;85(4):777–93.
Geskus RB. On the inclusion of prevalent cases in HIV/AIDS natural history studies through a marker-based estimate of time since seroconversion. Stat Med. 2000;19(13):1753–69.
Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res. 2004;33(2):261–304.
Muller CJ, MacLehose RF. Estimating predicted probabilities from logistic regression: different methods correspond to different target populations. Int J Epidemiol. 2014;43(3):962–70.
Acknowledgements
The authors thank Dr. Sarwar Mozumder and Dr. Paul Lambert for their support in implementing the flexible parametric model approach, and Katia Bulekova for her support with Boston University’s Shared Computing Cluster. The authors also thank the anonymous reviewers and Associate Editor for their thoughtful and helpful comments.
Funding
SCC received funding from the National Institute of General Medical Sciences (NIGMS): T32 GM74905-14 and the National Heart, Lung, and Blood Institute (NHLBI): F31 HL145904-01. EJB received funding from NHLBI: R01HL128914; 2R01 HL092577; 2U54HL120163; American Heart Association (AHA): 18SFRN34110082. LT received funding from AHA: 18SFRN34150007. The Framingham Heart Study is supported by NHLBI (N01-HC25195, HHSN268201500001I; 75N92019D00031) and Boston University School of Medicine. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Ethics declarations
Conflicts of interest
The authors report no conflicts of interest.
Code availability
We provide R code to apply our methods with working examples and replicate our simulation studies on GitHub at https://github.com/s-conner/lifetimerisk.
Appendix
Variance of non-parametric estimator of lifetime risk
The variance of the lifetime risk is estimated using Greenwood’s formula [4, 18].
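The non-parametric estimator whose variance is discussed here is the Aalen-Johansen estimator with risk sets adjusted for delayed entry. The following is a minimal Python sketch of the point estimate only, not the authors' R implementation and not the Greenwood-type variance. The function name, argument names, and toy data are hypothetical, and the sketch assumes every subject's entry age is strictly below their exit age.

```python
def aalen_johansen_cif(entry, exit_, status, cause, t_star):
    """Cumulative incidence of `cause` by age t_star under left truncation.

    entry  : ages at study entry (left-truncation times)
    exit_  : ages at event or censoring
    status : 0 = censored, 1, 2, ... = cause of failure
    """
    # Distinct ages at which any event (of any cause) occurs, up to t_star.
    event_times = sorted(
        {x for x, s in zip(exit_, status) if s != 0 and x <= t_star}
    )
    surv = 1.0  # all-cause survival just before the current event age
    cif = 0.0
    for t in event_times:
        # Delayed entry: a subject is at risk at t only if entry < t <= exit.
        at_risk = sum(1 for e, x in zip(entry, exit_) if e < t <= x)
        if at_risk == 0:
            continue
        d_all = sum(1 for x, s in zip(exit_, status) if x == t and s != 0)
        d_k = sum(1 for x, s in zip(exit_, status) if x == t and s == cause)
        cif += surv * d_k / at_risk        # increment of the cause-k CIF
        surv *= 1.0 - d_all / at_risk      # update all-cause survival
    return cif


if __name__ == "__main__":
    # Hypothetical toy data: five subjects, one entering late at age 2.5.
    entry = [0, 0, 2.5, 0, 0]
    exit_ = [1, 2, 3, 4, 5]
    status = [1, 2, 1, 2, 1]
    print(aalen_johansen_cif(entry, exit_, status, cause=1, t_star=5))
```

Without delayed entry or censoring the estimate reduces to the empirical proportion of cause-specific events. With delayed entry it is conditional on being event-free at the earliest entry age.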
Weighted Breslow estimator of the baseline cumulative subdistribution hazard
The weighted Breslow estimator of the baseline cumulative subdistribution hazard is given by
where \({\hat{S}}^0(\hat{\varvec{\pi }}, u) = \sum _{i=1}^n w_i(u) T_i(u) \exp \{ \hat{\varvec{\pi }}^T Z_i(u) \}\) [11].
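A standard way to write this estimator, consistent with the definition of \({\hat{S}}^0\) above, is sketched below. The symbols \({\hat{\Lambda }}_{10}(t)\) for the estimator and \(N_i(u)\) for the counting process of events of interest for subject i are assumed here for illustration and may differ from the authors' notation.

```latex
\[
{\hat{\Lambda }}_{10}(t)
  = \sum _{i=1}^n \int _0^t
    \frac{w_i(u)\, dN_i(u)}{{\hat{S}}^0(\hat{\varvec{\pi }}, u)}
\]
```

Because \({\hat{S}}^0\) is defined as an unnormalized sum over subjects, no additional factor of 1/n appears in front of the sum.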
Illustration of non-proportional subdistribution hazards with proportional cause-specific hazards
Let the cause-specific hazards take Weibull form, \(\alpha _1(t; Z)=\frac{a_1}{b_1^{a_1}}t^{a_1 - 1} e^{\gamma _1 Z}\) and \(\alpha _2(t; Z)=\frac{a_2}{b_2^{a_2}}t^{a_2 - 1} e^{\gamma _2 Z}\), where Z is a binary covariate with a proportional effect on both cause-specific hazards. The all-cause cumulative hazard is \(A(t; Z)=\left( \frac{t}{b_1}\right) ^{a_1} e^{\gamma _1 Z} + \left( \frac{t}{b_2}\right) ^{a_2} e^{\gamma _2 Z}\), so the overall survival function is \(S(t; Z) = \exp \{-A(t; Z)\}\). The proportional effect of Z on the cause-specific hazard can be verified by deriving the cause-specific hazard ratio (CSHR),
\[ \text {CSHR} = \frac{\alpha _1(t; Z=1)}{\alpha _1(t; Z=0)} = e^{\gamma _1}. \]
The subdistribution hazard can be represented as a function of the cause-specific hazard, \(\lambda _1(t) = \alpha _1(t)/\big ( 1 + \frac{F_2 (t)}{S(t)} \big )\), where \(F_k(t) = \int _0^t S(u) \alpha _k(u) du = 1 - \exp (-\int _0^t \lambda _k (u) du)\). The subdistribution hazard ratio (SHR) is then
\[ \text {SHR}(t) = \frac{\lambda _1(t; Z=1)}{\lambda _1(t; Z=0)} = e^{\gamma _1} \, \frac{1 + F_2(t; Z=0)/S(t; Z=0)}{1 + F_2(t; Z=1)/S(t; Z=1)}. \]
The SHR is not constant, but equal to the CSHR multiplied by a function of time, because Z also changes \(F_2(t)\) and \(S(t)\).
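The contrast between a constant CSHR and a time-varying SHR can be checked numerically. The sketch below uses the Weibull cause-specific hazards defined above and evaluates \(F_2(t)\) by the midpoint rule. It is written in Python rather than the authors' R, and the parameter values and function names are hypothetical choices for illustration only.

```python
import math

# Hypothetical Weibull parameters (a_k, b_k, gamma_k) for causes 1 and 2;
# these values are illustrative and do not come from the paper.
PARAMS = [(1.5, 10.0, 0.5), (1.2, 8.0, 0.3)]


def cs_hazard(t, k, z, params=PARAMS):
    """Cause-specific hazard alpha_k(t; Z) = (a / b^a) t^(a-1) exp(gamma Z)."""
    a, b, gamma = params[k]
    return (a / b**a) * t ** (a - 1) * math.exp(gamma * z)


def survival(t, z, params=PARAMS):
    """Overall survival S(t; Z) = exp(-A(t; Z))."""
    cum = sum((t / b) ** a * math.exp(g * z) for a, b, g in params)
    return math.exp(-cum)


def cif(t, k, z, params=PARAMS, steps=2000):
    """F_k(t; Z) = int_0^t S(u) alpha_k(u) du, by the midpoint rule."""
    h = t / steps
    mids = ((j + 0.5) * h for j in range(steps))
    return h * sum(
        survival(u, z, params) * cs_hazard(u, k, z, params) for u in mids
    )


def sub_hazard(t, z, params=PARAMS):
    """Subdistribution hazard of cause 1: alpha_1 / (1 + F_2 / S)."""
    ratio = cif(t, 1, z, params) / survival(t, z, params)
    return cs_hazard(t, 0, z, params) / (1 + ratio)


def cshr(t, params=PARAMS):
    """Cause-specific hazard ratio for Z = 1 versus Z = 0."""
    return cs_hazard(t, 0, 1, params) / cs_hazard(t, 0, 0, params)


def shr(t, params=PARAMS):
    """Subdistribution hazard ratio for Z = 1 versus Z = 0."""
    return sub_hazard(t, 1, params) / sub_hazard(t, 0, params)


if __name__ == "__main__":
    for age in (1, 5, 10, 20):
        print(f"t={age:>2}  CSHR={cshr(age):.3f}  SHR={shr(age):.3f}")
```

With these assumed parameters the CSHR stays at \(e^{\gamma _1}\) at every time point, while the SHR starts near the CSHR and drifts away from it as the competing event accumulates.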
Supplemental tables and figures
About this article
Cite this article
Conner, S.C., Beiser, A., Benjamin, E.J. et al. A comparison of statistical methods to predict the residual lifetime risk. Eur J Epidemiol 37, 173–194 (2022). https://doi.org/10.1007/s10654-021-00815-8