A comparison of statistical methods to predict the residual lifetime risk

  • METHODS
  • European Journal of Epidemiology

Abstract

Lifetime risk measures the cumulative risk for developing a disease over one’s lifespan. Modeling the lifetime risk must account for left truncation, the competing risk of death, and inference at a fixed age. In addition, statistical methods to predict the lifetime risk should account for covariate-outcome associations that change with age. In this paper, we review and compare statistical methods to predict the lifetime risk. We first consider a generalized linear model for the lifetime risk using pseudo-observations of the Aalen-Johansen estimator at a fixed age, allowing for left truncation. We also consider modeling the subdistribution hazard with Fine-Gray and Royston-Parmar flexible parametric models in left truncated data with time-covariate interactions, and using these models to predict lifetime risk. In simulation studies, we found the pseudo-observation approach had the least bias, particularly in settings with crossing or converging cumulative incidence curves. We illustrate our method by modeling the lifetime risk of atrial fibrillation in the Framingham Heart Study. We provide technical guidance to replicate all analyses in R.
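
As a rough illustration of the pseudo-observation idea described above, the following Python sketch computes the Aalen-Johansen estimate of the lifetime risk at a fixed age with delayed entry, then forms leave-one-out (jackknife) pseudo-observations. It is not the authors' R implementation; the function names and the status coding (0 = censored, 1 = event of interest, 2 = competing event) are assumptions made for this example.

```python
import numpy as np


def aalen_johansen(entry, exit_, status, tau0, tau):
    """Aalen-Johansen estimate of F_1(tau | T > tau0) with delayed entry.

    Assumed coding: status 0 = censored, 1 = event of interest, 2 = competing event.
    """
    entry, exit_, status = (np.asarray(x) for x in (entry, exit_, status))
    # Distinct event times in (tau0, tau]
    times = np.unique(exit_[(status > 0) & (exit_ > tau0) & (exit_ <= tau)])
    surv, risk = 1.0, 0.0
    for t in times:
        # Risk set: entered before t and still under observation at t
        at_risk = np.sum((entry < t) & (exit_ >= t))
        d1 = np.sum((exit_ == t) & (status == 1))
        d_all = np.sum((exit_ == t) & (status > 0))
        risk += surv * d1 / at_risk        # increment of the cumulative incidence
        surv *= 1.0 - d_all / at_risk      # all-cause survival just after t
    return risk


def pseudo_observations(entry, exit_, status, tau0, tau):
    """Jackknife pseudo-observations: n * theta_hat - (n - 1) * theta_hat_without_i."""
    entry, exit_, status = (np.asarray(x) for x in (entry, exit_, status))
    n = len(exit_)
    full = aalen_johansen(entry, exit_, status, tau0, tau)
    keep = ~np.eye(n, dtype=bool)          # row i drops subject i
    loo = np.array([
        aalen_johansen(entry[k], exit_[k], status[k], tau0, tau) for k in keep
    ])
    return n * full - (n - 1) * loo
```

In complete data (no censoring or truncation) each pseudo-observation reduces to the indicator that the event of interest occurred by the fixed age, which is why the pseudo-observations can stand in for the outcome in a generalized linear model.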

Availability of data and material

To protect the confidentiality of the Framingham Heart Study participants, the data from our illustrative examples are not on our GitHub page. Participant level data from the Framingham Heart Study are available at the database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/) and BioLINCC (https://biolincc.nhlbi.nih.gov/home/).

References

  1. Karmali KN, Lloyd-Jones DM. Adding a life-course perspective to cardiovascular-risk communication. Nat Rev Cardiol. 2013;10(2):111.

  2. Seshadri S, Wolf PA. Lifetime risk of stroke and dementia: current concepts, and estimates from the Framingham study. Lancet Neurol. 2007;6(12):1106–14.

  3. Beiser A, D’Agostino RB Sr, Seshadri S, et al. Computing estimates of incidence, including lifetime risk: Alzheimer’s disease in the Framingham Study. The Practical Incidence Estimators (PIE) macro. Stat Med. 2000;19(11–12):1495–522.

  4. Gaynor JJ, Feuer EJ, Tan CC, et al. On the use of cause-specific failure and conditional failure probabilities: examples from clinical oncology data. J Am Stat Assoc. 1993;88(422):400–9.

  5. Brookmeyer R, Abdalla N. Multistate models and lifetime risk estimation: application to Alzheimer’s disease. Stat Med. 2019;38(9):1558–65.

  6. Dinse GE, Larson MG. A note on semi-Markov models for partially censored data. Biometrika. 1986;73(2):379–86.

  7. Carone M, Asgharian M, Jewell NP. Estimating the lifetime risk of dementia in the Canadian elderly population using cross-sectional cohort survival data. J Am Stat Assoc. 2014;109(505):24–35.

  8. Alonso A, Krijthe BP, Aspelund T, Stepas KA, Pencina MJ, Moser CB, Sinner MF, Sotoodehnia N, Fontes JD, Janssens AC, Kronmal RA. Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: the CHARGE-AF consortium. J Am Heart Assoc. 2013;2(2):e000102.

  9. Staerk L, Wang B, Preis SR, et al. Lifetime risk of atrial fibrillation according to optimal, borderline, or elevated levels of risk factors: cohort study based on longitudinal data from the Framingham Heart Study. BMJ. 2018;361:k1453.

  10. Grand MK, Putter H, Allignol A, et al. A note on pseudo-observations and left-truncation. Biom J. 2019;61(2):290–8.

  11. Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94(446):496–509.

  12. Geskus RB. Cause-specific cumulative incidence estimation and the Fine and Gray model under both left truncation and right censoring. Biometrics. 2011;67(1):39–49.

  13. Lambert PC, Wilkes SR, Crowther MJ. Flexible parametric modelling of the cause-specific cumulative incidence function. Stat Med. 2017;36(9):1429–46.

  14. Royston P, Parmar MK. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21(15):2175–97.

  15. Jeong JH, Fine J. A note on cause-specific residual life. Biometrika. 2009;96(1):237–42.

  16. Du Y. Measuring Effects of Risk Factors on Cumulative Incidence and Remaining Lifetime Risk in the Presence of Competing Risks. PhD Thesis, Boston University, 2010.

  17. Andersen P, Borgan O, Gill R, et al. Statistical models based on counting processes. New York: Springer-Verlag; 1993. MR1198884.

  18. Allignol A, Schumacher M, Beyersmann J. A note on variance estimation of the Aalen-Johansen estimator of the cumulative incidence function in competing risks, with a view towards left-truncated data. Biom J. 2010;52(1):126–37.

  19. Aalen OO, Johansen S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand J Stat. 1978:141–50.

  20. Klein JP, Andersen PK. Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function. Biometrics. 2005;61(1):223–9.

  21. Graw F, Gerds TA, Schumacher M. On pseudo-values for regression analysis in competing risks models. Lifetime Data Anal. 2009;15(2):241–55.

  22. Andersen PK, Klein JP, Rosthøj S. Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika. 2003;90(1):15–27.

  23. Andersen PK, Pohar Perme M. Pseudo-observations in survival analysis. Stat Methods Med Res. 2010;19(1):71–99.

  24. Klein JP, Logan B, Harhoff M, et al. Analyzing survival curves at a fixed point in time. Stat Med. 2007;26(24):4505–19.

  25. Chen J, Hou Y, Chen Z. Statistical inference methods for cumulative incidence function curves at a fixed point in time. Commun Stat Simul Comput. 2018:1–16.

  26. Overgaard M, Parner ET, Pedersen J, et al. Asymptotic theory of generalized estimating equations based on jack-knife pseudo-observations. Ann Stat. 2017;45(5):1988–2015.

  27. de Wreede LC, Fiocco M, Putter H, et al. mstate: an R package for the analysis of competing risks and multi-state models. J Stat Softw. 2011;38(7):1–30.

  28. Beyersmann J, Allignol A, Schumacher M. Competing risks and multistate models with R. Berlin: Springer Science & Business Media; 2011.

  29. Latouche A, Boisson V, Chevret S, et al. Misspecified regression model for the subdistribution hazard of a competing risk. Stat Med. 2007;26(5):965–74.

  30. Beyersmann J, Schumacher M. Misspecified regression model for the subdistribution hazard of a competing risk. Stat Med. 2007;26(7):1649.

  31. Muñoz A, Abraham AG, Matheson M, et al. Non-proportionality of hazards in the competing risks framework. In: Risk assessment and evaluation of predictions. Springer; 2013. p. 3–22.

  32. Thomas L, Reyes EM. Tutorial: survival estimation for Cox regression models with time-varying coefficients using SAS and R. J Stat Softw. 2014;61(c1):1–23.

  33. Beyersmann J, Latouche A, Buchholz A, et al. Simulating competing risks data in survival analysis. Stat Med. 2009;28(6):956–71.

  34. Zhou B, Fine J, Laird G. Goodness-of-fit test for proportional subdistribution hazards model. Stat Med. 2013;32(22):3804–11.

  35. Li J, Scheike TH, Zhang MJ. Checking Fine and Gray subdistribution hazards model with cumulative sums of residuals. Lifetime Data Anal. 2015;21(2):197–217.

  36. Halekoh U, Højsgaard S, Yan J, et al. The R package geepack for generalized estimating equations. J Stat Softw. 2006;15(2):1–11.

  37. Therneau TM, Lumley T. Package ‘survival’. R Top Doc. 2015;128(10):28–33.

  38. Liu XR, Pawitan Y, Clements MS. Generalized survival models for correlated time-to-event data. Stat Med. 2017;36(29):4743–62.

  39. Pan W. Akaike’s information criterion in generalized estimating equations. Biometrics. 2001;57(1):120–5.

  40. Rücker G, Schwarzer G. Presenting simulation results in a nested loop plot. BMC Med Res Methodol. 2014;14(1):129.

  41. Kannel WB, Dawber TR, Kagan A, et al. Factors of risk in the development of coronary heart disease–six-year follow-up experience: the Framingham Study. Ann Intern Med. 1961;55(1):33–50.

  42. Feinleib M, Kannel WB, Garrison RJ, et al. The Framingham Offspring Study. Design and preliminary data. Prev Med. 1975;4(4):518–25.

  43. Sauerbrei W, Royston P, Look M. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J. 2007;49(3):453–73.

  44. Austin PC, Latouche A, Fine JP. A review of the use of time-varying covariates in the Fine-Gray subdistribution hazard competing risk regression model. Stat Med. 2020;39(2):103–13.

  45. Austin PC, Lee DS, Fine JP. Introduction to the analysis of survival data in the presence of competing risks. Circulation. 2016;133(6):601–9.

  46. Hinchliffe SR, Lambert PC. Flexible parametric modelling of cause-specific hazards to estimate cumulative incidence functions. BMC Med Res Methodol. 2013;13(1):13.

  47. Mozumder SI, Rutherford M, Lambert P. Direct likelihood inference on the cause-specific cumulative incidence function: a flexible parametric regression modelling approach. Stat Med. 2018;37(1):82–97.

  48. Jacobsen M, Martinussen T. A note on the large sample properties of estimators based on generalized linear models for correlated pseudo-observations. Scand J Stat. 2016;43(3):845–62.

  49. Overgaard M, Parner ET, Pedersen J. Estimating the variance in a pseudo-observation scheme with competing risks. Scand J Stat. 2018;45(4):923–40.

  50. Stegherr R, Allignol A, Meister R, et al. Estimating cumulative incidence functions in competing risks data with dependent left-truncation. Stat Med. 2020;39(4):481–93.

  51. Pencina MJ, Larson MG, D’Agostino RB. Choice of time scale and its effect on significance of predictors in longitudinal studies. Stat Med. 2007;26(6):1343–59.

  52. Latouche A, Allignol A, Beyersmann J, Labopin M, Fine JP. A competing risks analysis should report results on all cause-specific hazards and cumulative incidence functions. J Clin Epidemiol. 2013;66(6):648–53.

  53. Binder N, Gerds TA, Andersen PK. Pseudo-observations for competing risks with covariate dependent censoring. Lifetime Data Anal. 2014;20(2):303–15.

  54. Bower H, Crowther MJ, Rutherford MJ, Andersson TM, Clements M, Liu XR, Dickman PW, Lambert PC. Capturing simple and complex time-dependent effects using flexible parametric survival models: a simulation study. Commun Stat Simul Comput. 2019:1–7.

  55. Rutherford MJ, Crowther MJ, Lambert PC. The use of restricted cubic splines to approximate complex hazard functions in the analysis of time-to-event data: a simulation study. J Stat Comput Simul. 2015;85(4):777–93.

  56. Geskus RB. On the inclusion of prevalent cases in HIV/AIDS natural history studies through a marker-based estimate of time since seroconversion. Stat Med. 2000;19(13):1753–69.

  57. Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res. 2004;33(2):261–304.

  58. Muller CJ, MacLehose RF. Estimating predicted probabilities from logistic regression: different methods correspond to different target populations. Int J Epidemiol. 2014;43(3):962–70.

Acknowledgements

The authors thank Dr. Sarwar Mozumder and Dr. Paul Lambert for their support in implementing the flexible parametric model approach, and Katia Bulekova for her support with Boston University’s Shared Computing Cluster. The authors also thank the anonymous reviewers and Associate Editor for their thoughtful and helpful comments.

Funding

SCC received funding from the National Institute of General Medical Sciences (NIGMS): T32 GM74905-14 and the National Heart, Lung, and Blood Institute (NHLBI): F31 HL145904-01. EJB received funding from NHLBI: R01HL128914; 2R01 HL092577; 2U54HL120163; American Heart Association (AHA): 18SFRN34110082. LT received funding from AHA: 18SFRN34150007. The Framingham Heart Study is supported by NHLBI (N01-HC25195, HHSN268201500001I; 75N92019D00031) and Boston University School of Medicine. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Corresponding authors

Correspondence to Sarah C. Conner or Ludovic Trinquart.

Ethics declarations

Conflicts of interest

The authors report no conflicts of interest.

Code availability

We provide R code to apply our methods with working examples and replicate our simulation studies on GitHub at https://github.com/s-conner/lifetimerisk.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Variance of non-parametric estimator of lifetime risk

The variance of the lifetime risk is estimated using Greenwood’s formula, [4, 18]

$$\begin{aligned}&\widehat{\mathrm {var}} \left\{ {\hat{F}}_1(\tau | T>\tau _0) \right\} \\&\quad = \sum _{\tau _0< t_j \le \tau } \left\{ {\hat{S}}(t_{j-1}) \frac{\varDelta N_1(t_j)}{Y(t_j)} \right\} ^2 \\&\quad \quad \times \left\{ \frac{Y(t_j) - \varDelta N_1(t_j)}{Y(t_j) \varDelta N_1(t_j)} + \sum _{t_l< t_j} \frac{\varDelta N(t_l)}{Y(t_l) (Y(t_l) - \varDelta N(t_l))} \right\} \\&\qquad + 2 \sum _{\tau _0< t_j< \tau } \sum _{ t_k \in (t_j, \tau ]} {\hat{S}}(t_{j-1}) \frac{\varDelta N_1(t_j)}{Y(t_j)} {\hat{S}}(t_{k-1}) \\&\quad \quad \times \frac{\varDelta N_1(t_k)}{Y(t_k)} \left\{ -\frac{1}{Y(t_j)} + \sum _{t_l<t_j} \frac{\varDelta N(t_l)}{Y(t_l) (Y(t_l) - \varDelta N(t_l))} \right\} . \end{aligned}$$
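
The variance expression above can be evaluated directly from the at-risk and event counts. The following Python sketch is one way to do so, under assumptions that are not stated in the text: the function name and status coding (0 = censored, 1 = event of interest, 2 = competing event) are hypothetical, the risk set at \(t_j\) contains subjects with entry time before \(t_j\) and exit time at or after \(t_j\), and the inner sums run over event times \(t_l\) in \((\tau _0, t_j)\).

```python
import numpy as np


def lifetime_risk_variance(entry, exit_, status, tau0, tau):
    """Return (risk, variance) for F_1(tau | T > tau0).

    Assumed coding: status 0 = censored, 1 = event of interest, 2 = competing event.
    """
    entry, exit_, status = (np.asarray(x) for x in (entry, exit_, status))
    # Distinct event times t_j in (tau0, tau]
    times = np.unique(exit_[(status > 0) & (exit_ > tau0) & (exit_ <= tau)])
    if times.size == 0:
        return 0.0, 0.0
    # Y(t_j), Delta N_1(t_j) and Delta N(t_j)
    Y = np.array([np.sum((entry < t) & (exit_ >= t)) for t in times], dtype=float)
    dN1 = np.array([np.sum((exit_ == t) & (status == 1)) for t in times], dtype=float)
    dN = np.array([np.sum((exit_ == t) & (status > 0)) for t in times], dtype=float)

    surv = np.cumprod(1.0 - dN / Y)                 # S(t_j)
    s_prev = np.concatenate(([1.0], surv[:-1]))     # S(t_{j-1})
    inc = s_prev * dN1 / Y                          # increments of F_1
    risk = inc.sum()

    # Greenwood terms dN / (Y (Y - dN)); set to 0 if the risk set empties,
    # because survival is then 0 and every later increment vanishes.
    denom = Y * (Y - dN)
    green = np.divide(dN, denom, out=np.zeros_like(dN), where=denom > 0)
    g = np.concatenate(([0.0], np.cumsum(green)[:-1]))   # sum over t_l < t_j

    # First sum; {..}^2 (Y - dN1) / (Y dN1) is simplified to avoid dividing by dN1 = 0
    diag = np.sum(s_prev**2 * dN1 * (Y - dN1) / Y**3 + inc**2 * g)
    # Double sum over t_j < t_k
    tail = risk - np.cumsum(inc)                    # sum of increments at t_k > t_j
    cross = 2.0 * np.sum(inc * tail * (g - 1.0 / Y))
    return risk, diag + cross
```

A quick sanity check on this sketch: with no censoring and no delayed entry, the returned variance should match the binomial variance \(p(1-p)/n\) of the observed proportion with the event of interest.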

Weighted Breslow estimator of the baseline cumulative subdistribution hazard

The weighted Breslow estimator of the baseline cumulative subdistribution hazard is given by

$$\begin{aligned} {\hat{\varLambda }}_{1,0}(\tau | T_i>\tau _0)&= \int _{\tau _0}^{\tau } {\hat{\lambda }}_{1,0}(t | T_i>\tau _0) dt \\&= \frac{1}{n} \sum _{i=1}^n \int _{\tau _0}^{\tau } \frac{w_i (u)}{{\hat{S}}^0(\hat{\varvec{\pi }}, u)} dN_i(u) \end{aligned}$$

where \({\hat{S}}^0(\hat{\varvec{\pi }}, u) = \sum _{i=1}^n w_i(u) T_i(u) \exp \{ \hat{\varvec{\pi }}^T Z_i(u) \}\) [11].

Illustration of non-proportional subdistribution hazards with proportional cause-specific hazards

Let the cause-specific hazards take Weibull form, \(\alpha _1(t; Z)=\frac{a_1}{b_1^{a_1}}t^{a_1 - 1} e^{\gamma _1 Z}\) and \(\alpha _2(t; Z)=\frac{a_2}{b_2^{a_2}}t^{a_2 - 1} e^{\gamma _2 Z}\), where Z is a binary covariate with a proportional effect on both cause-specific hazards. The all-cause cumulative hazard is \(A(t; Z)=\left( \frac{t}{b_1}\right) ^{a_1} e^{\gamma _1 Z} + \left( \frac{t}{b_2}\right) ^{a_2} e^{\gamma _2 Z}\). The proportional effect of Z on the cause-specific hazard can be verified by deriving the cause-specific hazard ratio (CSHR),

$$\begin{aligned} \frac{\alpha _1(t;Z=1)}{\alpha _1(t;Z=0)}&= \frac{\frac{a_1}{b_1^{a_1}}t^{a_1 - 1} e^{\gamma _1}}{\frac{a_1}{b_1^{a_1}}t^{a_1 - 1}}\\&=e^{\gamma _1}. \end{aligned}$$

The subdistribution hazard can be represented as a function of the cause-specific hazard, \(\lambda _1(t) = \alpha _1(t)/\big ( 1 + \frac{F_2 (t)}{S(t)} \big )\), where \(S(t) = \exp \{-A(t)\}\) is the overall survival function and \(F_k(t) = \int _0^t S(u) \alpha _k(u) du = 1 - \exp (-\int _0^t \lambda _k (u) du)\) is the cumulative incidence function of cause k. The subdistribution hazard ratio (SHR) is then

$$\begin{aligned} \frac{\lambda _1(t; Z=1)}{\lambda _1(t; Z=0)}&= \frac{ \alpha _1(t;Z=1) / \left( 1 + \frac{F_2 (t;Z=1)}{S(t; Z=1)} \right) }{ \alpha _1(t;Z=0) / \left( 1 + \frac{F_2 (t; Z=0)}{S(t; Z=0)} \right) } \\&= e^{\gamma _1} \cdot \left( \frac{1 + \frac{F_2 (t; Z=0)}{S(t; Z=0)} }{ 1 + \frac{F_2 (t; Z=1)}{S(t; Z=1)}} \right) . \end{aligned}$$

The SHR is not constant, but equal to the CSHR multiplied by a function of time.
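
The derivation above can be checked numerically. The Python sketch below evaluates the CSHR and SHR at a few time points under Weibull cause-specific hazards; the parameter values, function names, and the trapezoidal integration grid are hypothetical choices for illustration only and do not come from the paper's simulation settings.

```python
import numpy as np


def hazards(t, z, a=(1.5, 1.2), b=(10.0, 8.0), gamma=(0.7, 0.2), n_grid=20001):
    """Return (alpha_1, lambda_1) at time t for covariate value z.

    Shape parameters a_k > 1 are assumed so the hazards are finite at time 0.
    """
    u = np.linspace(0.0, t, n_grid)
    # Weibull cause-specific hazards alpha_k(u; z), k = 1, 2
    alpha = [
        a[k] / b[k] ** a[k] * u ** (a[k] - 1) * np.exp(gamma[k] * z) for k in (0, 1)
    ]
    # All-cause cumulative hazard A(u; z) and overall survival S(u; z)
    cum = sum((u / b[k]) ** a[k] * np.exp(gamma[k] * z) for k in (0, 1))
    surv = np.exp(-cum)
    # F_2(t; z) = integral of S * alpha_2 over [0, t], by the trapezoidal rule
    y = surv * alpha[1]
    f2 = np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0
    # Subdistribution hazard lambda_1 = alpha_1 / (1 + F_2 / S)
    lam1 = alpha[0][-1] / (1.0 + f2 / surv[-1])
    return alpha[0][-1], lam1


def hazard_ratios(t, **kwargs):
    """Return (CSHR, SHR) comparing Z = 1 with Z = 0 at time t."""
    a1, l1 = hazards(t, 1, **kwargs)
    a0, l0 = hazards(t, 0, **kwargs)
    return a1 / a0, l1 / l0


for t in (1.0, 5.0, 10.0, 20.0):
    cshr, shr = hazard_ratios(t)
    print(f"t={t:5.1f}  CSHR={cshr:.3f}  SHR={shr:.3f}")
```

With these hypothetical parameters the CSHR stays at \(e^{\gamma _1}\) at every time point, while the SHR starts near \(e^{\gamma _1}\) and drifts away from it as competing events accumulate.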

Supplemental tables and figures

Table 8 Participant characteristics at index age 55 in the Framingham Heart Study
Table 9 Sensitivity analysis: predicted differences in lifetime risk of atrial fibrillation at age 95, from index age 55 using BIC
Table 10 Multivariable models for incident atrial fibrillation in the Framingham Heart Study, from index age 55 to age 95
Table 11 Multivariable models for incident death without atrial fibrillation in the Framingham Heart Study, from index age 55 to age 95
Fig. 4 True cumulative incidence functions in simulation study
Fig. 5 Nested loop plot showing the simulation study results: relative bias
Fig. 6 Nested loop plot showing the simulation study results: coverage
Fig. 7 Nested loop plot showing the simulation study results: type I error and power of pseudo-observation method

Cite this article

Conner, S.C., Beiser, A., Benjamin, E.J. et al. A comparison of statistical methods to predict the residual lifetime risk. Eur J Epidemiol 37, 173–194 (2022). https://doi.org/10.1007/s10654-021-00815-8
