A comparison of statistical methods to predict the residual lifetime risk

Conner, Sarah C.; Beiser, Alexa; Benjamin, Emelia J.; LaValley, Michael P.; Larson, Martin G.; Trinquart, Ludovic

doi:10.1007/s10654-021-00815-8

METHODS
Published: 03 January 2022

Volume 37, pages 173–194, (2022)
European Journal of Epidemiology

Sarah C. Conner (ORCID: orcid.org/0000-0002-0929-9948)^1,
Alexa Beiser^1,2,3,
Emelia J. Benjamin^2,4,5,
Michael P. LaValley^1,
Martin G. Larson^1,2 &
Ludovic Trinquart^1,6,7

Abstract

Lifetime risk measures the cumulative risk for developing a disease over one’s lifespan. Modeling the lifetime risk must account for left truncation, the competing risk of death, and inference at a fixed age. In addition, statistical methods to predict the lifetime risk should account for covariate-outcome associations that change with age. In this paper, we review and compare statistical methods to predict the lifetime risk. We first consider a generalized linear model for the lifetime risk using pseudo-observations of the Aalen-Johansen estimator at a fixed age, allowing for left truncation. We also consider modeling the subdistribution hazard with Fine-Gray and Royston-Parmar flexible parametric models in left truncated data with time-covariate interactions, and using these models to predict lifetime risk. In simulation studies, we found the pseudo-observation approach had the least bias, particularly in settings with crossing or converging cumulative incidence curves. We illustrate our method by modeling the lifetime risk of atrial fibrillation in the Framingham Heart Study. We provide technical guidance to replicate all analyses in R.

Availability of data and material

To protect the confidentiality of the Framingham Heart Study participants, the data from our illustrative examples are not on our GitHub page. Participant level data from the Framingham Heart Study are available at the database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/) and BioLINCC (https://biolincc.nhlbi.nih.gov/home/).

References

Karmali KN, Lloyd-Jones DM. Adding a life-course perspective to cardiovascular-risk communication. Nat Rev Cardiol. 2013;10(2):111.
Seshadri S, Wolf PA. Lifetime risk of stroke and dementia: current concepts, and estimates from the Framingham study. Lancet Neurol. 2007;6(12):1106–14.
Beiser A, D’Agostino RB Sr, Seshadri S, et al. Computing estimates of incidence, including lifetime risk: Alzheimer’s disease in the Framingham Study. The Practical Incidence Estimators (PIE) macro. Stat Med. 2000;19(11–12):1495–522.
Gaynor JJ, Feuer EJ, Tan CC, et al. On the use of cause-specific failure and conditional failure probabilities: examples from clinical oncology data. J Am Stat Assoc. 1993;88(422):400–9.
Brookmeyer R, Abdalla N. Multistate models and lifetime risk estimation: application to Alzheimer’s disease. Stat Med. 2019;38(9):1558–65.
Dinse GE, Larson MG. A note on semi-Markov models for partially censored data. Biometrika. 1986;73(2):379–86.
Carone M, Asgharian M, Jewell NP. Estimating the lifetime risk of dementia in the Canadian elderly population using cross-sectional cohort survival data. J Am Stat Assoc. 2014;109(505):24–35.
Alonso A, Krijthe BP, Aspelund T, Stepas KA, Pencina MJ, Moser CB, Sinner MF, Sotoodehnia N, Fontes JD, Janssens AC, Kronmal RA. Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: the CHARGE-AF consortium. J Am Heart Assoc. 2013;2(2):e000102.
Staerk L, Wang B, Preis SR, et al. Lifetime risk of atrial fibrillation according to optimal, borderline, or elevated levels of risk factors: cohort study based on longitudinal data from the Framingham Heart Study. BMJ. 2018;361:k1453.
Grand MK, Putter H, Allignol A, et al. A note on pseudo-observations and left-truncation. Biom J. 2019;61(2):290–8.
Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94(446):496–509.
Geskus RB. Cause-specific cumulative incidence estimation and the Fine and Gray model under both left truncation and right censoring. Biometrics. 2011;67(1):39–49.
Lambert PC, Wilkes SR, Crowther MJ. Flexible parametric modelling of the cause-specific cumulative incidence function. Stat Med. 2017;36(9):1429–46.
Royston P, Parmar MK. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21(15):2175–97.
Jeong JH, Fine J. A note on cause-specific residual life. Biometrika. 2009;96(1):237–42.
Du Y. Measuring Effects of Risk Factors on Cumulative Incidence and Remaining Lifetime Risk in the Presence of Competing Risks. PhD Thesis, Boston University, 2010.
Andersen P, Borgan O, Gill R, et al. Statistical models based on counting processes. New York: Springer-Verlag; 1993.
Allignol A, Schumacher M, Beyersmann J. A note on variance estimation of the Aalen-Johansen estimator of the cumulative incidence function in competing risks, with a view towards left-truncated data. Biom J. 2010;52(1):126–37.
Aalen OO, Johansen S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand J Stat 1978; pp 141–150.
Klein JP, Andersen PK. Regression modeling of competing risks data based on pseudovalues of the cumulative incidence function. Biometrics. 2005;61(1):223–9.
Graw F, Gerds TA, Schumacher M. On pseudo-values for regression analysis in competing risks models. Lifetime Data Anal. 2009;15(2):241–55.
Andersen PK, Klein JP, Rosthøj S. Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika. 2003;90(1):15–27.
Andersen PK, Pohar Perme M. Pseudo-observations in survival analysis. Stat Methods Med Res. 2010;19(1):71–99.
Klein JP, Logan B, Harhoff M, et al. Analyzing survival curves at a fixed point in time. Stat Med. 2007;26(24):4505–19.
Chen J, Hou Y, Chen Z. Statistical inference methods for cumulative incidence function curves at a fixed point in time. Commun Stat Simul Comput 2018; pp 1–16.
Overgaard M, Parner ET, Pedersen J, et al. Asymptotic theory of generalized estimating equations based on jack-knife pseudo-observations. Ann Stat. 2017;45(5):1988–2015.
de Wreede LC, Fiocco M, Putter H, et al. mstate: an R package for the analysis of competing risks and multi-state models. J Stat Softw. 2011;38(7):1–30.
Beyersmann J, Allignol A, Schumacher M. Competing risks and multistate models with R. Berlin: Springer Science & Business Media; 2011.
Latouche A, Boisson V, Chevret S, et al. Misspecified regression model for the subdistribution hazard of a competing risk. Stat Med. 2007;26(5):965–74.
Beyersmann J, Schumacher M. Misspecified regression model for the subdistribution hazard of a competing risk. Stat Med. 2007;26(7):1649.
Muñoz A, Abraham AG, Matheson M et al. Non-proportionality of hazards in the competing risks framework. In Risk Assessment and Evaluation of Predictions. Springer, 2013; pp 3–22.
Thomas L, Reyes EM. Tutorial: survival estimation for Cox regression models with time-varying coefficients using SAS and R. J Stat Softw. 2014;61(c1):1–23.
Beyersmann J, Latouche A, Buchholz A, et al. Simulating competing risks data in survival analysis. Stat Med. 2009;28(6):956–71.
Zhou B, Fine J, Laird G. Goodness-of-fit test for proportional subdistribution hazards model. Stat Med. 2013;32(22):3804–11.
Li J, Scheike TH, Zhang MJ. Checking Fine and Gray subdistribution hazards model with cumulative sums of residuals. Lifetime Data Anal. 2015;21(2):197–217.
Halekoh U, Højsgaard S, Yan J, et al. The R package geepack for generalized estimating equations. J Stat Softw. 2006;15(2):1–11.
Therneau TM, Lumley T. Package ‘survival’. R Top Doc. 2015;128(10):28–33.
Liu XR, Pawitan Y, Clements MS. Generalized survival models for correlated time-to-event data. Stat Med. 2017;36(29):4743–62.
Pan W. Akaike’s information criterion in generalized estimating equations. Biometrics. 2001;57(1):120–5.
Rücker G, Schwarzer G. Presenting simulation results in a nested loop plot. BMC Med Res Methodol. 2014;14(1):129.
Kannel WB, Dawber TR, Kagan A, et al. Factors of risk in the development of coronary heart disease–six-year follow-up experience: the Framingham Study. Ann Intern Med. 1961;55(1):33–50.
Feinleib M, Kannel WB, Garrison RJ, et al. The Framingham Offspring Study. Design and preliminary data. Prev Med. 1975;4(4):518–25.
Sauerbrei W, Royston P, Look M. A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biom J. 2007;49(3):453–73.
Austin PC, Latouche A, Fine JP. A review of the use of time-varying covariates in the Fine-Gray subdistribution hazard competing risk regression model. Stat Med. 2020;39(2):103–13.
Austin PC, Lee DS, Fine JP. Introduction to the analysis of survival data in the presence of competing risks. Circulation. 2016;133(6):601–9.
Hinchliffe SR, Lambert PC. Flexible parametric modelling of cause-specific hazards to estimate cumulative incidence functions. BMC Med Res Methodol. 2013;13(1):13.
Mozumder SI, Rutherford M, Lambert P. Direct likelihood inference on the cause-specific cumulative incidence function: a flexible parametric regression modelling approach. Stat Med. 2018;37(1):82–97.
Jacobsen M, Martinussen T. A note on the large sample properties of estimators based on generalized linear models for correlated pseudo-observations. Scand J Stat. 2016;43(3):845–62.
Overgaard M, Parner ET, Pedersen J. Estimating the variance in a pseudo-observation scheme with competing risks. Scand J Stat. 2018;45(4):923–40.
Stegherr R, Allignol A, Meister R, et al. Estimating cumulative incidence functions in competing risks data with dependent left-truncation. Stat Med. 2020;39(4):481–93.
Pencina MJ, Larson MG, D’Agostino RB. Choice of time scale and its effect on significance of predictors in longitudinal studies. Stat Med. 2007;26(6):1343–59.
Latouche A, Allignol A, Beyersmann J, Labopin M, Fine JP. A competing risks analysis should report results on all cause-specific hazards and cumulative incidence functions. J Clin Epidemiol. 2013;66(6):648–53.
Binder N, Gerds TA, Andersen PK. Pseudo-observations for competing risks with covariate dependent censoring. Lifetime Data Anal. 2014;20(2):303–15.
Bower H, Crowther MJ, Rutherford MJ, Andersson TM, Clements M, Liu XR, Dickman PW, Lambert PC. Capturing simple and complex time-dependent effects using flexible parametric survival models: a simulation study. Commun Stat Simul Comput. 2019; pp 1–7.
Rutherford MJ, Crowther MJ, Lambert PC. The use of restricted cubic splines to approximate complex hazard functions in the analysis of time-to-event data: a simulation study. J Stat Comput Simul. 2015;85(4):777–93.
Geskus RB. On the inclusion of prevalent cases in HIV/AIDS natural history studies through a marker-based estimate of time since seroconversion. Stat Med. 2000;19(13):1753–69.
Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res. 2004;33(2):261–304.
Muller CJ, MacLehose RF. Estimating predicted probabilities from logistic regression: different methods correspond to different target populations. Int J Epidemiol. 2014;43(3):962–70.

Acknowledgements

The authors thank Dr. Sarwar Mozumder and Dr. Paul Lambert for their support in implementing the flexible parametric model approach, and Katia Bulekova for her support with Boston University’s Shared Computing Cluster. The authors also thank the anonymous reviewers and Associate Editor for their thoughtful and helpful comments.

Funding

SCC received funding from the National Institute of General Medical Sciences (NIGMS): T32 GM74905-14 and the National Heart, Lung, and Blood Institute (NHLBI): F31 HL145904-01. EJB received funding from NHLBI: R01HL128914; 2R01 HL092577; 2U54HL120163; American Heart Association (AHA): 18SFRN34110082. LT received funding from AHA: 18SFRN34150007. The Framingham Heart Study is supported by NHLBI (N01-HC25195, HHSN268201500001I; 75N92019D00031) and Boston University School of Medicine. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Authors and Affiliations

Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
Sarah C. Conner, Alexa Beiser, Michael P. LaValley, Martin G. Larson & Ludovic Trinquart
Framingham Heart Study, Framingham, MA, USA
Alexa Beiser, Emelia J. Benjamin & Martin G. Larson
Department of Neurology, Boston University School of Medicine, Boston, MA, USA
Alexa Beiser
Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA
Emelia J. Benjamin
Section of Cardiovascular Medicine, Boston University School of Medicine, Boston, MA, USA
Emelia J. Benjamin
Tufts Clinical and Translational Science Institute, Tufts University, Boston, MA, USA
Ludovic Trinquart
Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA
Ludovic Trinquart

Corresponding authors

Correspondence to Sarah C. Conner or Ludovic Trinquart.

Ethics declarations

Conflicts of interest

The authors report no conflicts of interest.

Code availability

We provide R code to apply our methods with working examples and replicate our simulation studies on GitHub at https://github.com/s-conner/lifetimerisk.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Variance of non-parametric estimator of lifetime risk

The variance of the lifetime risk is estimated using Greenwood’s formula, [4, 18]

$$\begin{aligned}&\widehat{\mathrm {var}} \left\{ {\hat{F}}_1(\tau | T>\tau _0) \right\} \\&\quad = \sum _{\tau _0< t_j \le \tau } \left\{ {\hat{S}}(t_{j-1}) \frac{\varDelta N_1(t_j)}{Y(t_j)} \right\} ^2 \\&\quad \quad \times \left\{ \frac{Y(t_j) - \varDelta N_1(t_j)}{Y(t_j) \varDelta N_1(t_j)} + \sum _{t_l< t_j} \frac{\varDelta N(t_l)}{Y(t_l) (Y(t_l) - \varDelta N(t_l))} \right\} \\&\qquad + 2 \sum _{\tau _0< t_j< \tau } \sum _{ t_k \in (t_j, \tau ]} {\hat{S}}(t_{j-1}) \frac{\varDelta N_1(t_j)}{Y(t_j)} {\hat{S}}(t_{k-1}) \\&\quad \quad \times \frac{\varDelta N_1(t_k)}{Y(t_k)} \left\{ -\frac{1}{Y(t_j)} + \sum _{t_l<t_j} \frac{\varDelta N(t_l)}{Y(t_l) (Y(t_l) - \varDelta N(t_l))} \right\} . \end{aligned}$$
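
The term ${\hat{S}}(t_{j-1}) \varDelta N_1(t_j)/Y(t_j)$ that recurs in the variance expression is the increment of the non-parametric (Aalen-Johansen) estimate of the lifetime risk itself. The following is a minimal Python sketch of that point estimate, not the authors' R implementation. It assumes each record is a hypothetical tuple `(entry_age, exit_age, status)` with status 0 for censoring, 1 for the event of interest and 2 for the competing event, and that the risk set $Y(t)$ counts participants with entry age below $t$ and exit age at or above $t$, which is one common way to handle left truncation. The function name and the toy data are illustrative only.

```python
def lifetime_risk(records, tau0, tau):
    """Aalen-Johansen estimate of F_1(tau | T > tau0) with delayed entry.

    records: iterable of (entry_age, exit_age, status) tuples (hypothetical layout),
             status 0 = censored, 1 = event of interest, 2 = competing event.
    """
    records = list(records)
    # Distinct event ages (either cause) in the window (tau0, tau].
    event_ages = sorted({exit_age for _, exit_age, status in records
                         if status != 0 and tau0 < exit_age <= tau})
    surv = 1.0   # event-free survival just before the current age, S(t_{j-1})
    risk = 0.0   # running cumulative incidence of cause 1
    for t in event_ages:
        # Y(t): under observation and event-free just before age t.
        at_risk = sum(1 for entry, exit_age, _ in records if entry < t <= exit_age)
        d1 = sum(1 for _, exit_age, s in records if exit_age == t and s == 1)
        d_all = sum(1 for _, exit_age, s in records if exit_age == t and s != 0)
        risk += surv * d1 / at_risk        # S(t_{j-1}) * dN_1(t_j) / Y(t_j)
        surv *= 1.0 - d_all / at_risk      # update all-cause Kaplan-Meier factor
    return risk


# Toy data (invented): two participants enter after age 55 (left truncation).
example = [(50, 60, 1), (58, 65, 2), (62, 70, 1), (50, 80, 0)]
estimate = lifetime_risk(example, tau0=55, tau=95)
print(round(estimate, 4))
```

With these four invented records the risk sets at ages 60, 65 and 70 contain 3, 3 and 2 participants, so the estimate is 1/3 + (4/9)(1/2) = 5/9.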

Weighted Breslow estimator of the baseline cumulative subdistribution hazard

The weighted Breslow estimator of the baseline cumulative subdistribution hazard is given by

$$\begin{aligned} {\hat{\varLambda }}_{1,0}(\tau | T_i>\tau _0)&= \int _{\tau _0}^{\tau } {\hat{\lambda }}_{1,0}(t | T_i>\tau _0) dt \\&= \frac{1}{n} \sum _{i=1}^n \int _{\tau _0}^{\tau } \frac{w_i (u)}{{\hat{S}}^0(\hat{\varvec{\pi }}, u)} dN_i(u) \end{aligned}$$

where ${\hat{S}}^0(\hat{\varvec{\pi }}, u) = \sum _{j=1}^n w_j(u) T_j(u) \exp \{ \hat{\varvec{\pi }}^T Z_j(u) \}$ [11].

Illustration of non-proportional subdistribution hazards with proportional cause-specific hazards

Let the cause-specific hazards take Weibull form, $\alpha _1(t; Z)=\frac{a_1}{b_1^{a_1}}t^{a_1 - 1} e^{\gamma _1 Z}$ and $\alpha _2(t; Z)=\frac{a_2}{b_2^{a_2}}t^{a_2 - 1} e^{\gamma _2 Z}$, where Z is a binary covariate with a proportional effect on both cause-specific hazards. The all-cause cumulative hazard is $A(t; Z)=\left( \frac{t}{b_1}\right) ^{a_1} e^{\gamma _1 Z} + \left( \frac{t}{b_2}\right) ^{a_2} e^{\gamma _2 Z}$. The proportional effect of Z on the cause-specific hazard can be verified by deriving the cause-specific hazard ratio (CSHR),

$$\begin{aligned} \frac{\alpha _1(t;Z=1)}{\alpha _1(t;Z=0)}&= \frac{\frac{a_1}{b_1^{a_1}}t^{a_1 - 1} e^{\gamma _1}}{\frac{a_1}{b_1^{a_1}}t^{a_1 - 1}}\\&=e^{\gamma _1}. \end{aligned}$$

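The subdistribution hazard can be represented as a function of the cause-specific hazard, $\lambda _1(t) = \alpha _1(t)/\big ( 1 + \frac{F_2 (t)}{S(t)} \big )$, where $S(t) = \exp \{-A(t)\}$ is the overall event-free survival function and $F_k(t) = \int _0^t S(u) \alpha _k(u) du = 1 - \exp (-\int _0^t \lambda _k (u) du)$. This representation follows because $\lambda _1(t) = S(t) \alpha _1(t) / \{1 - F_1(t)\}$ and, with two causes, $1 - F_1(t) = S(t) + F_2(t)$. The subdistribution hazard ratio (SHR) is then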

$$\begin{aligned} \frac{\lambda _1(t; Z=1)}{\lambda _1(t; Z=0)}&= \frac{ \alpha _1(t;Z=1) / \left( 1 + \frac{F_2 (t;Z=1)}{S(t; Z=1)} \right) }{ \alpha _1(t;Z=0) / \left( 1 + \frac{F_2 (t; Z=0)}{S(t; Z=0)} \right) } \\&= e^{\gamma _1} \cdot \left( \frac{1 + \frac{F_2 (t; Z=0)}{S(t; Z=0)} }{ 1 + \frac{F_2 (t; Z=1)}{S(t; Z=1)}} \right) . \end{aligned}$$

The SHR is not constant, but equal to the CSHR multiplied by a function of time.
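
As a rough numerical check of this result, the sketch below evaluates the CSHR and the SHR at several ages for one set of Weibull parameters. All parameter values and function names are hypothetical choices made for illustration; they do not come from the paper or its simulation settings, and the integral for $F_2$ is approximated with a simple midpoint rule.

```python
import math

# Hypothetical Weibull parameters (illustrative only, not from the paper).
A1, B1, G1 = 1.5, 10.0, math.log(2.0)   # cause 1: shape a_1, scale b_1, gamma_1
A2, B2, G2 = 1.2, 8.0, math.log(1.5)    # cause 2: shape a_2, scale b_2, gamma_2


def cs_hazard(t, a, b, g, z):
    """Weibull cause-specific hazard alpha_k(t; Z)."""
    return (a / b ** a) * t ** (a - 1) * math.exp(g * z)


def surv(t, z):
    """Overall survival S(t; Z) = exp{-A(t; Z)}."""
    cum = (t / B1) ** A1 * math.exp(G1 * z) + (t / B2) ** A2 * math.exp(G2 * z)
    return math.exp(-cum)


def cif2(t, z, steps=2000):
    """F_2(t; Z) = int_0^t S(u) alpha_2(u) du, approximated by the midpoint rule."""
    h = t / steps
    return sum(surv((i + 0.5) * h, z) * cs_hazard((i + 0.5) * h, A2, B2, G2, z) * h
               for i in range(steps))


def sub_hazard(t, z):
    """Subdistribution hazard lambda_1(t) = alpha_1(t) / (1 + F_2(t) / S(t))."""
    return cs_hazard(t, A1, B1, G1, z) / (1.0 + cif2(t, z) / surv(t, z))


def cshr(t):
    """Cause-specific hazard ratio for Z = 1 versus Z = 0."""
    return cs_hazard(t, A1, B1, G1, 1) / cs_hazard(t, A1, B1, G1, 0)


def shr(t):
    """Subdistribution hazard ratio for Z = 1 versus Z = 0."""
    return sub_hazard(t, 1) / sub_hazard(t, 0)


for t in (1.0, 5.0, 10.0, 20.0):
    print(f"t={t:5.1f}  CSHR={cshr(t):.3f}  SHR={shr(t):.3f}")
```

Under these assumed parameters the CSHR stays at $e^{\gamma _1} = 2$ at every time point, while the SHR falls below 2 and keeps decreasing with time, because the covariate also raises the hazard of the competing event.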

Supplemental tables and figures

Table 8 Participant characteristics at index age 55 in the Framingham Heart Study

Table 9 Sensitivity analysis: predicted differences in lifetime risk of atrial fibrillation at age 95, from index age 55 using BIC

Table 10 Multivariable models for incident atrial fibrillation in the Framingham Heart Study, from index age 55 to age 95

Table 11 Multivariable models for incident death without atrial fibrillation in the Framingham Heart Study, from index age 55 to age 95

About this article

Cite this article

Conner, S.C., Beiser, A., Benjamin, E.J. et al. A comparison of statistical methods to predict the residual lifetime risk. Eur J Epidemiol 37, 173–194 (2022). https://doi.org/10.1007/s10654-021-00815-8

Received: 23 February 2021
Accepted: 13 October 2021
Published: 03 January 2022
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10654-021-00815-8
