Scientific Reports

Article
Open access
Published: 31 January 2024

Incorporating longitudinal history of risk factors into atherosclerotic cardiovascular disease risk prediction using deep learning

Jingzhi Yu¹,
Xiaoyun Yang¹,
Yu Deng¹,
Amy E. Krefman¹,
Lindsay R. Pool¹,
Lihui Zhao¹,
Xinlei Mi¹,
Hongyan Ning¹,
John Wilkins¹,
Donald M. Lloyd-Jones¹,
Lucia C. Petito¹ &
…
Norrina B. Allen¹

Scientific Reports volume 14, Article number: 2554 (2024)

Abstract

It is increasingly clear that longitudinal risk factor levels and trajectories are related to risk for atherosclerotic cardiovascular disease (ASCVD) above and beyond single measures. Currently used in clinical care, the Pooled Cohort Equations (PCE) are based on regression methods that predict ASCVD risk based on cross-sectional risk factor levels. Deep learning (DL) models have been developed to incorporate longitudinal data for risk prediction, but their benefit for ASCVD risk prediction relative to the traditional PCE remains unknown. Our study included 15,565 participants from four cardiovascular disease cohorts free of baseline ASCVD who were followed for adjudicated ASCVD. Ten-year ASCVD risk was calculated in the training set using our benchmark, the PCE, and a longitudinal DL model, Dynamic-DeepHit. Predictors included those incorporated in the PCE: sex, race, age, total cholesterol, high-density lipoprotein cholesterol, systolic and diastolic blood pressure, diabetes, hypertension treatment and smoking. The discrimination and calibration performance of the two models were evaluated in an overall hold-out testing dataset. Of the 15,565 participants in our dataset, 2170 (13.9%) developed ASCVD. The performance of the longitudinal DL model that incorporated 8 years of longitudinal risk factor data improved upon that of the PCE [AUROC: 0.815 (CI 0.782–0.844) vs 0.792 (CI 0.760–0.825)] and the net reclassification index was 0.385. The Brier score for the DL model was 0.0514 compared with 0.0542 for the PCE. Incorporating longitudinal risk factors in ASCVD risk prediction using DL can improve model discrimination and calibration.

Introduction

The Pooled Cohort Equations (PCE) were developed by the American College of Cardiology (ACC) and American Heart Association (AHA) in 2013 and updated in 2018 using data from 9 longitudinal cohort studies as a tool for clinicians to predict 10-year risk of atherosclerotic cardiovascular disease (ASCVD)^1,2. The PCE are a set of race- and sex-specific Cox proportional hazards models that include widely accepted clinical and behavioral risk factors for ASCVD: age, sex, race, systolic (SBP) and diastolic blood pressure (DBP), total cholesterol, high-density lipoprotein (HDL) cholesterol, smoking status, and type 2 diabetes. In clinical practice, risk predictions from the PCE are a key criterion to determine eligibility for moderate to high intensity statins and hypertension treatments^1,3. However, numerous studies have found the performance of the PCE varies across demographic groups^4,5,6; c-statistics from these studies ranged from 0.55 to 0.77 (average: 0.70) in men and 0.61 to 0.82 (average: 0.74) in women^7,8. Additionally, current clinical guidelines provide more ambivalent and complex treatment recommendations for those who fall in the borderline (5% to 7.5%) and intermediate risk groups (7.5% to 20%)⁹. A more accurate and robust risk prediction algorithm can help physicians better assess an individual’s risk, allowing them to make more appropriate treatment decisions.

A growing number of studies have demonstrated that long-term risk factor levels are associated with an individual’s risk for the development of ASCVD. For instance, incident CVD risk was shown to be dependent on cumulative exposure to LDL-C¹⁰. In a separate study, incident CVD and survival were also found to be associated with 10-year cumulative SBP¹¹. Hence, long-term risk factor patterns may be predictive of ASCVD risk above and beyond cross-sectional levels¹². In a prior study, after including 5-year and 10-year cumulative blood pressure measurements in the PCE, researchers found a moderate improvement in the net reclassification index¹³. Additionally, full integration of multiple longitudinal trajectories of clinical factors into ASCVD prediction is now feasible in clinical practice given advances in computing and electronic medical record (EMR) systems that allow clinicians to access longitudinal risk factor data for their patients.

In recent years, deep learning methods have been applied to many clinical predictive and classification problems with much success^14,15,16. Compared with traditional statistical methods, deep learning methods are often superior at processing and creating representations of complex data, such as radiology images and unstructured physician notes^17,18, without the need for prior feature engineering or selection^15,19. Hence, deep learning can more thoroughly extract and leverage the rich features stored in longitudinal data, such as longitudinal blood pressure measurements recorded in electronic health records (EHR), for predictive tasks.

In this study, we incorporated cross-sectional and longitudinal clinical and behavioral risk factor levels into a state-of-the-art deep learning architecture to create a new prediction model for 10-year risk of incident ASCVD in a pooled cohort of 4 US-based, diverse longitudinal cohorts. We evaluated our model’s predictive performance in comparison to that of the PCE in the overall population and in key population subgroups to better understand the importance of longitudinal data for ASCVD risk prediction. Moreover, we determined the importance of each clinical variable used in the prediction model. Lastly, we performed additional evaluations of the model performance in the borderline and intermediate risk groups to better understand our model’s potential impact on clinical decision making.

Methods

Study population

The four longitudinal cohorts used in this study contributed data to the Cardiovascular Lifetime Risk Pooling Project (LRPP): the Framingham Heart Study, Framingham Offspring Study, Coronary Artery Risk Development in Young Adults (CARDIA) Study, and Atherosclerosis Risk in Communities (ARIC) Study²⁰. These cohorts were selected for their number of participants, duration of follow-up, number of participant visits, and consistency of measurement of CVH risk factors.

As the examination schedules differed across cohorts, the number of exams within timeframes varied. To include the largest number of exams across the different studies while balancing the size of the timeframe for the study, we used 8 years of longitudinal data as the timeframe for CVD risk factor ascertainment (observation period). For consistency with the PCEs, outcomes were then measured over a 10-year follow-up period. Thus, to maximize the number of exams included in our study, we included data beginning at the following index exams (i.e. the exam at which risk factor follow-up began) for the included studies (Fig. 1): year 15 for the Framingham Heart Study, year 10 for the Framingham Offspring Study, year 18 for the CARDIA study, and year 1 for the ARIC study. The exact start and end years of each cohort as well as their mean and interquartile range of the number of exams in each cohort are shown in Table 1.

Figure 1

Table 1 The official start year, start year of the observation period (after adjustment), end year of the 8 year follow-up period, average number of exams within the 8 year follow-up period as well as the interquartile range of the number of exams by each cohort.

Eligible participants were over 40 and under 75 years of age at the point of prediction (i.e. the end of the 8-year observation period), had no record of self-reported or diagnosed ASCVD at the index exam or during the 8-year observation period, and had at least one measurement of SBP, DBP, total cholesterol and HDL cholesterol. The LRPP is approved by the Northwestern IRB and this study utilized de-identified data from each of the included cohorts in LRPP. Written informed consent was obtained for all participants and analyses were performed in accordance with relevant guidelines.

Outcome: ASCVD incidence

The outcome in our study was ASCVD incidence, defined as the incidence of coronary heart disease, ischemic stroke, or CVD-related death, over a 10-year period that began at the end of the observation period (Fig. 1)^11,20. Coronary heart disease and ischemic stroke were adjudicated by review of medical records by study investigators²⁰. Participants without any recorded event at the end of the study, or who died of other causes during the follow-up period were considered right censored.
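
The outcome definition above can be sketched as a small function that turns follow-up information into a (time, event) pair. This is a minimal illustration, not the study's code: the function name, its arguments, and the convention of measuring time in years from the end of the observation period are all assumptions.

```python
def ascvd_outcome(ascvd_year=None, other_death_year=None,
                  last_contact_year=10.0, horizon=10.0):
    """Return (time, event) for one participant (hypothetical helper).

    Times are years since the end of the observation period.
    event = 1 if ASCVD occurred within the horizon, otherwise 0
    (right censored at non-ASCVD death, last contact, or the horizon).
    """
    censor_time = min(horizon, last_contact_year)
    if other_death_year is not None:
        # Death from other causes is treated as right censoring.
        censor_time = min(censor_time, other_death_year)
    if ascvd_year is not None and ascvd_year <= censor_time:
        return ascvd_year, 1
    return censor_time, 0


print(ascvd_outcome(ascvd_year=4.2))        # ASCVD event during follow-up
print(ascvd_outcome(other_death_year=6.0))  # censored at non-ASCVD death
print(ascvd_outcome())                      # event-free through year 10
```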

Features: CVD risk factors

The CVD risk factors included in the original PCE, namely systolic BP, diastolic BP, total cholesterol, and HDL cholesterol, were measured 1–4 times during the 8-year observation period. Blood pressure was measured using standard methods by clinic staff in the various cohorts^20,21. Fasting HDL-C, total cholesterol measurements and blood glucose were collected via blood serum^20,21. Diagnosis of diabetes and treatment for hypertension, predictors also included in the PCE, were self-reported at the index visit^20,21. Age, sex, race, ethnicity, smoking status (current/former smoker vs. never smoker), and alcohol consumption were self-reported at the index visit^20,21.

Statistical analysis

The deep learning model used in this study is Dynamic-DeepHit, which enabled the incorporation of longitudinal risk factor data in a dynamic fashion to estimate 10-year risk of incident ASCVD²². The Dynamic-DeepHit model has been demonstrated to have substantial improvements over traditional predictive methods, including the Cox Proportional Hazards Model, in predicting cystic fibrosis outcomes²².

The Dynamic-DeepHit model consists of two neural networks: (1) a recurrent neural network (RNN) that processes the longitudinal measurements and predicts future measurements of time-varying covariates, and (2) a fully connected neural network that estimates the probability of the specific event at a given time. RNNs are commonly used for machine learning problems involving temporal or sequential data and can capture long-term dependencies in the data. The Dynamic-DeepHit model also utilizes an attention mechanism that identifies important longitudinal measurements when making risk predictions, which improves predictive performance. The second neural network takes as input the learned representations that are output from the first neural network along with the last recorded set of behavioral and clinical covariates (e.g. the most recent CVD risk factor measurements at the end of the 8-year observation period). The output layer of the second neural network converts the learned relationships between the risk factors and outcome into the 10-year risk of incident CVD.
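
To make the two-network structure described above more concrete, the following is a heavily simplified forward pass in plain Python. It is a sketch under stated assumptions and not the published Dynamic-DeepHit implementation: the weights are random and untrained, the recurrent cell is a basic tanh RNN, the attention scoring is a simple bilinear form, the auxiliary task of predicting future measurements is omitted, and every name (`init_params`, `ten_year_risk`, the weight keys) is hypothetical. The output layer produces a probability for each yearly time bin plus one "no event within the horizon" bin, and the 10-year risk is the sum over the yearly bins.

```python
import math
import random


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_params(n_features, n_hidden, n_bins, seed=0):
    """Random (untrained) weights; for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wx": mat(n_hidden, n_features),               # input -> hidden
        "Wh": mat(n_hidden, n_hidden),                 # hidden -> hidden
        "Wa": mat(n_features, n_hidden),               # attention scoring
        "Wo": mat(n_bins + 1, n_hidden + n_features),  # output layer
    }


def ten_year_risk(visits, params, n_bins=10):
    """visits: list of feature vectors, one per exam, oldest first."""
    n_hidden = len(params["Wh"])
    h = [0.0] * n_hidden
    states = []
    for x in visits[:-1]:  # earlier exams pass through the recurrent network
        pre = [a + b for a, b in zip(matvec(params["Wx"], x),
                                     matvec(params["Wh"], h))]
        h = [math.tanh(v) for v in pre]
        states.append(h)
    last = visits[-1]      # most recent measurements
    if states:
        # Attention: weight each hidden state by its relevance to the last exam.
        scores = [sum(a * b for a, b in zip(matvec(params["Wa"], s), last))
                  for s in states]
        weights = softmax(scores)
        context = [sum(w * s[j] for w, s in zip(weights, states))
                   for j in range(n_hidden)]
    else:
        context = [0.0] * n_hidden
    # Fully connected output: distribution over yearly bins + "no event" bin.
    pmf = softmax(matvec(params["Wo"], context + last))
    return sum(pmf[:n_bins])


# Three exams of four standardized features (e.g. SBP, DBP, TC, HDL z-scores).
exams = [[0.8, 0.5, 0.3, -0.2], [0.6, 0.4, 0.2, -0.1], [0.1, 0.0, 0.1, 0.0]]
params = init_params(n_features=4, n_hidden=8, n_bins=10)
risk = ten_year_risk(exams, params)
```

Because the weights are untrained, the value of `risk` carries no clinical meaning; the sketch only shows how the history and the last measurement are combined into a single probability.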

To explore the reasons for any improvements in the predictive power we also implemented a cross-sectional DeepHit model. This allowed us to disentangle whether the improvements were due to the incorporation of the longitudinal data or simply to the complexity of the neural network modeling methods. The DeepHit model was fitted on only the last set of measurements for each participant within the 8-year observation period. We also fit the traditional PCE model, to understand its performance in this sample.

Data pre-processing included randomly splitting the dataset into 3 parts, called training, tuning, and testing, at a 3:1:1 ratio. The Dynamic-DeepHit and cross-sectional DeepHit models were trained in the training dataset and corresponding hyperparameters were tuned in the tuning dataset. The training data for the PCE included both the training and tuning datasets. The testing dataset, not used in model development, was used for validation. The participants were the same in each of the respective datasets for each model.
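
As an illustration of the 3:1:1 split described above, the sketch below shuffles participant identifiers and partitions them. The function name, the seed, and the use of integer IDs are assumptions made for the example; the study's actual splitting code is not given in the text.

```python
import random


def split_participants(ids, ratio=(3, 1, 1), seed=0):
    """Shuffle participant IDs and split into training, tuning, and testing."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratio)
    n = len(shuffled)
    n_train = n * ratio[0] // total
    n_tune = n * ratio[1] // total
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    test = shuffled[n_train + n_tune:]  # remainder goes to testing
    return train, tune, test


train_ids, tune_ids, test_ids = split_participants(range(15565))
# The PCE benchmark is fitted on the training and tuning participants combined.
pce_fit_ids = train_ids + tune_ids
print(len(train_ids), len(tune_ids), len(test_ids))
```

Splitting by participant rather than by exam keeps all repeated measurements of one person in the same dataset, which is what allows the same participants to be used in each dataset for every model.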

We assessed model discrimination and calibration of all 3 models. We calculated and compared the Area Under the Receiver Operating Characteristic Curve (AUROC) for all models to evaluate model discrimination, the ability of the model to discriminate those who have a higher risk of having an event from those at lower risk. Brier scores were used to evaluate the calibration of the model, that is, the extent to which the estimated risks correspond to observed event rates; lower scores indicate better calibration²³.
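
For readers less familiar with these two metrics, here is a minimal sketch of how they can be computed for a binary 10-year outcome. It assumes every participant's outcome is known (no censoring), which is a simplification of the survival setting in this study, and the toy risks and outcomes are invented for illustration. The Brier score is the mean squared difference between predicted risk and outcome, so it also reflects discrimination to some degree.

```python
def auroc(risks, events):
    """Probability that a random case is ranked above a random non-case.

    Ties count as one half. Quadratic in sample size; fine for a sketch.
    """
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def brier_score(risks, events):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((r - e) ** 2 for r, e in zip(risks, events)) / len(risks)


# Toy data (not from the study): predicted risks and observed outcomes.
toy_risks = [0.02, 0.05, 0.10, 0.30, 0.08, 0.20]
toy_events = [0, 0, 1, 1, 0, 0]
print(auroc(toy_risks, toy_events))        # 7 of 8 case/non-case pairs ordered correctly
print(brier_score(toy_risks, toy_events))
```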

The trained Dynamic-DeepHit model was evaluated in the following population groups: Black males, Black females, other (White, Hispanic, Asian) males, other females, those under 60 years old, and those 60 years or older. The race and sex groups were chosen to mirror the classifications used for the sex- and race-specific PCE. As in the overall analysis, the AUROCs were compared between corresponding population subgroups.

To understand the importance of each predictor in the Dynamic-DeepHit model, we took a leave-one-out approach. We removed one predictor at a time from the Dynamic-DeepHit model and retrained and retested the model. The change in the testing dataset AUROC was calculated for each feature removed: the greater the change in AUROC, the greater the importance of the predictor. To also understand the role of longitudinal clinical risk factors better in the Dynamic-DeepHit model, we examined the average trajectories of SBP, DBP, total cholesterol and HDL for the individuals whose predicted risk increased and those whose risk decreased in the Dynamic-DeepHit model. Trajectories were created via generalized estimating equations (GEE) to account for correlation between repeated measurements for individuals. The trajectories were visualized across exam times with the 95% confidence bands.
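
The leave-one-out procedure described above can be sketched generically as follows. The `fit_and_score` callback stands in for "retrain the model on the given predictors and return the testing AUROC"; it and the toy contribution values are placeholders invented for this example and are not results from the study.

```python
def leave_one_out_importance(predictors, fit_and_score):
    """Rank predictors by the drop in score when each one is removed.

    fit_and_score(predictor_list) -> float is assumed to retrain the model
    on the listed predictors and return its testing AUROC.
    """
    baseline = fit_and_score(predictors)
    drops = {}
    for p in predictors:
        reduced = [q for q in predictors if q != p]
        drops[p] = baseline - fit_and_score(reduced)
    # Largest drop first: the most important predictor leads the list.
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


# Placeholder scorer with made-up additive contributions (illustration only).
TOY_CONTRIBUTION = {"age": 0.05, "sbp": 0.04, "diabetes": 0.03}


def toy_fit_and_score(predictors):
    return 0.5 + sum(TOY_CONTRIBUTION[p] for p in predictors)


ranking = leave_one_out_importance(list(TOY_CONTRIBUTION), toy_fit_and_score)
for name, drop in ranking:
    print(f"{name}: AUROC drop {drop:.3f}")
```

Note that this approach requires one full retraining per predictor, which is feasible here because the predictor set is limited to the PCE variables.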

Current blood pressure and cholesterol control guidelines use risk thresholds based on the PCE to inform clinical care. Physicians are advised to prescribe moderate-intensity statins if an individual’s ASCVD risk is over 7.5%. However, differentiation of individuals between the borderline and intermediate PCE risk groups could be improved. We calculated the net reclassification index (NRI) between the PCE and the Dynamic-DeepHit model to understand how the Dynamic-DeepHit model changed individuals’ risk classification. We then conducted additional analyses to better understand the performance of the Dynamic-DeepHit model in the borderline and intermediate groups, and how clinical behavior would be affected if the risk derived from the Dynamic-DeepHit model were used instead of the risk from the PCE.
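
A continuous (category-free) NRI can be sketched as below, using the common definition: among participants with an event, the proportion whose predicted risk moved up minus the proportion whose risk moved down, plus, among participants without an event, the proportion whose risk moved down minus the proportion whose risk moved up. The sketch assumes fully observed binary outcomes, and the toy risks are invented; whether the study handled censoring differently is not stated in the text.

```python
def continuous_nri(old_risks, new_risks, events):
    """Continuous NRI of new_risks relative to old_risks for binary outcomes."""
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for old, new, event in zip(old_risks, new_risks, events):
        if event == 1:
            n_event += 1
            up_event += new > old
            down_event += new < old
        else:
            n_nonevent += 1
            up_nonevent += new > old
            down_nonevent += new < old
    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents


# Toy data (not from the study).
pce_toy = [0.10, 0.20, 0.05, 0.08, 0.12, 0.30]
dl_toy = [0.15, 0.18, 0.03, 0.06, 0.14, 0.25]
outcome_toy = [1, 1, 0, 0, 0, 0]
nri = continuous_nri(pce_toy, dl_toy, outcome_toy)
print(nri)
```

In the toy data the event component is 0 (one case moved up, one moved down) and the non-event component is 0.5 (three of four non-cases moved down, one moved up), so the total is 0.5.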

All statistical analysis was performed using Python version 3.8 and R 4.0.2. A 5% type-I error rate was used when calculating all confidence intervals.

Results

Baseline characteristics

Baseline demographics and measurements of CVD risk factors included in the PCE are described in Table 2. Pooled cohort participants included in this study were 55% female, 27% non-Hispanic Black and 50 years old on average. We found that participants who developed ASCVD in the prediction period had significantly higher levels of ASCVD risk factors compared with the participants who did not develop ASCVD. Baseline demographics and clinical characteristics of the participants by cohort are presented in Supplemental Table S1.

Table 2 Baseline demographics of LRPP analytic dataset. Risk factors in the ASCVD group were significantly higher than those in the Non-ASCVD group.

Performance of models

Table 3 shows the discrimination of the three models in the training and testing datasets. The AUCs for the PCE and the longitudinal Dynamic-DeepHit model were 0.792 (CI 0.760–0.825) and 0.815 (CI 0.782–0.844), respectively. The Dynamic-DeepHit model shows slight improvement in discrimination upon the PCE model. The cross-sectional deep learning model achieved an AUC of 0.807 (CI 0.778–0.838) (Supplemental Table S3). The continuous net reclassification index (NRI) for the Dynamic-DeepHit model compared with the PCE was 0.385. The Brier Score for the PCE model was 0.054, 0.052 for the cross-sectional deep learning model and 0.051 for the longitudinal deep learning model, showing meaningful improvement in model calibration.

Table 3 Discrimination performance of models and in population subgroups.

The predicted risks derived from the Dynamic-DeepHit model were found to be generally lower than the risks derived from the PCE (Fig. 2). In Fig. 3, the calibration of the Dynamic-DeepHit model is compared with the calibration of the PCE by comparing the predicted risk and observed risk within each decile of predicted risk. The PCE is shown to over-predict 10-year ASCVD risk, especially within the top 40% of predicted risk, which corresponds to the 7.5% risk threshold used in clinical guidelines. Comparatively, the calibration of the Dynamic-DeepHit model is consistently better along the entire spectrum of risk.
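
The decile-based comparison of predicted and observed risk described above can be sketched as follows. The data are simulated so that the model is well calibrated by construction, outcomes are treated as fully observed, and the function name is hypothetical; the study's own calibration plot may have been built differently (for example, with censoring-aware observed risks).

```python
import random


def calibration_by_decile(risks, events, n_bins=10):
    """Return (bin number, mean predicted risk, observed event rate) per bin."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    rows = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        if not idx:
            continue
        predicted = sum(risks[i] for i in idx) / len(idx)
        observed = sum(events[i] for i in idx) / len(idx)
        rows.append((b + 1, predicted, observed))
    return rows


# Simulated data: outcomes drawn with probability equal to the predicted risk.
rng = random.Random(0)
sim_risks = [0.4 * rng.random() for _ in range(200)]
sim_events = [1 if rng.random() < r else 0 for r in sim_risks]
table = calibration_by_decile(sim_risks, sim_events)
for decile, predicted, observed in table:
    print(f"decile {decile:2d}: predicted {predicted:.3f}, observed {observed:.3f}")
```

A model that over-predicts, as reported for the PCE in the upper deciles, would show predicted values consistently above the observed rates in those rows.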

Figure 2

Figure 3

Model performance in population subgroups

The discriminative performance of the Dynamic-DeepHit and PCE models within different population groups is shown in Table 3. The Dynamic-DeepHit model performed relatively better than the PCE in the other-males group (0.801, CI 0.753–0.848 vs 0.779, CI 0.732–0.826), other-females group (0.801, CI 0.737–0.764 vs 0.780, CI 0.712–0.848), and the Black-females group (0.821, CI 0.751–0.890 vs 0.801, CI 0.726–0.877). However, it underperformed in the Black-males group (0.820, CI 0.751–0.888 vs 0.826, CI 0.756–0.897). In the under-60-years-old group, the Dynamic-DeepHit model had an AUROC of 0.803 (CI 0.747–0.858) compared with the PCE’s AUROC of 0.781 (CI 0.721–0.842), and in the 60-years-or-older group, the Dynamic-DeepHit model had an AUROC of 0.698 (CI 0.646–0.749) compared with the PCE’s AUROC of 0.667 (CI 0.615–0.719). The Dynamic-DeepHit model thus outperformed the PCE in three of the four demographic groups outlined by the PCE.

Feature importance

The results of the leave-one-out feature importance analysis are shown in Fig. 4. After removing age from the model, the greatest decrease in AUROC was observed (0.769, CI 0.735–0.803); thus, age is considered the most important variable in the model. Following age, longitudinal SBP was the second most important predictor, with the AUROC reduced to 0.777 (CI 0.744–0.809). Diabetes diagnosis and hypertension treatment were the most important categorical predictors, with AUROCs reduced to 0.779 (CI 0.747–0.812) and 0.780 (CI 0.748–0.813) when these predictors were removed respectively.

Figure 4

Figure 5 shows the longitudinal trajectories of clinical risk factors, including SBP, DBP, total cholesterol and HDL, among the individuals whose risk increased and those whose risk decreased after switching to the Dynamic-DeepHit model for ASCVD risk prediction. Between the two groups, the average terminal measurements of SBP and total cholesterol were similar, but the historical measurements of those risk factors were higher among those whose predicted risk increased in the Dynamic-DeepHit model.

Figure 5

Borderline risk stratification

Among the individuals in the borderline and intermediate risk groups determined by the risk derived from the PCE, the AUC from the Dynamic-DeepHit model was higher than that from the PCE: 0.688 (CI 0.634–0.742) versus 0.652 (CI 0.594–0.709). The NRI for the Dynamic-DeepHit model between the borderline and intermediate group was 0.322. The Brier score was 0.069 for the PCE compared with 0.067 for the Dynamic-DeepHit model, again showing some improvement in the model calibration.

Given the 7.5% risk threshold for moderate-intensity statin prescription, we examined the individuals whose risk crossed the threshold in both directions to understand the Dynamic-DeepHit model’s potential impact on clinical decision making. In our testing dataset, among those who would be prescribed statins under the PCE risk (N = 1213), 33% (N = 405) would not be prescribed statins under the new risk provided by the Dynamic-DeepHit model, and 95% (N = 386) of those individuals did not develop ASCVD. Among those who would not be prescribed statins using the PCE (N = 1900), 2% (N = 34) would be recommended statins under the Dynamic-DeepHit model. However, of those individuals, only 3% (N = 1) developed ASCVD within 10 years.
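
The threshold-crossing analysis above amounts to a simple classification followed by counting. The sketch below shows one way to label each individual and then re-derives the reported percentages from the reported counts. The function name is hypothetical, and treating a risk of exactly 7.5% as above the threshold is an assumption.

```python
def reclassification(pce_risk, dl_risk, threshold=0.075):
    """Label how an individual's statin eligibility changes between models."""
    if pce_risk >= threshold and dl_risk < threshold:
        return "down"       # eligible under the PCE, not under the DL model
    if pce_risk < threshold and dl_risk >= threshold:
        return "up"         # not eligible under the PCE, eligible under the DL model
    return "unchanged"


def pct(part, whole):
    return round(100 * part / whole)


# Counts as reported in the text for the testing dataset.
pce_above, moved_down, moved_down_no_event = 1213, 405, 386
pce_below, moved_up, moved_up_event = 1900, 34, 1

print(pct(moved_down, pce_above))            # share of PCE-eligible moved below 7.5%
print(pct(moved_down_no_event, moved_down))  # share of those without ASCVD
print(pct(moved_up, pce_below))              # share of PCE-ineligible moved above 7.5%
print(pct(moved_up_event, moved_up))         # share of those who developed ASCVD
print(pce_above + pce_below)                 # size of the testing dataset
```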

Discussion

Principal findings

In this study, we have demonstrated that by incorporating longitudinal data of the same clinical and behavioral predictors as in the PCE using a state-of-the-art and validated deep learning model we can improve the calibration of predicting 10-year ASCVD risk. We leveraged data from 4 diverse cohorts for model training and testing and found that the longitudinal deep learning model outperformed the PCE both in the overall cohort and in specific subpopulations. We have demonstrated that the longitudinal deep learning model has clinical value through improved discrimination and greater calibration for those with borderline risk of ASCVD, thus providing physicians more reliable estimates of risk for clinical decision making.

Deep learning in ASCVD risk prediction

Longitudinal trends of clinical factors such as blood pressure and cholesterol have long been established to be of clinical importance¹³. While this is not the first study to incorporate longitudinal data for predicting ASCVD, to our knowledge, it is the first study that uses a deep learning approach. Prior studies used methods such as including aggregate summary statistics of the longitudinal clinical data in the PCE or landmark models that could update data at fixed time intervals^13,24,25. This foundational work led to minor improvements in model discrimination; however, we were able to achieve better performance because we utilized a deep learning method. A key advantage of deep learning models is their ability to recognize complex patterns by utilizing multiple layers of artificial neural networks, which are composed of inter-connected nodes. This advantage manifests in two ways in the Dynamic-DeepHit model. First, the improvement in the discrimination of the cross-sectional DeepHit model over the PCE demonstrates that given the same cross-sectional data, neural networks can make better predictions of ASCVD than the PCE. Second, the RNN can create robust representations of longitudinal clinical data, preserving critical information for ASCVD risk prediction.

Clinical implications

Through evaluating the Dynamic-DeepHit model in various population demographic groups, we found that the model improves risk prediction in Black females compared with the PCE^26,27,28,29,30,31. This indicates that incorporating longitudinal data may allow physicians to make more accurate treatment decisions and reduce health outcome disparities in these high-risk groups.

Among the individuals categorized as borderline- and intermediate-risk by the PCE, the Dynamic-DeepHit model improved discrimination and was better calibrated. One-third of the individuals in the intermediate PCE risk groups had overestimated 10-year ASCVD risk, which indicates the potential for over-prescribing. In these individuals, the Dynamic-DeepHit model slightly under-estimates risk, meaning that it is better at ruling out people who will not have ASCVD events, while not as good at identifying those who will develop ASCVD. As current clinical guidance requires further risk analysis for the individuals in these risk groups, guideline-concordant treatment is less optimal. With a better calibrated risk assessment, clinicians may be less concerned with over-prescribing and feel more confident in prescribing guideline-concordant treatment given the predicted risks from the Dynamic-DeepHit model.

The feature importance analysis shows that longitudinal measurements of clinical variables have a meaningful influence on the performance of the Dynamic-DeepHit model. In the Dynamic-DeepHit model, longitudinal SBP was the most important modifiable predictor, while total cholesterol was found to be relatively important as well. Similar to prior research²¹, age was found to be an important predictor in the Dynamic-DeepHit model. In addition, diabetic status, sex, and smoking status were also found to influence the AUROC of the model. In the observed 8-year trajectories of SBP, DBP, and total cholesterol for the individuals whose risk changed (Fig. 5), the aggregate terminal measurements were similar at the population level. If prediction used only those terminal measurements, a similar risk profile between those with increased risk and decreased risk would be assumed. However, the Dynamic-DeepHit model picked up separation in the historical values of those clinical factors, which contributed to the model identifying the differences in risk profiles of the two groups of individuals. Combined with the results of the feature importance analysis, this evidence further supports that longitudinal histories of clinical predictors can provide additional insight in evaluating ASCVD risk profiles.

With the proliferation of EHRs, longitudinal data is readily accessible. In addition, with the advent of cloud and edge computing, it is possible to deliver intensive computing capabilities to the EHR to support sophisticated machine learning or deep learning models for clinical risk prediction. This study shows that, with further validation, deep learning models can be a powerful tool to help clinicians leverage the silos of currently untapped historical patient data in the EHR to improve patient cardiovascular outcomes. New methods of interpreting these models will also add confidence in adoption among physicians.

Limitations

There are several limitations in our study. First, the cohorts used in this study may not reflect the clinical conditions of present-day patients, who are more likely to be on CVD treatments, such as statins. Therefore, given the limited information we had on statin usage, we did not exclude any participants who may have been on statin treatment. Second, data was recorded more sparsely in the cohort studies, whereas clinical measurements are often more frequent in clinical practice³². The quality of the data stored in the EHR could also be compromised by the varying clinical contexts in which the data was collected. While these data problems exist, deep learning methods are still one of the best tools to overcome such issues^33,34. On the other hand, EHRs often do not contain up to 8 years of longitudinal data on patients. As this study is a proof of concept, further work is needed to explore the efficacy and utility of incorporating longitudinal risk factors into ASCVD risk prediction within EHRs.

Data availability

Data used in this manuscript is not publicly available due to prior legal agreements. However, readers may reach out to the corresponding author to receive access to the pooled data source.

References

Grundy, S. M. et al. 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA guideline on the management of blood cholesterol: A report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. J. Am. Coll. Cardiol. 73(24), e285–e350. https://doi.org/10.1016/j.jacc.2018.11.003 (2019).
Stone, N. J. et al. 2013 ACC/AHA guideline on the treatment of blood cholesterol to reduce atherosclerotic cardiovascular risk in adults: A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J. Am. Coll. Cardiol. 63(25 Pt B), 2889–934. https://doi.org/10.1016/j.jacc.2013.11.002 (2014).
Whelton, P. K. et al. 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA guideline for the prevention, detection, evaluation, and management of high blood pressure in adults: A report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Hypertension 71(6), e13–e115. https://doi.org/10.1161/HYP.0000000000000065 (2018).
Ridker, P. M. & Cook, N. R. Statins: New American guidelines for prevention of cardiovascular disease. The Lancet 382(9907), 1762–1765. https://doi.org/10.1016/S0140-6736(13)62388-0 (2013).
DeFilippis, A. P., Young, R. & Blaha, M. J. Calibration and discrimination among multiple cardiovascular risk scores in a modern multiethnic cohort. Ann. Intern. Med. 163(1), 68–69. https://doi.org/10.7326/L15-5105-2 (2015).
Wallisch, C. et al. Re-estimation improved the performance of two Framingham cardiovascular risk equations and the Pooled Cohort equations: A nationwide registry analysis. Sci. Rep. 10(1), 8140. https://doi.org/10.1038/s41598-020-64629-6 (2020).
Damen, J. A. et al. Performance of the Framingham risk models and pooled cohort equations for predicting 10-year risk of cardiovascular disease: A systematic review and meta-analysis. BMC Med. 17(1), 109. https://doi.org/10.1186/s12916-019-1340-7 (2019).
Lloyd-Jones, D. M. et al. Use of risk assessment tools to guide decision-making in the primary prevention of atherosclerotic cardiovascular disease: A special report from the American Heart Association and American College of Cardiology. J. Am. Coll. Cardiol. 73(24), 3153–3167. https://doi.org/10.1016/j.jacc.2018.11.005 (2019).
Wong, N. D. Cardiovascular risk assessment: The foundation of preventive cardiology. Am. J. Prev. Cardiol. 1, 100008. https://doi.org/10.1016/j.ajpc.2020.100008 (2020).
Domanski, M. J. et al. Time course of LDL cholesterol exposure and cardiovascular disease event risk. J. Am. Coll. Cardiol. 76(13), 1507–1516. https://doi.org/10.1016/j.jacc.2020.07.059 (2020).
Reges, O. et al. Association of cumulative systolic blood pressure with long-term risk of cardiovascular disease and healthy longevity: Findings from the lifetime risk pooling project cohorts. Hypertension 77(2), 347–356. https://doi.org/10.1161/hypertensionaha.120.15650 (2021).
Rospleszcz, S. et al. Temporal trends in cardiovascular risk factors and performance of the Framingham Risk Score and the Pooled Cohort Equations. J. Epidemiol. Community Health 73(1), 19–25. https://doi.org/10.1136/jech-2018-211102 (2019).
Pool, L. R., Ning, H., Wilkins, J., Lloyd-Jones, D. M. & Allen, N. B. Use of long-term cumulative blood pressure in cardiovascular risk prediction models. JAMA Cardiol. 3(11), 1096–1100. https://doi.org/10.1001/jamacardio.2018.2763 (2018).
Lewis, M. et al. Comparison of deep learning with traditional models to predict preventable acute care use and spending among heart failure patients. Sci. Rep. 11(1), 1164. https://doi.org/10.1038/s41598-020-80856-3 (2021).
Shickel, B., Tighe, P. J., Bihorac, A. & Rashidi, P. Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health Inform. 22(5), 1589–1604. https://doi.org/10.1109/JBHI.2017.2767063 (2018).
Si, Y. et al. Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review. J. Biomed. Inform. 115, 103671. https://doi.org/10.1016/j.jbi.2020.103671 (2021).
Zhao, Y. et al. BERTSurv: BERT-Based Survival Models for Predicting Outcomes of Trauma Patients. arXiv:2103.10928. Accessed March 01, 2021. https://ui.adsabs.harvard.edu/abs/2021arXiv210310928Z (2021).
Zeng, Z., Deng, Y., Li, X., Naumann, T. & Luo, Y. Natural language processing for EHR-based computational phenotyping. IEEE/ACM Trans. Comput. Biol. Bioinform. 16(1), 139–153. https://doi.org/10.1109/TCBB.2018.2849968 (2019).
Wang, F., Casalino, L. P. & Khullar, D. Deep learning in medicine: Promise, progress, and challenges. JAMA Intern. Med. 179(3), 293–294. https://doi.org/10.1001/jamainternmed.2018.7117 (2019).
Wilkins, J. T. et al. Data resource profile: The cardiovascular disease lifetime risk pooling project. Int. J. Epidemiol. 44(5), 1557–1564. https://doi.org/10.1093/ije/dyv150 (2015).
Berry, J. D. et al. Lifetime risks of cardiovascular disease. N. Engl. J. Med. 366(4), 321–329. https://doi.org/10.1056/NEJMoa1012848 (2012).
Lee, C., Yoon, J. & van der Schaar, M. Dynamic-DeepHit: A deep learning approach for dynamic survival analysis with competing risks based on longitudinal data. IEEE Trans. Biomed. Eng. 67(1), 122–133. https://doi.org/10.1109/TBME.2019.2909027 (2020).
Van Calster, B. & Vickers, A. J. Calibration of risk prediction models: Impact on decision-analytic performance. Med. Decis. Mak. 35(2), 162–169. https://doi.org/10.1177/0272989x14547233 (2015).
Sayadi, M., Zare, N., Attar, A. & Ayatollahi, S. M. T. Improved landmark dynamic prediction model to assess cardiovascular disease risk in on-treatment blood pressure patients: A simulation study and post hoc analysis on SPRINT data. Biomed. Res. Int. 2020, 2905167. https://doi.org/10.1155/2020/2905167 (2020).
Paige, E. et al. Landmark models for optimizing the use of repeated measurements of risk factors in electronic health records to predict future disease risk. Am. J. Epidemiol. 187(7), 1530–1538. https://doi.org/10.1093/aje/kwy018 (2018).
Roger, V. L. et al. Heart disease and stroke statistics–2011 update: A report from the American Heart Association. Circulation 123(4), e18–e209. https://doi.org/10.1161/CIR.0b013e3182009701 (2011).
Ferdinand, K. C. et al. Disparities in hypertension and cardiovascular disease in blacks: The critical role of medication adherence. J. Clin. Hypertens. (Greenwich) 19(10), 1015–1024. https://doi.org/10.1111/jch.13089 (2017).
Jolly, S., Vittinghoff, E., Chattopadhyay, A. & Bibbins-Domingo, K. Higher cardiovascular disease prevalence and mortality among younger blacks compared to whites. Am. J. Med. 123(9), 811–818. https://doi.org/10.1016/j.amjmed.2010.04.020 (2010).
Mizuno, K. et al. Usefulness of pravastatin in primary prevention of cardiovascular events in women: Analysis of the Management of Elevated Cholesterol in the Primary Prevention Group of Adult Japanese (MEGA study). Circulation (New York, NY) 117(4), 494–502. https://doi.org/10.1161/CIRCULATIONAHA.106.671826 (2008).
Mosca, L., Barrett-Connor, E. & Wenger, N. K. Sex/gender differences in cardiovascular disease prevention: What a difference a decade makes. Circulation (New York, NY) 124(19), 2145–2154. https://doi.org/10.1161/CIRCULATIONAHA.110.968792 (2011).
Ridker, P. M. et al. Rosuvastatin to prevent vascular events in men and women with elevated C-reactive protein. N. Engl. J. Med. 359(21), 2195–2207. https://doi.org/10.1056/NEJMoa0807646 (2008).
Cohen, D. J. et al. Primary care practices’ abilities and challenges in using electronic health record data for quality improvement. Health Affairs Web Exclus. 37(4), 635–643. https://doi.org/10.1377/hlthaff.2017.1254 (2018).
Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digit. Med. 1, 18. https://doi.org/10.1038/s41746-018-0029-1 (2018).
Xu, D., Hu, P. J., Huang, T. S., Fang, X. & Hsu, C. C. A deep learning-based, unsupervised method to impute missing values in electronic health records for improved patient management. J. Biomed. Inform. 111, 103576. https://doi.org/10.1016/j.jbi.2020.103576 (2020).

Funding

The Lifetime Risk Pooling Project was supported in its inception by the National Institutes of Health/National Heart, Lung, and Blood Institute (R21 HL085375).

Author information

Authors and Affiliations

Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Jingzhi Yu, Xiaoyun Yang, Yu Deng, Amy E. Krefman, Lindsay R. Pool, Lihui Zhao, Xinlei Mi, Hongyan Ning, John Wilkins, Donald M. Lloyd-Jones, Lucia C. Petito & Norrina B. Allen

Contributions

J.Y. and X.Y. contributed to the analysis of the project and the writing of the manuscript. Y.D., A.K., L.P., L.Z., X.M., H.N., J.W., and D.L.J. contributed to the writing and editing of the manuscript. L.P. and N.A. contributed to the writing and editing of the manuscript, as well as the formulation of the research project.

Corresponding author

Correspondence to Norrina B. Allen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Yu, J., Yang, X., Deng, Y. et al. Incorporating longitudinal history of risk factors into atherosclerotic cardiovascular disease risk prediction using deep learning. Sci Rep 14, 2554 (2024). https://doi.org/10.1038/s41598-024-51685-5

Received: 02 October 2023
Accepted: 08 January 2024
Published: 31 January 2024
DOI: https://doi.org/10.1038/s41598-024-51685-5
