Introduction

The Pooled Cohort Equations (PCE) were developed by the American College of Cardiology (ACC) and American Heart Association (AHA) in 2013 and updated in 2018 using data from 9 longitudinal cohort studies as a tool for clinicians to predict 10-year risk of atherosclerotic cardiovascular disease (ASCVD)1,2. The PCE are a set of race- and sex-specific Cox proportional hazards models, that include widely-accepted clinical and behavioral risk factors for ASCVD, including age, sex, race, systolic (SBP) and diastolic blood pressure (DBP), total cholesterol, high density lipid-protein (HDL) cholesterol, smoking status, and type 2 diabetes. In clinical practice, risk predictions from the PCE are a key criterion to determine eligibility for moderate to high intensity statins and hypertension treatments1,3. However, numerous studies have found the performance of the PCE varies across demographic groups4,5,6; c-statistics from these studies ranged from 0.55 to 0.77 (average: 0.70) in men and 0.61 to 0.82 (average: 0.74) in women7,8. Additionally, current clinical guidelines provide more ambivalent and complex treatment recommendations for those who fall in the borderline (5% to 7.5%) and intermediate risk groups (7.5% to 20%)9. A more accurate and robust risk prediction algorithm can help physicians better assess an individual’s risk, allowing them to make more appropriate treatment decisions.

A growing number of studies have demonstrated that long-term risk factor levels are associated with an individual’s risk for the development of ASCVD. For instance, incident CVD risk was shown to be dependent on cumulative exposure to LDL-C10. In a separate study, incident CVD and survival were also found to be associated with 10-year cumulative SBP11. Hence, long-term risk factor patterns may be predictive of ASCVD risk above and beyond cross-sectional levels12. In a prior study, after including 5-year and 10-year cumulative blood pressure measurements in the PCE, researchers found a moderate improvement in the net reclassification index13. Additionally, full integration of multiple longitudinal trajectories of clinical factors into ASCVD prediction is now feasible in clinical practice given advances in computing and electronic medical record (EMR) systems that allow clinicians to access longitudinal risk factor data for their patients.

In recent years, deep learning methods have been applied to many clinical predictive and classification problems to much success14,15,16. Compared with traditional statistical methods, deep learning methods are often superior at processing and creating representations of complex data, such as radiology images and unstructured physician notes17,18, without the need of prior feature engineering or selection15,19. Hence, deep learning can more thoroughly extract and leverage the rich features stored in longitudinal data such as longitudinal blood pressure measurements recorded in the electronic health records (EHR) for predictive tasks.

In this study, we incorporated cross-sectional and longitudinal clinical and behavioral risk factor levels into a state-of-the-art deep learning architecture to create a new prediction model for 10-year risk of incident ASCVD in a pooled cohort of 4 US-based, diverse longitudinal cohorts. We evaluated our model’s predictive performance in comparison to that of the PCE in the overall population and in key population subgroups to better understand the importance of longitudinal data for ASCVD risk prediction. Moreover, we determined the importance of each clinical variable used in the prediction model. Lastly, we performed additional evaluations of the model performance in the borderline and intermediate risk groups to better understand our model’s potential impact on clinical decision making.

Methods

Study population

The four longitudinal cohorts used in this study contributed data to the Cardiovascular Lifetime Risk Pooling Project (LRPP): the Framingham Heart Study, Framingham Offspring Study, Coronary Artery Risk Development in Young Adults (CARDIA) Study, and Atherosclerosis Risk in Communities (ARIC) Study20. These cohorts were selected for their number of participants, duration of follow-up, number of participant visits, and consistency of measurement of CVH risk factors.

As the examination schedules differed across cohorts, the number of exams within timeframes varied. To include the largest number of exams across the different studies while balancing the size of the timeframe for the study, we used 8 years of longitudinal data as the timeframe for CVD risk factor ascertainment (observation period). For consistency with the PCEs, outcomes were then measured over a 10-year follow-up period. Thus, to maximize the number of exams included in our study, we included data beginning at the following index exams (i.e. the exam at which risk factor follow-up began) for the included studies (Fig. 1): year 15 for the Framingham Heart Study, year 10 for the Framingham Offspring Study, year 18 for the CARDIA study, and year 1 for the ARIC study. The exact start and end years of each cohort as well as their mean and interquartile range of the number of exams in each cohort are shown in Table 1.

Figure 1
figure 1

Timeframe of the study. The longitudinal history of clinical and behavior risk factors within the 8 years prior to the index year were included in our prediction model. Our prediction model predicts the outcome within 10 years after the index year.

Table 1 The official start year, start year of the observation period (after adjustment), end year of the 8 year follow-up period, average number of exams within the 8 year follow-up period as well as the interquartile range of the number of exams by each cohort.

Eligible participants were over 40 and under the age of 75 years at the point of prediction (i.e. the end of the 8 year observation period), had no record of self-report or diagnosed ASCVD at the index exam or during the 8 year observation period, and had at least one measurement of SBP, DBP, total cholesterol and HDL cholesterol. The LRPP is approved by the Northwestern IRB and this study utilized de-identified data from each of the included cohorts in LRPP. Written informed consent was obtained for all participants and analysis were performed in accordance with relevant guidelines.

Outcome: ASCVD incidence

The outcome in our study was ASCVD incidence, defined as the incidence of coronary heart disease, ischemic stroke, or CVD-related death, over a 10-year period that began at the end of the observation period (Fig. 1)11,20. Coronary heart disease and ischemic stroke were adjudicated by review of medical records by study investigators20. Participants without any recorded event at the end of the study, or who died of other causes during the follow-up period were considered right censored.

Features: CVD risk factors

CVD risk factors included in the original PCE include systolic BP, diastolic BP, total cholesterol, and HDL cholesterol, and were measured 1–4 times during the 8-year observation period. Blood pressure was measured using standard methods by clinic staff in the various cohorts20,21. Fasting HDL-C, total cholesterol measurements and blood glucose were collected via blood serum20,21. Diagnosis of diabetes and treatment for hypertension, predictors also included in the PCE, were self-reported at the index visit20,21. Age, sex, race, ethnicity, smoking status (current/former smoker vs. never smoker), and alcohol consumption were self-reported at the index visit20,21.

Statistical analysis

The deep learning model used in this study is Dynamic-DeepHit, which enabled the incorporation of longitudinal risk factor data in a dynamic fashion to estimate 10-year risk of incident ASCVD22. The Dynamic-DeepHit model has been demonstrated to have substantial improvements over traditional predictive methods, including the Cox Proportional Hazards Model, in predicting cystic fibrosis outcomes22.

The Dynamic-DeepHit model consists of two neural networks: (1) a recurrent neural network (RNN) that processes the longitudinal measurements and predicts future measurements of time-varying covariates, and (2) a fully connected neural network that estimates the probability of the specific event at a given time. RNNs are commonly used for machine learning problems involving temporal or sequential data and can capture long-term dependencies in the data. The Dynamic-DeepHit model also utilizes an attention mechanism that identifies important longitudinal measurements when making risk predictions, which improves predictive performance. The second neural network takes as input the learned representations that are output from the first neural network along with the last recorded set of behavioral and clinical covariates (e.g. the most recent CVD risk factor measurements at the end of the 8-year observation period). The output layer of the second neural network converts the learned relationships between the risk factors and outcome into the 10-year risk of incident CVD.

To explore the reasons for any improvements in the predictive power we also implemented a cross-sectional DeepHit model. This allowed us to disentangle whether the improvements were due to the incorporation of the longitudinal data or simply to the complexity of the neural network modeling methods. The DeepHit model was fitted on only the last set of measurements for each participant within the 8-year observation period. We also fit the traditional PCE model, to understand its performance in this sample.

Data pre-processing included randomly splitting the dataset into 3 parts, called training, tuning, and testing, at a 3:1:1 ratio. The Dynamic-DeepHit and cross-sectional DeepHit models were trained in the training dataset and corresponding hyperparameters were tuned in the tuning dataset. The training data for the PCE included both the training and tuning datasets. The testing dataset, not used in model development, was used for validation. The participants were the same in each of the respective datasets for each model.

We assessed model discrimination and calibration of all 3 models. We calculated and compared the Area Under the Receiver Operator Curve (AUROC) for all models to evaluate model discrimination, the ability of the model to discriminate those who have a higher risk of having an event from those at lower risk. Brier scores were used to evaluate the calibration of the model; lower scores indicate better calibration, the extent of the estimated risk correspond to observed event rates23.

The trained Dynamic-DeepHit model was evaluated in the following population groups: Black males, Black females, other (White, Hispanic, Asian) males, other females, under 60 years old and 60 or over years old. These demographic groups were chosen to mirror the same classifications used for the sex- and race-specific PCE. As in the overall analysis, the AUROCs were compared between corresponding population subgroups.

To understand the importance of each predictor in the Dynamic-DeepHit model, we took a leave-one-out approach. We removed one predictor at a time from the Dynamic-DeepHit model and retrained and retested the model. The change in the testing dataset AUROC was calculated for each feature removed: the greater the change in AUROC, the greater the importance of the predictor. To also understand the role of longitudinal clinical risk factors better in the Dynamic-DeepHit model, we examined the average trajectories of SBP, DBP, total cholesterol and HDL for the individuals whose predicted risk increased and those whose risk decreased in the Dynamic-DeepHit model. Trajectories were created via generalized estimating equations (GEE) to account for correlation between repeated measurements for individuals. The trajectories were visualized across exam times with the 95% confidence bands.

Current blood pressure and cholesterol control guidelines use risk thresholds based on the PCE to inform clinical care. Physicians are advised to prescribe medium intensity statins if an individual’s ASCVD risk is over 7.5%. However, differentiation of individuals between the borderline and intermediate PCE risk groups could be improved. We calculated the net reclassification index (NRI) between the PCE and the Dynamic-DeepHit model, to understand how the Dynamic-DeepHit model changed individuals’ risk classification. We then conducted additional analysis to better understand the performance of the Dynamic-DeepHit model in borderline and intermediate groups, and how clinical behavior would be affected if the risk derived from the Dynamic-DeepHit model was used instead of risk from the PCE.

All statistical analysis was performed using Python version 3.8 and R 4.0.2. A 5% type-I error rate was used when calculating all confidence intervals.

Results

Baseline characteristics

Baseline demographics and measurements of CVD risk factors included in the PCE are described in Table 2. Pooled cohort participants included in this study were 55% female, 27% non-Hispanic Black and 50 years old on average. We found participants who developed ASCVD in prediction period had significantly higher levels of ASCVD risk factors compared with the participants who did not develop ASCVD. Baseline demographics and clinical characteristics of the participants by cohort is presented in Supplemental Table S1.

Table 2 Baseline demographics of LRPP analytic dataset. Risk factors in the ASCVD group were significantly higher than those in the Non-ASCVD group.

Performance of models

Table 3 shows the discrimination of the three models in the training and testing datasets. The AUCs for the PCE and the longitudinal Dynamic-DeepHit model were 0.792 (CI 0.760–0.825) and 0.815 (CI 0.782–0.844), respectively. The Dynamic-DeepHit model shows slight improvement in discrimination upon the PCE model. The cross-sectional deep learning model achieved an AUC of 0.807 (CI 0.778–0.838) (Supplemental Table S3). The continuous net reclassification index (NRI) for the Dynamic-DeepHit model compared with the PCE was 0.385. The Brier Score for the PCE model was 0.054, 0.052 for the cross-sectional deep learning model and 0.051 for the longitudinal deep learning model, showing meaningful improvement in model calibration.

Table 3 Discrimination performance of models and in population subgroups.

The predicted risks derived from the Dynamic-DeepHit model were found to be generally lower than the risks derived from the PCE (Fig. 2). In Fig. 3, the calibration of the Dynamic-DeepHit model is compared with the calibration of the PCE by comparing the predicted risk and observed risk within each decile of predicted risk. The PCE is shown to over-predict 10-year ASCVD risk, especially within the top 40% of predicted risk, which corresponds to the 7.5% risk threshold used in clinical guidelines. Comparatively, the calibration of the Dynamic-DeepHit model is consistently better along the entire spectrum of risk.

Figure 2
figure 2

Predicted risk (actual value) comparison between two models. Risk category derived from PCE risk category thresholds. horizontal dashed lines show the numeric thresholds for the risk categories used by the PCE mapped on to the risk predicted by the Dynamic-DeepHit model.

Figure 3
figure 3

Observed events compared with average predicted risk within each predicted risk decile. Confidence intervals are also shown via error bars. (a) Predicted vs observed risk in PCE. (b) Predicted vs observed risk in Dynamic-DeepHit.

Model performance in population subgroups

The discriminative performance of the Dynamic-DeepHit and PCE models within different population groups is shown in Table 3. The Dynamic-DeepHit model performed relatively better than the PCE in the other-males group (0.801, CI 0.753–0.848 vs 0.779, CI 0.732–0.826), other-females group (0.801, CI 0.737–0.764 vs 0.780, CI 0.712–0.848), and the Black-females group (0.821, CI 0.751–0.890 vs 0.801, CI 0.726–0.877). However, it underperformed in the Black-males group (0.820, CI 0.751–0.888 vs 0.826, CI 0.756–0.897). In the under-60-years-old group, the Dynamic-DeepHit model had an AUROC of 0.803 (CI 0.747–0.858) compared with the PCE’s AUROC of 0.781 (CI 0.721–0.842) and in the over-60-years-old group, the Dynamic-DeepHit model had an AUROC of 0.698 (CI 0.646–0.749) compared with the PCE’s AUROC of 0.667 (0.615–0.719). The Dynamic-DeepHit model outperforms the PCE in three of the four demographic groups outlined by the PCE.

Feature importance

The results of the leave-one-out feature importance analysis are shown in Fig. 4. After removing age from the model, the greatest decrease in AUROC was observed (0.769, CI 0.735–0.803); thus, age is considered the most important variable in the model. Following age, longitudinal SBP was the second most important predictor, with the AUROC reduced to 0.777 (CI 0.744–0.809). Diabetes diagnosis and hypertension treatment were the most important categorical predictors, with AUROCs reduced to 0.779 (CI 0.747–0.812) and 0.780 (CI 0.748–0.813) when these predictors were removed respectively.

Figure 4
figure 4

Feature Importance. AUROC was calculated via retraining and retesting the model while removing predictors individually. Red dashed line indicates the AUROC of the full Dynamic-DeepHit model.

Figure 5 shows the longitudinal trajectories of clinical risk factors, including SBP, DBP, total cholesterol and HDL among the individuals whose risk increased and those whose risk decreased after switching to the Dynamic-DeepHit model for ASCVD risk prediction. Between the two groups, the average terminal measurements of SBP and total cholesterol were similar, the historical measurements of those risk factors were higher among those whose predicted risk increased in Dynamic-DeepHit model.

Figure 5
figure 5

Longitudinal histories of clinical risk factors compared between individuals whose risk increased and decreased in the Dynamic-DeepHit model from the PCE. Generalized linear models were used to smooth the longitudinal clinical risk factor trajectories. Color bands show the 95% confidence interval.

Borderline risk stratification

Among the individuals in the borderline and intermediate risk groups determined by the risk derived from the PCE, the AUC from the Dynamic-DeepHit model was higher than that from the PCE: 0.688 (CI 0.634–0.742) versus 0.652 (CI 0.594–0.709). The NRI for the Dynamic-DeepHit model between the borderline and intermediate group was 0.322. The Brier score was 0.069 for the PCE compared with 0.067 for the Dynamic-DeepHit model, again showing some improvement in the model calibration.

Given the 7.5% risk threshold for moderate-intensity statin prescription, we examined the individuals whose risk crossed the threshold in both directions to understand the Dynamic-DeepHit model’s potential impact on clinical decision making. In our testing dataset, among those who would be prescribed statins under the PCE risk (N = 1213), 33% (N = 405) would not be prescribed statins under the new risk provided by the Dynamic-DeepHit model, and 95% (N = 386) of those individuals would not develop ASCVD. Among those who were not prescribed statins using the PCE (N = 1900), 2% (N = 34) would be recommended to prescribe statins under the Dynamic-DeepHit model. However, of those individuals, only 3% (N = 1) developed ASCVD within 10 years.

Discussion

Principal findings

In this study, we have demonstrated that by incorporating longitudinal data of the same clinical and behavioral predictors as in the PCE using a state-of-the-art and validated deep learning model we can improve the calibration of predicting 10-year ASCVD risk. We leveraged data from 4 diverse cohorts for model training and testing and found that the longitudinal deep learning model outperformed the PCE both in the overall cohort and in specific subpopulations. We have demonstrated that the longitudinal deep learning model has clinical value through improved discrimination and greater calibration for those with borderline risk of ASCVD, thus providing physicians more reliable estimates of risk for clinical decision making.

Deep learning in ASCVD risk prediction

Longitudinal trends of clinical factors such as blood pressure and cholesterol have long been established to be of clinical importance13. While this is not the first study to incorporate longitudinal data for predicting ASCVD, to our knowledge, it is the first study that uses a deep learning approach. Prior studies used methods such as including aggregate summary statistics of the longitudinal clinical data in the PCE or landmark models that could update data at fixed time intervals13,24,25. This foundational work led to minor improvements in model discrimination; however, we were able to achieve better performance because we utilized a deep learning method. A key advantage of deep learning models is their ability to recognize complex patterns by utilizing multiple layers of artificial neural networks, which are composed of inter-connected nodes. This advantage manifests in two ways in the Dynamic-DeepHit model. First, the improvement in the discrimination of the cross-sectional DeepHit model over the PCE demonstrates that given the same cross-sectional data, neural networks can make better predictions of ASCVD than the PCE. Second, the RNN can create robust representations of longitudinal clinical data, preserving critical information for ASCVD risk prediction.

Clinical implications

Through evaluating the Dynamic-DeepHit model in various population demographic groups, we found that the model improves risk prediction in Black females compared with the PCE26,27,28,29,30,31. This indicates that incorporating longitudinal data may allow physicians to make more accurate treatment decisions and reducing health outcome disparities in these high-risk groups.

Among the individuals categorized as borderline- and intermediate-risk by the PCE, the Dynamic-DeepHit model improved discrimination and was better calibrated. One-third of the individuals in the intermediate PCE risk groups had overestimated 10-year ASCVD risk, which indicates the potential for over-prescribing. In these individuals, the Dynamic-DeepHit model slightly under-estimates risk, that it is better at ruling out people who will not have ASCVD events, while not as good as identifying those who will get ASCVD. As current clinical guidance requires further risk analysis for the individuals in these risk groups, guideline-concordant treatment is less optimal. By providing a better calibrated risk assessment, clinicians may be less concerned with over-prescribing and feel more confident in prescribing guideline-concordant treatment given the predicted risks from the Dynamic-DeepHit model.

The feature importance analysis shows that longitudinal measurements of clinical variables have meaningful influence on the performance on the Dynamic-DeepHit model. In the Dynamic-DeepHit model, longitudinal SBP was the most important modifiable predictor, while total cholesterol was found to be relatively important as well. Similar to prior research21, age was found to be an important predictor in the Dynamic-DeepHit model. In addition, diabetic status, sex, and smoking status were also found to influence the AUROC of the model. In the observed 8-year trajectories of SBP, DBP, and total cholesterol, for the individuals whose risk changed (Fig. 5), at the population level, the aggregate terminal measurements were similar. If prediction occurred only using those terminal measurements, a similar risk profile between those with increased risk and decreased risk would be assumed. However, the Dynamic-DeepHit model picked up separation in the historical values of those clinical factors, which contributed to the model identifying the differences in risk profiles of the two groups of individuals. Combined with the results of the feature importance analysis, this evidence further supports that longitudinal histories of clinical predictors can provide additional insight in evaluating ASCVD risk profiles.

With the proliferation of EHRs, longitudinal data is readily accessible. In addition, with the advent of cloud and edge computing, it is possible to deliver intensive computing capabilities to the EHR for supporting sophisticated machine learning or deep learning models for clinical risk prediction. This study shows, with further validation, deep learning models can be a powerful tool to aid clinicians to leverage the silos of currently untapped historical patient data in the EHR to improve patient cardiovascular outcomes. New methods of interpreting these models will also add confidence in adoption among physicians.

Limitations

There are several limitations in our study. First, the cohorts used in this study may not reflect the clinical conditions of present-day patients, who are more likely to be on CVD treatments, such as statins. Therefore, given the limited information we had on statin usage, we did not exclude any participants who may have been on statin treatment. Second, data was recorded more sparsely in the cohort studies, whereas clinical measurements are often more frequent in clinical practice32. The quality of the data stored in the EHR could be also compromising, due to varying clinical contexts of when the data was collected. While these data problems exist, deep learning methods are still one of the best tools to overcome such issues33,34. On the other hand, EHRs often do not contain up to 8 years of longitudinal data on patients. As this study is a proof of concept, further work is needed to explore efficacy and utility of incorporating longitudinal risk factors into ASCVD risk prediction within EHRs.