Unsupervised learning to characterize patients with known coronary artery disease undergoing myocardial perfusion imaging

Williams, Michelle C.; Bednarski, Bryan P.; Pieszko, Konrad; Miller, Robert J. H.; Kwiecinski, Jacek; Shanbhag, Aakash; Liang, Joanna X.; Huang, Cathleen; Sharir, Tali; Dorbala, Sharmila; Di Carli, Marcelo F.; Einstein, Andrew J.; Sinusas, Albert J.; Miller, Edward J.; Bateman, Timothy M.; Fish, Mathews B.; Ruddy, Terrence D.; Acampa, Wanda; Hauser, M. Timothy; Kaufmann, Philipp A.; Dey, Damini; Berman, Daniel S.; Slomka, Piotr J.

doi:10.1007/s00259-023-06218-z

Original Article
Open access
Published: 17 April 2023

Volume 50, pages 2656–2668 (2023)
European Journal of Nuclear Medicine and Molecular Imaging

Michelle C. Williams^1,2^na1,
Bryan P. Bednarski¹^na1,
Konrad Pieszko¹,
Robert J. H. Miller^1,3,
Jacek Kwiecinski^1,4,
Aakash Shanbhag¹,
Joanna X. Liang¹,
Cathleen Huang¹,
Tali Sharir⁵,
Sharmila Dorbala⁶,
Marcelo F. Di Carli⁶,
Andrew J. Einstein⁷,
Albert J. Sinusas⁸,
Edward J. Miller⁸,
Timothy M. Bateman⁹,
Mathews B. Fish¹⁰,
Terrence D. Ruddy¹¹,
Wanda Acampa¹²,
M. Timothy Hauser¹³,
Philipp A. Kaufmann¹⁴,
Damini Dey¹,
Daniel S. Berman¹ &
…
Piotr J. Slomka¹

Abstract

Purpose

Patients with known coronary artery disease (CAD) comprise a heterogeneous population with varied clinical and imaging characteristics. Unsupervised machine learning can identify new risk phenotypes in an unbiased fashion. We use cluster analysis to risk-stratify patients with known CAD undergoing single-photon emission computed tomography (SPECT) myocardial perfusion imaging (MPI).

Methods

From 37,298 patients in the REFINE SPECT registry, we identified 9221 patients with known coronary artery disease. Unsupervised machine learning was performed using clinical (23), acquisition (17), and image analysis (24) parameters from 4774 patients (internal cohort) and validated with 4447 patients (external cohort). Risk stratification for all-cause mortality was compared to stress total perfusion deficit (< 5%, 5–10%, ≥10%).

Results

Three clusters were identified, with patients in Cluster 3 having a higher body mass index, more diabetes mellitus and hypertension, and being less likely to be male, have dyslipidemia, or undergo exercise stress imaging (p < 0.001 for all). In the external cohort, during median follow-up of 2.6 [0.14, 3.3] years, all-cause mortality occurred in 312 patients (7%). Cluster analysis provided better risk stratification for all-cause mortality (Cluster 3: hazard ratio (HR) 5.9, 95% confidence interval (CI) 4.0, 8.6, p < 0.001; Cluster 2: HR 3.3, 95% CI 2.5, 4.5, p < 0.001; Cluster 1, reference) compared to stress total perfusion deficit (≥10%: HR 1.9, 95% CI 1.5, 2.5, p < 0.001; < 5%: reference).

Conclusions

Our unsupervised cluster analysis in patients with known CAD undergoing SPECT MPI identified three distinct phenotypic clusters and predicted all-cause mortality better than ischemia alone.

Introduction

Patients with known coronary artery disease are a heterogeneous population with varied clinical and imaging characteristics. Despite advances in contemporary medical, interventional, and surgical management, there remains a subgroup of patients with known cardiovascular disease who are at high risk of cardiac events and mortality [1]. Improved methods to characterize patients with known coronary artery disease who are at increased risk of cardiac events would enable more personalized, targeted management, and guide the use of new medical therapies.

Myocardial perfusion imaging (MPI) with single-photon emission computed tomography (SPECT) is an established technique to identify myocardial ischemia and risk-stratify patients [2]. Quantitative information from SPECT can provide valuable additional prognostic information over and above visual assessment alone [3]. Recently, the multi-center REgistry of Fast Myocardial Perfusion Imaging with NExt generation SPECT (REFINE SPECT) registry has been established, which aims to create a comprehensive clinical and imaging database of the latest generation SPECT images which are processed with quantitative software [4]. Supervised machine learning has been used to combine clinical and quantitative imaging features to improve prognostic assessment of patients undergoing SPECT [3, 5,6,7]. However, unsupervised machine learning has the potential to identify new cardiovascular phenotypes with unique prognostic implications.

Unsupervised machine learning aims to identify groups, or clusters, of patients which have similar combinations of characteristics, without the impact of biases from clinical experts or information on subsequent outcomes. Unsupervised learning differs from more commonly applied supervised methods by learning to separate data distributions into clusters, rather than being trained to predict specific classification or regression outcomes. This distinction allows unsupervised methods to unveil new patterns in cardiovascular diseases, to develop new understanding of disease phenotypes, and to identify novel high-risk groups without outcome bias. Cluster analysis has previously been used to identify clinical and imaging features that predict the risk of future cardiovascular events using magnetic resonance imaging, coronary computed tomography angiography, and echocardiography [8,9,10]. However, this technique has not previously been applied to MPI (SPECT or PET) or shown to improve prognostication for patients with known coronary artery disease in any imaging modality. Patients with known coronary artery disease represent a unique clinical challenge as a subset of these patients are at the highest risk of myocardial infarction, whereas others remain event free. Of the currently available prognostic scores for patients with known coronary artery disease, few incorporate these non-invasive imaging metrics, and their performance is low [11].

This study aims to use unsupervised machine learning to identify clusters amongst patients with known coronary artery disease who underwent SPECT MPI, and to assess how these phenotypic clusters differ in terms of all-cause mortality and subsequent cardiac events.

Materials and methods

Study design

In this multicenter, retrospective analysis of imaging and clinical data from the expanded REFINE SPECT registry [4], we performed unsupervised machine learning to identify phenotypic clusters amongst patients with known coronary artery disease who had undergone SPECT MPI, and to assess the association of these cluster groups with outcomes. The study complied with the Declaration of Helsinki and was approved by the institutional review boards of local sites and Cedars-Sinai Medical Center.

Study population

The REFINE SPECT registry is an international multicenter registry of consecutive patients undergoing clinically indicated SPECT MPI which currently includes 37,298 patients from 10 worldwide sites [4]. From this registry, we selected patients with known coronary artery disease, defined as those with (one or more of) previous myocardial infarction, percutaneous coronary intervention, or coronary artery bypass grafting. Inclusion and exclusion criteria are detailed in Fig. 1. Patients were excluded if stress imaging was not available (n = 16), if follow-up information on death or major adverse cardiovascular events (MACE) was incomplete (n = 68), or if they had no previous history of coronary artery disease (n = 27,993).

Clinical information

Clinical information was obtained from the REFINE SPECT registry database and included demographic information, cardiovascular risk factors, past medical history, and resting electrocardiogram (ECG) findings (Supplementary Table 1).

SPECT MPI

SPECT MPI was performed at 10 sites (Assuta Medical Center, Tel Aviv, Israel; Brigham and Women’s Hospital, Boston, USA; Cedars-Sinai Medical Center, Los Angeles, CA, USA; Oklahoma Heart Hospital, Oklahoma City, OK, USA; Oregon Heart and Vascular Institute, Springfield, OR, USA; Ottawa Heart Institute, Ottawa, Ontario, Canada; University of Calgary, Calgary, Alberta, Canada; University of Naples, Naples, Italy; Yale University, New Haven, CT, USA; University Hospital Zurich, Zurich, Switzerland) with three different scanners (GE Discovery NM 530c, GE Discovery 570c, GE Healthcare, Haifa, Israel, and D-SPECT, Spectrum Dynamics, Haifa, Israel). SPECT acquisition parameters were recorded, including the type of stress performed, the physiological and clinical response to stress, and the isotope type and dose (Supplementary Table 1).

Quantitative SPECT MPI parameters were generated automatically using the Quantitative Perfusion SPECT (QPS)/Quantitative Gated SPECT (QGS) software (Cedars-Sinai Medical Center, Los Angeles, CA, USA) [12, 13]. Deidentified images were reviewed by core laboratory technologists, blinded to clinical data, for quality control. QPS/QGS was then used to generate myocardial contours and automatically generate quantitative SPECT MPI parameters including 24 stress/rest, gated/ungated, perfusion, and function parameters (parameters provided by group in Supplementary Table 1). Stress total perfusion deficit from a single position was classified as < 5%, 5–10%, or ≥10% for analysis. Quantitative percent ischemia was automatically determined as the difference between stress and rest total perfusion deficits, and was classified as < 5%, 5–10%, or ≥10% for analysis [3]. Single position acquisitions were obtained for 1422/9221 (15%) patients; therefore, we did not use two-position combined total perfusion deficit.
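The categorization described above can be written as a small helper. This is a minimal sketch rather than the QPS/QGS implementation: the function names `categorize_percent` and `percent_ischemia` are hypothetical, and placing a value of exactly 5% in the middle band is an assumption, since the text only specifies "< 5%", "5–10%", and "≥10%".

```python
def categorize_percent(value):
    """Map a percentage to the three bands used in the text: <5%, 5-10%, >=10%."""
    if value < 5.0:
        return "<5%"
    if value < 10.0:
        return "5-10%"
    return ">=10%"


def percent_ischemia(stress_tpd, rest_tpd):
    """Quantitative percent ischemia: stress TPD minus rest TPD."""
    return stress_tpd - rest_tpd


# Hypothetical patient: stress TPD 14%, rest TPD 6%
ischemia = percent_ischemia(14.0, 6.0)
print(categorize_percent(14.0), categorize_percent(ischemia))
```

For this hypothetical patient, the stress total perfusion deficit falls in the highest band while the percent ischemia falls in the middle band, which shows that the two groupings can disagree for the same scan.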

Data preprocessing

Data from four original REFINE SPECT sites (Assuta Medical Center, Brigham and Women’s Hospital, Cedars-Sinai Medical Center, Oregon Heart and Vascular Institute; n = 4774) were used as the internal cohort to perform unsupervised learning and to understand the clinical characteristics of the clusters. Data from six sites (Oklahoma Heart Hospital, Ottawa Heart Institute, University of Calgary, University of Naples, University Hospital Zurich, Yale University; n = 4447) were used as the external cohort to test the impact of the cluster groups on outcomes (Fig. 1). All external site data were from the new REFINE SPECT sites, except for Ottawa, which was included with the external set to balance the sizes of the internal training and external testing cohorts. Clinical and imaging characteristics for internal and external cohorts are provided in Supplementary Table 2.

Data were preprocessed to provide the machine learning algorithm with clean, uniform, and consistent inputs. Machine learning analysis was performed in Python (version 3.9.7) using clinical information (23 parameters), acquisition parameters (17 parameters), and quantitative image analysis parameters (24 parameters; Supplementary Table 1). Visual SPECT-MPI assessments were not used to avoid potential clinician biases. Cardiovascular events and mortality were not used in the analysis because we wanted to develop a model that could phenotype and derive new pathophysiologic insights for patients at the time of imaging without bias towards specific outcomes. Unsupervised cluster analysis fits these requirements as the model learns to separate patients according to their individual data profile without a priori exposure to any outcome. Features with > 25% missingness were dropped from the set used for model fitting. Missing variables were imputed using median imputation for continuous variables and mode imputation for categorical variables. Data normalization was applied only when selected as an optional hyperparameter (Supplementary Table 3).
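The missingness filter and imputation rules can be illustrated with the standard library alone. The sketch below is not the study's code: the function name `preprocess`, the feature names, and the toy values are hypothetical, and it computes the fill values on whatever data it is given, whereas a real pipeline would presumably derive them from the internal cohort and reuse them for external patients.

```python
from statistics import median, mode


def preprocess(columns, categorical, max_missing=0.25):
    """Drop sparse features, then impute the remaining gaps.

    columns: dict mapping feature name -> list of values (None = missing).
    categorical: set of feature names to treat as categorical.
    """
    cleaned = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        missing_fraction = 1 - len(observed) / len(values)
        if missing_fraction > max_missing or not observed:
            continue  # feature dropped: too many missing values
        # Mode for categorical features, median for continuous ones
        fill = mode(observed) if name in categorical else median(observed)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned


# Toy data (hypothetical feature names and values)
data = {
    "bmi": [24.0, None, 31.0, 27.0, 29.0],        # 20% missing, imputed
    "diabetes": [0, 1, 0, 0, None],               # 20% missing, imputed
    "rare_marker": [None, None, 1.2, None, 0.8],  # 60% missing, dropped
}
result = preprocess(data, categorical={"diabetes"})
print(result)
```

Here the missing body mass index is replaced by the median of the observed values and the missing diabetes flag by the most frequent category, while the sparsely recorded feature is excluded entirely.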

Clinical outcomes

Patients were followed up for the occurrence of revascularization, myocardial infarction, unstable angina, percutaneous coronary intervention, coronary artery bypass grafting, and all-cause mortality. Major adverse cardiovascular events (MACE) were defined as coronary revascularization, myocardial infarction, admission for unstable angina, or all-cause mortality. Prognostic information for some of this population at 5 years has previously been reported [3].

Unsupervised machine learning

Our primary tool is an unsupervised learning model that assigns patients to novel clusters for further analysis. This model first maps high-dimensional patient data to a much lower-dimensional embedding space where patients can be efficiently clustered.

Dimensionality reduction was performed using the non-linear Uniform Manifold Approximation and Projection (UMAP) toolkit (UMAP Learn, version 0.5.2) [14]. Dimensionality reduction improves the performance of cluster analysis by simplifying the input feature space prior to clustering, reducing computation time and noise, while preserving the global data structure (i.e., the relative relationships between patients in the data) [14]. Traditional distance metrics break down at high dimensions, necessitating dimensionality reduction prior to clustering [15]. UMAP was selected as the primary engine for our unsupervised pipeline as it utilizes non-linear manifold approximation theory to estimate a low-dimensional data representation in a more efficient and scalable manner than other commonly used methods, while retaining a stable model representation that is saved and viable for clinical deployment [16]. Cao et al. demonstrated the robustness of UMAP to embed high-dimensional data from cellular biology into a new representation, leading to fewer clusters than other commonly used methods [17]. The nature of our high-dimensional and multi-modal application supports the application of UMAP, which is expected to maintain performance as imaging technology advances and the total number of imaging variables grows. Reduction to three dimensions prior to clustering was selected to balance visualization of formed clusters with the embedding complexity. UMAP models were tested with classical clustering algorithms (hierarchical, k-means, Gaussian mixture model; Scikit-Learn package, version 1.0.1) during internal model selection and validation.
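Of the clustering algorithms tested, k-means is the simplest to sketch. The block below is a standard-library illustration of Lloyd's algorithm applied to points that are assumed to already lie in a three-dimensional embedding. It is not the Scikit-Learn implementation used in the study, it does not perform the UMAP step, and the coordinates, initial centroids, and function names are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                updated.append(old)  # keep an empty cluster's centroid unchanged
                continue
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        if updated == centroids:
            break  # centroids no longer move: converged
        centroids = updated
    return labels, centroids


# Toy 3-D embedding with two obvious groups (hypothetical coordinates)
embedding = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0), (10.0, 11.0, 10.0)]
labels, centers = kmeans(embedding, centroids=[(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)])
print(labels, centers)
```

Passing the initial centroids explicitly keeps the example deterministic; library implementations typically choose them with a randomized seeding strategy instead.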

A grid search was used to select the optimal dimensionality reduction parameters, clustering method, and number of clusters. Parameter ranges presented in Supplementary Table 3 were selected with the intention of producing a wide range of embedding structures and clustering combinations. Each set of parameters was evaluated and compared using the silhouette coefficient, with the optimal model being the configuration with the highest mean silhouette coefficient across the entire grid search. Silhouette coefficients are a standard metric to assess how well clusters are separated by assessing the separation distance of individuals between and within a cluster [18]. Silhouette scores (range: -1 to 1) increase as the distances to other patients in the same cluster decrease and the distances to patients from other clusters increase. They are similar to cluster-comparison metrics like the Davies-Bouldin and Calinski-Harabasz indices, with the advantage of providing in-built normalization so that scores are not biased to cluster sizes.
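To make the silhouette coefficient concrete, the following sketch computes it from first principles: for each sample, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the score is (b - a) / max(a, b). Euclidean distance, the toy points, and the function name `silhouette_scores` are assumptions made for illustration; the function also assumes at least two clusters are present.

```python
import math


def silhouette_scores(points, labels):
    """Per-sample silhouette s = (b - a) / max(a, b)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return scores


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
mean_silhouette = sum(scores) / len(scores)
print(round(mean_silhouette, 3))
```

In a grid search of the kind described above, a mean score like `mean_silhouette` would be computed for every candidate configuration and the configuration with the highest value retained.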

To validate the clinical utility of the unsupervised clustering, the model was tested in the external cohort. The trained dimensionality reduction model and the coordinates of the three cluster centroids derived in the internal cohort were used to assign external cohort patients to similar clusters.
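The assignment step can be sketched as a nearest-centroid lookup in the embedding space. This assumes that external patients have already been projected by the trained dimensionality reduction model and that "nearest" means smallest Euclidean distance, which is how k-means conventionally labels new points; the centroid coordinates, patient coordinates, and the function name `assign_to_clusters` are hypothetical.

```python
import math


def assign_to_clusters(embedded_points, centroids):
    """Assign each embedded patient to the nearest stored centroid."""
    assignments = []
    for point in embedded_points:
        distances = {name: math.dist(point, c) for name, c in centroids.items()}
        assignments.append(min(distances, key=distances.get))
    return assignments


# Hypothetical centroids learned on the internal cohort (3-D embedding space)
centroids = {
    "Cluster 1": (0.0, 0.0, 0.0),
    "Cluster 2": (5.0, 5.0, 0.0),
    "Cluster 3": (0.0, 8.0, 6.0),
}
# Hypothetical external patients, already projected by the trained embedding
external = [(0.5, -0.2, 0.1), (4.0, 6.0, 1.0), (1.0, 7.0, 5.0)]
assigned = assign_to_clusters(external, centroids)
print(assigned)
```

Because only the stored embedding and three centroids are needed, no refitting takes place on the external cohort and no outcome data enter the assignment.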

Statistical analysis

Statistical analysis was performed using R (version 4.1.1). Normally distributed data are presented as mean and standard deviation. Data that are not normally distributed are presented as median and interquartile range (IQR). Categorical data are presented as number and percentage. Statistical significance was assessed using Mann–Whitney Wilcoxon, Kruskal–Wallis rank sum, or Pearson’s chi-squared test. The hypergeometric distribution v-test (FactoMineR package, version 2.4) was performed to assess the representation of variables within each cluster, with a positive v-test score indicating over-representation of the variable within the cluster and a negative v-test score indicating under-representation [19]. Outcome data were analyzed using Cox proportional-hazards analysis; hazard ratios (HR) and 95% confidence intervals (CI) were calculated and compared using the global log-rank test and Wald test. Kaplan–Meier curves were constructed. A two-sided p-value < 0.05 was considered statistically significant. Separate survival analysis disaggregated by sex is presented in accordance with the SAGER guidelines for sex and gender equity in research [20].
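As an illustration of how a Kaplan–Meier curve is built, the sketch below implements the product-limit estimator directly. The study's analysis was run in R; this Python version, the function name `kaplan_meier`, and the toy follow-up data are illustrative assumptions only.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up duration per patient.
    events: 1 if the event (e.g. death) occurred, 0 if censored.
    """
    ordered = sorted(zip(times, events))
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t
        while i < len(ordered) and ordered[i][0] == t:
            deaths += ordered[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy follow-up data in years (hypothetical)
times = [1.0, 2.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
```

Censored patients contribute to the number at risk up to their last follow-up time but never trigger a step in the curve, which is why the estimate only changes at times 1, 2, and 4 in this example.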

Results

Study population

From 37,298 patients in the expanded REFINE SPECT registry, we identified 9221 patients with known coronary artery disease from ten sites where both clinical and imaging data were available. Cluster analysis was performed using data from 4774 patients in the internal cohort. These patients had a median age of 67 [IQR 60 to 75] years, and 78% (n = 3704) were male (Table 1). Forty-five percent (n = 2166) had a previous myocardial infarction, 71% (n = 3409) had previous percutaneous coronary intervention, and 2.5% (n = 121) had previous coronary artery bypass grafting. Similar demographic characteristics were present in the external cohort (Supplementary Table 4).

Table 1 Demographic characteristics for all patients and clusters in the internal cohort

Unsupervised machine learning

The optimal unsupervised clustering method used the Bray-Curtis distance metric for dimensionality reduction with UMAP and k-means clustering. This model identified three optimal clusters, with a silhouette score of 0.93 (Supplementary Fig. 1). Based on clinical, acquisition, and quantitative imaging parameters, patients in the internal cohort were divided into three phenotypic clusters which were called Cluster 1 (n = 2005), Cluster 2 (n = 1580), and Cluster 3 (n = 1189; Fig. 2). This method was used to assign patients in the external cohort into three distinct clusters (Cluster 1, n = 1799; Cluster 2, n = 2213; Cluster 3, n = 435) which were used for assessment of cardiovascular outcomes.
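For reference, the selected distance metric can be written out in a few lines. The sketch below uses the usual definition of the Bray-Curtis dissimilarity between two feature vectors, the sum of absolute differences divided by the sum of absolute sums; the function name `bray_curtis`, the toy vectors, and the choice to return 0.0 when the denominator is zero are assumptions for illustration.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(|u_i + v_i|)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    # Returning 0.0 for an all-zero denominator is a convention chosen here
    return numerator / denominator if denominator else 0.0


patient_a = [1.0, 2.0, 3.0]
patient_b = [2.0, 2.0, 5.0]
distance = bray_curtis(patient_a, patient_b)
print(distance)  # 3 / 15 = 0.2
```

For non-negative inputs the value lies between 0 (identical vectors) and 1, and because it is expressed relative to the summed magnitudes of the two vectors, it behaves differently from Euclidean distance when features span very different scales.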

Clinical phenotypes of clusters

Patients in Cluster 3 had a higher body mass index and more hypertension, diabetes mellitus, smoking (p < 0.01 internal), and peripheral vascular disease (p < 0.001 for all others; Table 1; Supplementary Table 4). They were more likely to have had a previous myocardial infarction (p < 0.05 internal), to present with typical or atypical symptoms, and to have an abnormal resting electrocardiogram (p < 0.001 for all others). However, they were less likely to be male or have dyslipidemia (p < 0.001 for all). V-test scores showed that the clinical parameters which were the most over-represented in Cluster 3 were peripheral vascular disease, rest systolic blood pressure, symptoms of atypical angina, body mass index, and female gender (Supplementary Fig. 2).

In contrast, patients in Cluster 1 were younger and more likely to be male, to have dyslipidemia and a family history of coronary artery disease, and to be asymptomatic (p < 0.001 for all; Table 1; Supplementary Table 4). V-test scores showed that the clinical parameters which were over-represented in Cluster 1 were height, family history of coronary artery disease, and male gender (Supplementary Fig. 2).

Imaging phenotypes of clusters

Patients in Cluster 3 were less likely to undergo exercise stress or have abnormal electrocardiogram response to stress than patients in Cluster 1 (p < 0.001 for all; Table 2; Supplementary Table 4). V-test scores showed that pharmacological stress, stress administered activity, and ischemic or non-diagnostic clinical response to stress were over-represented in Cluster 3 (p < 0.001 for all; Supplementary Fig. 2). In contrast, patients in Cluster 1 were more likely to undergo exercise stress and have an abnormal electrocardiogram response to stress (p < 0.001 for all; Table 2; Supplementary Table 4). In Cluster 1, v-test scores showed that exercise stress, exercise duration, stress peak heart rate, positive heart rate response, and stress systolic blood pressure were over-represented (p < 0.001 for all; Supplementary Fig. 2).

Table 2 SPECT acquisition characteristics for all patients and clusters in the internal cohort

Quantitative SPECT-MPI analysis showed that patients in Cluster 3 had higher rest total perfusion deficit, and stress gated shape index on end diastolic and end systolic images (Fig. 3; Supplementary Table 5). Patients in Cluster 2 had the highest stress total perfusion deficit, percent ischemia, and stress and rest gated end diastolic and systolic volumes (p < 0.001 for all). Patients in Cluster 1 had the highest stress and rest ejection fraction (Supplementary Table 5).

Cardiovascular outcomes

During a median follow-up of 4.2 [3.3, 5.1] years, all-cause mortality occurred in 584 patients (12%) and MACE in 1504 (32%) patients in the internal cohort (Table 3). Patients in Cluster 3 in the internal cohort were more likely to experience myocardial infarction, unstable angina, early revascularization, MACE, and all-cause mortality, but not all revascularization or coronary artery bypass grafting (Table 3).

Table 3 Cardiovascular outcomes for all patients and clusters in the internal cohort

During a median follow-up of 2.6 [0.14, 3.3] years, all-cause mortality occurred in 312 patients (7%) and MACE occurred in 1063 patients (24%) in the external cohort (Supplementary Table 5). Patients in Cluster 3 in the external cohort were more likely to experience myocardial infarction, unstable angina, MACE, all-cause mortality, and revascularization, but not coronary artery bypass grafting (Supplementary Table 6).

In the external cohort, all-cause mortality was almost six times more likely in Cluster 3 compared to Cluster 1 (HR 5.9, 95% CI 4.0 to 8.6, p < 0.001) and three times more likely in Cluster 2 (HR 3.3, 95% CI 2.5 to 4.5, p < 0.001; Fig. 4). In contrast, stress total perfusion deficit provided less risk differentiation between groups for all-cause mortality (HR 1.9, 95% CI 1.5 to 2.5, p < 0.001 for stress total perfusion deficit ≥10% versus < 5%; Fig. 4). Similarly, ischemia provided less differentiation between groups for all-cause mortality (HR 1.8, 95% CI 1.3 to 2.5, p < 0.001 for quantitative percent ischemia ≥10% versus < 5%; Supplementary Fig. 3).

In the external cohort, MACE was also more likely to occur in Cluster 3 (HR 4.2, 95% CI 3.4 to 5.1, p < 0.001), and Cluster 2 (HR 1.2, 95% CI 1.1 to 1.4, p = 0.002), compared to Cluster 1. In contrast, stress total perfusion deficit (HR 1.8, 95% CI 1.6 to 2.1, p < 0.001 for stress total perfusion deficit ≥10% versus < 5%) and ischemia (HR 2.1, 95% CI 1.7 to 2.4, p < 0.001 for quantitative percent ischemia ≥10% versus < 5%) provided less differentiation between groups for MACE (Fig. 5).

Results of Cox proportional hazards analysis for clusters, disaggregated according to sex, are provided for both internal and external cohorts in Supplementary Tables 7–10. These tables demonstrate consistent risk-stratification of male and female patients for all-cause mortality and MACE outcomes by learned clusters. Supplementary Tables 11–12 demonstrate that risk-stratification by site in the training population is inferior to stratification by unsupervised clusters.

Discussion

We have implemented an unsupervised machine learning approach to identify new phenotypic clusters amongst patients with known coronary artery disease undergoing SPECT MPI. Patients with known coronary artery disease are an understudied and heterogeneous group, and improved risk stratification for this population will advance targeted management strategies. In this large multicenter registry, we have identified three clusters. These clusters demonstrated important differences in all-cause mortality and MACE, even though no outcome endpoints were included in the unsupervised machine learning model. The clustering was an unbiased approach based on clinical features, acquisition features, and fully automated quantitative image analysis features, without using information on cardiovascular events. The cluster assignment provided improved risk assessment compared to quantitative SPECT MPI ischemia alone. Importantly, we demonstrated excellent performance of the clustering approach in an external population not used for model training. In clinical practice, the use of these clusters could improve personalized management of coronary artery disease by robust identification of patients at low, medium, and high risks of all-cause mortality and MACE after SPECT MPI.

For patients with known coronary artery disease with stable chest pain despite optimal guideline directed medical therapy, stress imaging has a Class I indication in the current ACC/AHA guidelines [2]. Assessing the severity of ischemia can be used to guide decisions regarding the use and intensification of anti-anginal medications and the use of invasive coronary angiography [2]. In our study, we have shown that the clustering based on clinical, acquisition, and automated image analysis parameters can provide better stratification of risk compared to stress total perfusion deficit alone. This machine learning model was developed using automated SPECT MPI analysis, along with acquisition and clinical parameters, to provide an objective and reproducible input that is not dependent on the site experience and reading style. The cluster assignment can be available to clinicians at the time of reporting as an aid in the overall patient assessment.

Cluster analysis can also provide important information on the demographic characteristics of patients within each group, which can help to develop a new understanding of disease. In our study, the low-risk cluster was composed predominantly of patients with established cardiovascular risk factors, but with overall good cardiovascular condition such as the ability to perform exercise stress and normal ejection fraction. In contrast, Cluster 3 represented patients with established cardiovascular risk factors but with poor cardiovascular condition, established findings of perfusion defects, and inability to perform exercise stress. We note that Cluster 3 has a higher proportion of female patients compared to Clusters 1 and 2. While the reason for this is not clear, it may be that the unsupervised machine learning model is identifying combinations of risk factors and SPECT MPI findings that are more common in females. To our knowledge, this finding has not previously been reported in the literature. Consistent risk stratification when results were disaggregated according to sex demonstrated that the dataset’s sex imbalance did not limit the applicability of this model to the minority group of female patients. Some variance between cluster characteristics could be explained by site-specific protocols; however, the improved risk stratification provided by the clusters compared to stratification by site in the internal training population suggests that the algorithm is identifying more than site alone. Thus, the combination of the clinical, acquisition, and quantitative image analysis findings using unsupervised machine learning can identify new groups of patients with known coronary artery disease. Such unsupervised machine learning (not directly trained on outcome data as in standard statistical and machine learning approaches) is robust and resistant to overfitting as demonstrated in our external testing of the clustering.

Automated quantitative analysis of SPECT MPI can provide additional information to improve risk stratification compared to visual assessment alone in a variety of sub-groups within the REFINE SPECT registry [3, 21, 22]. Quantitative assessment of changes in ventricular morphology such as shape and eccentricity indices has been shown to be independently associated with MACE [23]. Transient ischemic dilation and wall motion abnormalities have also been shown to identify patients with mild ischemia who are at increased risk [24]. There is an increasing number of clinical and imaging parameters which clinicians must synthesize in decision making regarding patient care. The cluster analysis in this paper provides a new way of synthesizing this information in an unbiased manner, not directly driven by outcomes, to identify new and important phenotypic clusters. We have also shown in this paper that these phenotypic clusters have prognostic implications, which are robust in an external validation cohort. The trained models produced in this paper could be incorporated into semi-automatic SPECT software to automatically provide this information to physicians, but further work is required to assess the impact of this on decision making and patient care.

Supervised machine learning assessment of SPECT MPI parameters has previously been shown to be a better predictor of early coronary revascularization than assessment by a nuclear cardiologist or automatically quantified tissue perfusion defects [7]. In supervised machine learning, the computer model is trained based on knowledge of a defined outcome, such as mortality or MACE. In contrast, in unsupervised machine learning, the computational model is not provided with an outcome and instead seeks to understand patterns in the data and create clusters based on phenotypic similarities. Unsupervised machine learning has recently emerged as a useful technique to identify new phenotype-based groupings in complex diseases without the pre-conceived biases of existing categories. It has been used to identify new classifications in patients undergoing a variety of imaging tests, including patients with bicuspid aortopathy undergoing computed tomography [25], healthy volunteers in the UK Biobank study undergoing cardiac MRI [26], patients with left ventricular assist devices undergoing echocardiography preoperatively [27], and patients with aortic stenosis undergoing echocardiography [28]. It has also been used to identify new subtypes of patients based on clinical characteristics in a variety of diseases including heart failure [29], hypertension [30], type 2 diabetes mellitus [31], and amyloidosis [32]. Our paper represents the first time that unsupervised machine learning has been applied to SPECT MPI to improve clinical prognostication and the first time the output of unsupervised machine learning has been tested independently in an external population. Our unsupervised technique provides unbiased clustering of patients, not influenced by prior knowledge of the disease in question, as the model is trained without information on outcomes.

Study limitations

We must acknowledge some limitations of our study. This was a large study with data from four sites used to create the unsupervised machine learning model and data from six sites used for external validation. However, it was a retrospective study, with heterogeneity in the imaging technique between sites, including referrals, imaging protocols, and administered radiotracer doses. The heterogeneity in patient populations does increase the generalizability of our findings. In addition, 78% of the patients included in this study were male, likely representing the pattern of disease in this population. Further work is therefore required with datasets from a larger number of centers including more women and using different protocols. Information on race and ethnicity is not available in the REFINE SPECT registry; therefore, further work is required to assess the impact of this machine learning tool in more diverse populations. Other methods to perform unsupervised machine learning which were not explored in this paper may have revealed different results. Additionally, the UMAP model does not provide interpretability along the axes of the fitted dimensions. However, our use of well-known quantitative parameters allows for detailed analysis and interpretability of the divisions into clusters according to clinical practice.

Conclusion

In this study, unsupervised learning has identified new phenotypic clusters of SPECT MPI patients with known coronary artery disease. Despite not using outcomes during training, the model shows improved prognostic assessment as compared to standard quantitative measures. These clusters could be used to help clinicians in robust identification of high-risk patients and more personalized, targeted management.

Data availability

The data underlying this article cannot be shared publicly due to the multi-institutional data sharing agreement and institutional review board (IRB) constraints. To the extent allowed by the data sharing agreements and IRB protocols, the data from this manuscript will be shared upon written and reasonable request to the corresponding author.

Abbreviations

CAD: Coronary artery disease
CCTA: Coronary computed tomography angiography
MACE: Major adverse cardiac events
MPI: Myocardial perfusion imaging
MRI: Magnetic resonance imaging
REFINE SPECT: REgistry of Fast Myocardial Perfusion Imaging with NExt generation SPECT
SPECT: Single-photon emission computed tomography

References

Maron DJ, Hochman JS, Reynolds HR, Bangalore S, O’Brien SM, Boden WE, et al. Initial invasive or conservative strategy for stable coronary disease. N Engl J Med. 2020;382:1395–407. https://doi.org/10.1056/NEJMoa1915922.
Writing Committee M, Gulati M, Levy PD, Mukherjee D, Amsterdam E, Bhatt DL, et al. 2021 AHA/ACC/ASE/CHEST/SAEM/SCCT/SCMR guideline for the evaluation and diagnosis of chest pain: a report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. J Am Coll Cardiol. 2021;78:e187–285. https://doi.org/10.1016/j.jacc.2021.07.053.
Otaki Y, Betancur J, Sharir T, Hu LH, Gransar H, Liang JX, et al. 5-year prognostic value of quantitative versus visual MPI in subtle perfusion defects: results from REFINE SPECT. JACC Cardiovasc Imaging. 2020;13:774–85. https://doi.org/10.1016/j.jcmg.2019.02.028.
Slomka PJ, Betancur J, Liang JX, Otaki Y, Hu LH, Sharir T, et al. Rationale and design of the REgistry of Fast Myocardial Perfusion Imaging with NExt generation SPECT (REFINE SPECT). J Nucl Cardiol. 2020;27:1010–21. https://doi.org/10.1007/s12350-018-1326-4.
Hu L-H, Miller RJH, Sharir T, Commandeur F, Rios R, Einstein AJ, et al. Prognostically safe stress-only single-photon emission computed tomography myocardial perfusion imaging guided by machine learning: report from REFINE SPECT. Eur Heart J Cardiovasc Imaging. 2021;22:705–14. https://doi.org/10.1093/ehjci/jeaa134.
Rios R, Miller RJH, Hu LH, Otaki Y, Singh A, Diniz M, et al. Determining a minimum set of variables for machine learning cardiovascular event prediction: results from REFINE SPECT registry. Cardiovasc Res. 2021. https://doi.org/10.1093/cvr/cvab236.
Hu LH, Betancur J, Sharir T, Einstein AJ, Bokhari S, Fish MB, et al. Machine learning predicts per-vessel early coronary revascularization after fast myocardial perfusion SPECT: results from multicentre REFINE SPECT registry. Eur Heart J Cardiovasc Imaging. 2020;21:549–59. https://doi.org/10.1093/ehjci/jez177.
Cho Jung S, Shrestha S, Kagiyama N, Hu L, Ghaffar Yasir A, Casaclang-Verzosa G, et al. A network-based “phenomics” approach for discovering patient subtypes from high-throughput cardiac imaging data. JACC: Cardiovascular Imaging. 2020;13:1655–70. https://doi.org/10.1016/j.jcmg.2020.02.008.
Pezel T, Unterseeh T, Hovasse T, Asselin A, Lefevre T, Chevalier B, et al. Phenotypic clustering of patients with newly diagnosed coronary artery disease using cardiovascular magnetic resonance and coronary computed tomography angiography. Front Cardiovasc Med. 2021;8:760120. https://doi.org/10.3389/fcvm.2021.760120.
Yoon YE, Baskaran L, Lee BC, Pandey MK, Goebel B, Lee S-E, et al. Differential progression of coronary atherosclerosis according to plaque composition: a cluster analysis of PARADIGM registry data. Sci Rep. 2021;11:17121. https://doi.org/10.1038/s41598-021-96616-w.
Kaasenbrood L, Bhatt DL, Dorresteijn JAN, Wilson PWF, D’Agostino RB, Sr., Massaro JM, et al. Estimated life expectancy without recurrent cardiovascular events in patients with vascular disease: the SMART-REACH model. J Am Heart Assoc. 2018;7:e009217. https://doi.org/10.1161/jaha.118.009217.
Slomka PJ, Nishina H, Berman DS, Akincioglu C, Abidov A, Friedman JD, et al. Automated quantification of myocardial perfusion SPECT using simplified normal limits. J Nucl Cardiol. 2005;12:66–77. https://doi.org/10.1016/j.nuclcard.2004.10.006.
Germano G, Kavanagh PB, Slomka PJ, Van Kriekinge SD, Pollard G, Berman DS. Quantitation in gated perfusion SPECT imaging: the Cedars-Sinai approach. J Nucl Cardiol. 2007;14:433–54. https://doi.org/10.1016/j.nuclcard.2007.06.008.
McInnes L, Healy J, Melville J. UMAP: uniform manifold approximation and projection for dimension reduction. arXiv. 2018.
Aggarwal CC, Hinneburg A, Keim DA. On the surprising behavior of distance metrics in high dimensional space. Database Theory—ICDT 2001: 8th International Conference London, UK, January 4–6, 2001 Proceedings 8: Springer; 2001. p. 420–34.
Bruns S, Wolterink JM, Takx RAP, van Hamersvelt RW, Sucha D, Viergever MA, et al. Deep learning from dual-energy information for whole-heart segmentation in dual-energy and single-energy non-contrast-enhanced cardiac CT. Med Phys. 2020;47:5048–60. https://doi.org/10.1002/mp.14451.
Cao J, Spielmann M, Qiu X, Huang X, Ibrahim DM, Hill AJ, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature. 2019;566:496–502.
Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65.
Lê S, Josse J, Husson F. FactoMineR: An R package for multivariate analysis. J Stat Softw. 2008;25:1–18. https://doi.org/10.18637/jss.v025.i01.
Heidari S, Babor TF, De Castro P, Tort S, Curno M. Sex and gender equity in research: rationale for the SAGER guidelines and recommended use. Res Integ Peer Rev. 2016;1:1–9.
Klein E, Miller RJH, Sharir T, Einstein AJ, Fish MB, Ruddy TD, et al. Automated quantitative analysis of CZT SPECT stratifies cardiovascular risk in the obese population: analysis of the REFINE SPECT registry. J Nucl Cardiol. 2020. https://doi.org/10.1007/s12350-020-02334-7.
Han D, Rozanski A, Gransar H, Sharir T, Einstein AJ, Fish MB, et al. Myocardial ischemic burden and differences in prognosis among patients with and without diabetes: results from the multicenter international REFINE SPECT Registry. Diabetes Care. 2020;43:453–9. https://doi.org/10.2337/dc19-1360.
Miller RJH, Sharir T, Otaki Y, Gransar H, Liang JX, Einstein AJ, et al. Quantitation of poststress change in ventricular morphology improves risk stratification. J Nucl Med. 2021;62:1582–90. https://doi.org/10.2967/jnumed.120.260141.
Miller RJH, Hu LH, Gransar H, Betancur J, Eisenberg E, Otaki Y, et al. Transient ischaemic dilation and post-stress wall motion abnormality increase risk in patients with less than moderate ischaemia: analysis of the REFINE SPECT registry. Eur Heart J Cardiovasc Imaging. 2020;21:567–75. https://doi.org/10.1093/ehjci/jez172.
Wojnarski CM, Roselli EE, Idrees JJ, Zhu Y, Carnes TA, Lowry AM, et al. Machine-learning phenotypic classification of bicuspid aortopathy. J Thorac Cardiovasc Surg. 2018;155:461-9.e4.
Zheng Q, Delingette H, Fung K, Petersen SE, Ayache N. Pathological cluster identification by unsupervised analysis in 3,822 UK biobank cardiac MRIs. Frontiers in Cardiovascular Medicine. 2020;7. https://doi.org/10.3389/fcvm.2020.539788.
Tang PC, Haft JW, Romano MA, Bitar A, Hasan R, Palardy M, et al. Cluster analysis of preoperative echocardiographic findings and outcomes following left ventricular device implantation. J Thorac Cardiovasc Surg. 2019;157:1851-60.e1. https://doi.org/10.1016/j.jtcvs.2018.11.099.
Kwak S, Lee Y, Ko T, Yang S, Hwang I-C, Park J-B, et al. Unsupervised cluster analysis of patients with aortic stenosis reveals distinct population with different phenotypes and outcomes. Circulation: Cardiovascular Imaging. 2020;13:e009707. https://doi.org/10.1161/CIRCIMAGING.119.009707.
Nagamine T, Gillette B, Pakhomov A, Kahoun J, Mayer H, Burghaus R, et al. Multiscale classification of heart failure phenotypes by unsupervised clustering of unstructured electronic medical record data. Sci Rep. 2020;10:21340. https://doi.org/10.1038/s41598-020-77286-6.
Guo Q, Lu X, Gao Y, Zhang J, Yan B, Su D, et al. Cluster analysis: a new approach for identification of underlying risk factors for coronary artery disease in essential hypertensive patients. Sci Rep. 2017;7:43965. https://doi.org/10.1038/srep43965.
Sharma A, Zheng Y, Ezekowitz JA, Westerhout CM, Udell JA, Goodman SG, et al. Cluster analysis of cardiovascular phenotypes in patients with type 2 diabetes and established atherosclerotic cardiovascular disease: a potential approach to precision medicine. Diabetes Care. 2022;45:204–12. https://doi.org/10.2337/dc20-2806.
Bonnefous L, Kharoubi M, Bézard M, Oghina S, Le Bras F, Poullot E, et al. Assessing cardiac amyloidosis subtypes by unsupervised phenotype clustering analysis. J Am Coll Cardiol. 2021;78:2177–92. https://doi.org/10.1016/j.jacc.2021.09.858.

Funding

Open access funding provided by SCELC, Statewide California Electronic Library Consortium. This research was supported in part by grants R01HL089765 and R35HL161195 from the National Heart, Lung, and Blood Institute/National Institutes of Health (NHLBI/NIH) (PI: Piotr Slomka). MCW (FS/ICRF/20/26002) was supported by the British Heart Foundation.

Author information

Michelle C. Williams and Bryan P. Bednarski contributed equally to this work.

Authors and Affiliations

Departments of Medicine (Division of Artificial Intelligence in Medicine), Biomedical Sciences, and Imaging, Cedars-Sinai Medical Center, 8700 Beverly Boulevard, Ste. Metro 203, Los Angeles, CA, 90048, USA
Michelle C. Williams, Bryan P. Bednarski, Konrad Pieszko, Robert J. H. Miller, Jacek Kwiecinski, Aakash Shanbhag, Joanna X. Liang, Cathleen Huang, Damini Dey, Daniel S. Berman & Piotr J. Slomka
British Heart Foundation Centre for Cardiovascular Science, University of Edinburgh, Edinburgh, UK
Michelle C. Williams
Department of Cardiac Sciences, University of Calgary, Calgary, AB, Canada
Robert J. H. Miller
Department of Interventional Cardiology and Angiology, Institute of Cardiology, Warsaw, Poland
Jacek Kwiecinski
Department of Nuclear Cardiology, Assuta Medical Centers, Tel Aviv, and Ben Gurion University of the Negev, Beer Sheba, Israel
Tali Sharir
Department of Radiology, Division of Nuclear Medicine and Molecular Imaging, Brigham and Women’s Hospital, Boston, MA, USA
Sharmila Dorbala & Marcelo F. Di Carli
Division of Cardiology, Department of Medicine, and Department of Radiology, Columbia University Irving Medical Center and New York-Presbyterian Hospital, New York, NY, USA
Andrew J. Einstein
Section of Cardiovascular Medicine, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
Albert J. Sinusas & Edward J. Miller
Cardiovascular Imaging Technologies LLC, Kansas City, MO, USA
Timothy M. Bateman
Oregon Heart and Vascular Institute, Sacred Heart Medical Center, Springfield, OR, USA
Mathews B. Fish
Division of Cardiology, University of Ottawa Heart Institute, Ottawa, ON, Canada
Terrence D. Ruddy
Department of Advanced Biomedical Sciences, University of Naples “Federico II”, Naples, Italy
Wanda Acampa
Department of Nuclear Cardiology, Oklahoma Heart Hospital, Oklahoma City, OK, USA
M. Timothy Hauser
Department of Nuclear Medicine, Cardiac Imaging, University Hospital Zurich, Zurich, Switzerland
Philipp A. Kaufmann

Contributions

MCW, BB, and PJS conceptualized and designed this study. BB developed and tested the data processing and machine learning pipeline. BB validated the results. MCW, BB, and PJS drafted the manuscript, which was critically reviewed and evaluated by all remaining coauthors. Cedars-Sinai Medical Center was the coordinating center for the study and the core lab for unsupervised learning method development and analysis. Authors from this institution (BB, KP, RJM, AS, JXL, CH, DD, DSB, and PJS) verified the raw underlying data from the participating sites. All authors read and approved the final manuscript and had final responsibility for the decision to submit for publication.

Corresponding author

Correspondence to Piotr J. Slomka.

Ethics declarations

Ethics approval

The study complied with the Declaration of Helsinki and was approved by the institutional review boards of local sites and Cedars-Sinai Medical Center.

Consent to participate

Informed consent was obtained from all individual participants included in the study.

Consent for publication

This manuscript presents no individual participant data that would require additional consent to publish this work beyond the informed consent that was obtained prior to data collection.

Competing interests

Dr. Williams has given lectures for Canon Medical Systems and Siemens Healthineers. Dr. Robert Miller has received research funding and consulting fees from Pfizer. Drs. Berman and Slomka participate in software royalties for the QPS software at Cedars-Sinai Medical Center. Dr. Slomka has received research grant support from Siemens Medical Systems and has received consulting fees from Synektik SA. Dr. Berman serves as a consultant for GE Healthcare. Drs. Dorbala and Edward Miller have served as consultants for GE Healthcare. Dr. Dorbala has served as a consultant to Bracco Diagnostics; her institution has received grant support from Astellas. Dr. Di Carli has received research grant support from Spectrum Dynamics and consulting honoraria from Sanofi and GE Healthcare. Dr. Ruddy has received research grant support from GE Healthcare and Advanced Accelerator Applications. Dr. Edward Miller has served as a consultant for Bracco Inc; he and his institution have received grant support from Bracco Inc. Dr. Berman’s institution has received grant support from HeartFlow. Dr. Einstein has received a speaker’s fee from Ionetix, consulting fees from W. L. Gore & Associates, and authorship fees from Wolters Kluwer Healthcare; his institution has grants/grants pending from Attralus, Canon Medical Systems, Eidos Therapeutics, GE Healthcare, Pfizer, Roche Medical Systems, W. L. Gore & Associates, and XyloCor Therapeutics. The remaining authors have nothing to disclose.

Disclaimer

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Cardiology.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 868 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Williams, M.C., Bednarski, B.P., Pieszko, K. et al. Unsupervised learning to characterize patients with known coronary artery disease undergoing myocardial perfusion imaging. Eur J Nucl Med Mol Imaging 50, 2656–2668 (2023). https://doi.org/10.1007/s00259-023-06218-z

Received: 05 January 2023
Accepted: 29 March 2023
Published: 17 April 2023
Issue Date: July 2023
DOI: https://doi.org/10.1007/s00259-023-06218-z
