Main

Cardiovascular disease is the leading cause of death in women, but sex-specific aspects of the risk of heart disease and acute myocardial infarction (AMI) remain understudied1. Spontaneous coronary artery dissection (SCAD) and atherosclerotic coronary artery disease (CAD) are both causes of acute coronary syndromes leading to AMI2,3,4,5,6. However, in contrast with CAD, SCAD affects a younger, predominantly female population7 and arises from the development of a hematoma, leading to dissection of the coronary tunica media with the eventual formation of a false lumen, rather than atherosclerotic plaque erosion or rupture8. SCAD has been clinically associated with migraine9 and extra-coronary arteriopathies, including fibromuscular dysplasia (FMD)10,11,12,13. However, co-existent coronary atherosclerosis is uncommon8,14. While the genetic basis of CAD is increasingly well established15, the pathophysiology of SCAD remains poorly understood4. The search for highly penetrant mutations in candidate pathways or by sequencing has garnered a low yield, often pointing to genes involved in other clinically undiagnosed inherited syndromes manifesting as SCAD16. Previous investigations of the impact of common genetic variation on the risk of SCAD have described five confirmed risk loci17,18,19,20.

In this Article, we performed a meta-analysis of genome-wide association studies (GWASs) comprising 1,917 SCAD cases and 9,292 controls of European ancestry. We identified 16 risk loci, including 11 new association signals, demonstrating a substantial polygenic heritability for this disease. Importantly, we show that several common genetic risk loci for SCAD are shared with CAD but have a directionally opposite effect and a different genetic contribution of established cardiovascular risk factors. These findings implicate arterial integrity related to extracellular matrix biology, vascular tone and tissue coagulation in the pathophysiology of SCAD.

Results

GWAS meta-analysis and single-nucleotide polymorphism heritability

We conducted a GWAS meta-analysis of eight independent case–control studies (Supplementary Figs. 1 and 2 and Supplementary Table 1). Sixteen loci demonstrated genome-wide-significant signals of association with SCAD, among which 11 were newly described for this disease (Table 1, Fig. 1a, Supplementary Table 2 and Supplementary Fig. 3). One locus on chromosome 4 (AFAP1) was recently reported for SCAD in the context of pregnancy19 and has now been confirmed as being generally involved in SCAD (Table 1). The estimated odds ratios of associated loci ranged from 1.25 (95% confidence interval (CI) = 1.16–1.35) in ZNF827 on chromosome 4 to 2.04 (95% CI = 1.77–2.35) on chromosome 21 near KCNE2 (Table 1). We report evidence for substantial polygenicity for SCAD with an estimated single-nucleotide polymorphism (SNP)-based heritability above 0.70 (h2SNP = 0.71 ± 0.11 on the liability scale using linkage disequilibrium score regression21 and h2SNP = 0.70 ± 0.12 using SumHer22; Supplementary Table 3). The ECM1/ADAMTSL4 locus on chromosome 1 accounted for the largest proportion of heritability for SCAD in our dataset (h2 = 0.028), followed by the COL4A1/COL4A2 locus, which contained two independent GWAS signals (h2 = 0.022; Supplementary Table 4 and Supplementary Fig. 4). Overall, we estimate that the 16 loci explain 24% of the total SNP-based heritability of SCAD (Supplementary Table 4).

Table 1 Lead associated variants at genome-wide significance in SCAD loci
Fig. 1: GWAS meta-analysis main association results and gene prioritization at risk loci.

a, Manhattan plot representation of SNP-based association meta-analysis in SCAD. The x axis shows the genomic coordinates and the y axis shows the −log10[P value] obtained by two-sided Wald test. SNPs located around genome-wide significant signals (±500 kb) are highlighted. The labels show the rsIDs for the lead SNPs, with newly identified loci in red and previously known loci in black. The dashed red line represents genome-wide significance (P = 5 × 10−8) and the gray line suggestive association (P = 10−5). b, Summary of the strategy for the annotation of gene prioritization. The dots indicate genes fulfilling one of the following eight criteria: (1) colocalization of SCAD association signal and eQTL association in the aorta, coronary artery, tibial artery, fibroblasts or whole blood samples (GTEx version 8 release); (2) a TWAS hit in any of the above-mentioned tissues; (3) a cardiovascular (CV) phenotype in the gene knockout mouse; (4) existing evidence of gene function in cardiovascular disease (CVD) pathophysiology in humans; (5) the gene is an eGene for a nearby lead SNP in the above-mentioned GTEx tissues; (6) Hi-C evidence25 for a promoter of the gene in a chromatin loop from human aorta tissue that includes variants from the credible set of causal variants; (7) the closest gene upstream or downstream from the lead SNP; or (8) variants in the credible set of causal variants map in the gene. Criteria 1 and 2 (blue dots) were given a tenfold weighted score over criteria 3–8. Genes with the most criteria were prioritized in each locus and are shown here.

Functional annotation of variants in SCAD loci

We found SCAD-associated variants to be significantly enriched in enhancer marks specific to gene expression in arterial tissues from ENCODE23 (for example, the aorta, tibial artery, thoracic aorta and coronary artery), as well as several tissues rich for smooth muscle cells (for example, the colon, small intestine and uterus) (Supplementary Fig. 5). Based on recently published analyses of single-cell open chromatin in 30 adult tissues24, we determined that vascular smooth muscle cells (VSMCs) and fibroblasts were the top enriched cell types for SCAD-associated loci among clusters represented in aorta and tibial artery datasets (Fig. 2a and Supplementary Fig. 6). Consistently, all but one SCAD locus included at least one variant that overlapped with enhancer marks or open chromatin peaks in coronary artery tissue, VSMCs or fibroblasts (Supplementary Fig. 7 and Supplementary Table 5). Among the top associated variants for SCAD, 14 were expression quantitative trait loci (eQTLs) for nearby genes in the aorta, coronary or tibial artery, whole blood or cultured fibroblasts (Fig. 1b and Supplementary Table 5).

Fig. 2: Enrichment of SCAD SNPs in open chromatin regions from arterial cells and genetically predicted expression changes of nearby genes.

a, Top, representation of the fold-enrichment of SCAD SNPs (top y axis) and enrichment P value (log scale; bottom y axis) among the open chromatin regions of seven single-cell subclusters contributing to >1% of cells in artery tissue24. The SCAD 95% credible set of causal SNPs and their linkage disequilibrium proxies were matched to random pools of neighboring SNPs using the GREGOR package43. Enrichment represents the ratio of the number of SCAD SNPs overlapping open chromatin regions over the average number of matched SNPs overlapping the same regions. P values were evaluated by binomial one-sided test, with greater enrichment as the alternative hypothesis43. The bottom dashed line represents significance (P < 0.05) after adjustment for 105 subclusters. Higher opacity is used to identify significant associations (adjusted P < 0.05). Bottom, composition of artery tissues relative to 105 single-cell subclusters, as determined by snATAC-seq in 30 adult tissues24. Only subclusters representing >1% of cells from either the aorta or tibial artery were represented. b, Representation of the SCAD TWAS z score for each prioritized gene in GWAS loci. The point shape indicates the tissue used in the TWAS association. The point color distinguishes genes located at different loci. The absence of a symbol indicates that the gene did not show significant heritability based on the eQTL data in the corresponding tissue. TWAS P values were calculated by two-tailed z test against a null distribution calculated by permutation for each gene or tissue44. Higher opacity is used to identify significant associations (Bonferroni adjusted P < 0.05), corresponding to a z score of >4.8 or <−4.8 (dashed gray lines).

Tissue coagulation as a novel mechanism in SCAD

We applied a multi-source strategy to identify candidate genes located in SCAD risk loci. We prioritized: (1) genes that were targets of eQTLs colocalizing with a GWAS signal (Supplementary Fig. 8a and Supplementary Table 6) or transcriptome-wide association study (TWAS) hits in at least one tissue relevant to arterial dissection (aorta, coronary or tibial artery, fibroblasts or whole blood from the Genotype Tissue Expression (GTEx) database) (Supplementary Fig. 8b and Supplementary Table 7); (2) genes with a biological function linked to the cardiovascular system in humans or mice; (3) genes involved in significant long-range chromatin conformation interactions from Hi-C data with SCAD-associated variants in the aorta25; and (4) those genes closest to or overlapping with the top associated variants. We identified one specific and strong candidate gene in 14 loci (Fig. 1b). For instance, the tissue factor gene F3 stood out as the most likely target gene near rs1146473 (odds ratio = 1.32; P = 5.8 × 10−9), a locus on chromosome 1 that has not previously been reported for SCAD or for any other cardiovascular disease or trait. F3 is the closest coding gene to the association signal and was a TWAS hit in artery tissue (Supplementary Table 7). In addition, the rs1146473 risk allele for SCAD confidently (posterior probability = 94%) colocalized with an eQTL signal of F3 in the aorta, suggesting that the genetic risk may result from decreased F3 expression in arteries (Fig. 2b and Supplementary Table 6). Tissue factor, also known as coagulation factor III, forms a complex with factor VIIa, which is the primary initiator of blood coagulation. Hence, reduced factor III expression is potentially a key biological mechanism contributing to hematoma formation in the coronary arteries of SCAD survivors. Consideration of genes encoding druggable targets, as derived by Finan et al.26, indicated that tissue factor is a clinical phase drug candidate (tier 1 druggable target), with target reference numbers CHEMBL4081 (factor III) and CHEMBL2095194 (factor III/factor VII complex) (Supplementary Table 8).

To globally assess the biological mechanisms involving prioritized genes, we applied a network query based on Bayesian gene regulatory networks constructed from expression and genetics data from arterial tissues and fibroblasts27,28,29. We found extracellular matrix organization to be the biological function at which most prioritized genes and their respective immediate subnetworks clustered (Supplementary Fig. 9). Among the genes we prioritized in novel loci, a number encode proteins involved in extracellular matrix formation, including integrin alpha 1 (ITGA1), basement membrane constituent collagen type IV alpha 1 chain (COL4A1) and alpha 2 chain (COL4A2), serine protease HtrA serine peptidase 1 (HTRA1), metallopeptidase thrombospondin type 1 domain containing 4 (THSD4, encoding a partner of fibrillin 1, whose gene is located in a previously reported SCAD locus (FBN1)) and TIMP metallopeptidase inhibitor 3 (TIMP3). Interestingly, integrin alpha 1, HTRA1 and collagen type IV subunits were labeled as potentially druggable targets based on their similarity to approved drug targets and members of key druggable gene families (tier 3; Supplementary Table 8). Of note, the F3 subnetwork also clustered in extracellular matrix organization and connected with HTRA1 and TIMP3 subnetworks through Bayesian network edges from the aorta and coronary artery (Supplementary Fig. 9).

Shared genetics between SCAD and arterial diseases

With the exception of the F3 locus, all SCAD risk loci harbored variants within 1 megabase of the lead SCAD variant that were at least suggestively (P < 10−5) associated with other forms of cardiovascular and neurovascular disease. Using trait colocalization analyses, we found that the same variants were likely to be causal both for SCAD and the other diseases or traits at 15 loci (Fig. 3a and Supplementary Table 9). However, the directions of the effects were not systematically consistent across the loci for all of the diseases.

Fig. 3: Colocalization and genetic correlation of SCAD genetic association with cardiovascular diseases and traits.

a, Heatmap representing the colocalization of SCAD signals with GWAS analysis of the following cardiovascular diseases or traits: cervical artery dissection (CeAD), multifocal FMD, migraine, blood pressure (SBP and DBP), LDL cholesterol blood concentration, hemoglobin concentration (HGB), any stroke (AS), intracranial aneurysm (IA) and CAD. The tile color represents the H4 coefficient of approximate Bayes factor (ABF) colocalization (that is, the posterior probability of the two traits sharing one causal variant at the locus (PP.H4.ABF; 0–1)) multiplied by the sign of colocalization (+1 if both traits have the same risk or higher mean allele and −1 if opposite allele). b, Forest plot representing genetic correlations with SCAD. The Rho coefficient of genetic correlation (rg), obtained using linkage disequilibrium score regression, is represented on the x axis (center of the error bar). The range of each bar represents the 95% CI. Unadjusted P values obtained by two-sided Wald test for genetic correlations are indicated. Asterisks indicate significance after Bonferroni correction for testing 26 traits (P < 1.9 × 10−3) (Supplementary Table 10).

Globally, colocalization at SCAD loci gave high posterior probabilities that the same risk alleles are also causal for FMD and cervical artery dissection (Fig. 3a and Supplementary Table 9). Linkage disequilibrium score regression-based genetic correlations indicated that SCAD correlates positively with FMD (rg = 0.38 ± 0.18; P = 0.03) and cervical artery dissection (rg = 0.61 ± 0.20; P = 2.4 × 10−3; Fig. 3b and Supplementary Table 10), which is consistent with the clinical observation of frequent coexistence of these arteriopathies in patients with SCAD. For instance, FMD is reported in 40–60% of patients with SCAD11,30. Stratified analyses in the four largest case–control studies where FMD arteriopathies were screened indicated globally similar associations with SCAD (Supplementary Fig. 10 and Supplementary Table 11). Finally, genetic correlations indicated that SCAD positively correlates with several neurovascular diseases where predominantly arterial structure and/or function are altered, including stroke (rg = 0.17 ± 0.06; P = 4.5 × 10−3), migraine (rg = 0.18 ± 0.06; P = 1.3 × 10−3), intracranial aneurysm (rg = 0.22 ± 0.06; P = 2.0 × 10−4) and subarachnoid hemorrhage (rg = 0.27 ± 0.07; P = 6.4 × 10−5) (Fig. 3b and Supplementary Table 10).

Opposite genetic link between SCAD and CAD

While patients with CAD are predominantly men (75%) who often have pre-existing cardiometabolic comorbidities (mainly dyslipidemia, hypertension and type 2 diabetes), patients with SCAD are on average younger, present with fewer cardiovascular risk factors and are overwhelmingly women (>90%)2,4. Using genetic association colocalization and genetic correlation, we genetically compared SCAD with CAD. We found that, among SCAD loci, several were known to associate with CAD. Disease association colocalization analyses showed that for six loci SCAD and CAD are likely to share the same causal variants with high posterior probabilities (posterior probability of the shared causal variant hypothesis (H4) = 84–100%), but all with opposite risk alleles (Fig. 3a and Supplementary Table 9). Genetic correlation confirmed a genome-wide negative correlation between SCAD and CAD (rg = −0.12 ± 0.04; P = 3.7 × 10−3) (Supplementary Table 10), including after conditioning SCAD GWAS results on systolic blood pressure (SBP) or diastolic blood pressure (DBP) GWAS results using the multitrait-based conditional and joint analysis (mtCOJO) tool31 (rgCAD/SBP = −0.19 ± 0.04 (P = 4.6 × 10−6); rgCAD/DBP = −0.19 ± 0.04 (P = 1.3 × 10−5)) (Supplementary Table 12 and Supplementary Fig. 11).

Cardiovascular risk factors and risk of SCAD and CAD

We found that SCAD shared several causal variants with SBP and DBP, involving both the same and opposite directional effects (Fig. 3a and Supplementary Table 9). We found one shared locus with hemoglobin levels and a significant genetic correlation with SCAD (rg = 0.12 ± 0.03; P = 2.7 × 10−5; Fig. 3b). However, SCAD loci were not shared with body mass index (BMI), lipid traits (including low-density lipoprotein (LDL) cholesterol and high-density lipoprotein (HDL)), type 2 diabetes or smoking, and these traits did not correlate with SCAD at the genomic level (Supplementary Tables 9 and 10). Interestingly, we found significant positive genetic correlations both with SBP (rg = 0.12 ± 0.03; P = 1.0 × 10−4) and DBP (rg = 0.17 ± 0.03; P = 2.6 × 10−7), indicating a shared genetic basis with SCAD (Fig. 3b and Supplementary Table 10). To assess the extent to which blood pressure and main cardiovascular risk factors may contribute to the risk of SCAD, we leveraged existing GWAS datasets to identify instrumental variables and conducted comparative Mendelian randomization associations with SCAD or CAD. We found robust significant associations estimated by inverse variance-weighted (IVW), MR-Egger and weighted median methods between genetically predicted blood pressure traits and increased risk of SCAD (βIVW/SBP = 0.05 ± 0.01 (P = 7.6 × 10−6); βIVW/DBP = 0.10 ± 0.02 (P = 1.9 × 10−8)) and CAD (βIVW/SBP = 0.04 ± 0.002 (P = 8.6 × 10−49); βIVW/DBP = 0.06 ± 0.004 (P = 1.6 × 10−44)) (Fig. 4 and Supplementary Table 13). Similar associations were estimated when we analyzed only women with SCAD, women with CAD or men with CAD, although analyses only in men with SCAD were limited by the extremely small numbers of male cases (Supplementary Table 14). Genetically determined BMI, lipid traits, type 2 diabetes and smoking status did not influence the risk for SCAD. However, we were able to confirm that these cardiometabolic traits are strong genetic risk factors for CAD (Fig. 4 and Supplementary Table 13). Our findings indicate that genetically elevated blood pressure is the only shared genetic risk factor between SCAD and CAD, albeit involving potentially different genetic loci.
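As an illustration of the IVW estimates reported above, the following is a minimal sketch of the standard fixed-effects IVW Mendelian randomization estimator, in which each instrument's ratio estimate is weighted by the precision of its outcome association. The function name and the toy numbers are hypothetical, and this is not the analysis pipeline used in the study; it ignores uncertainty in the exposure effects and does not cover the MR-Egger or weighted median methods.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effects IVW estimate of the causal effect of exposure on outcome.

    beta_exposure: per-variant effects on the exposure (for example, SBP)
    beta_outcome:  per-variant effects on the outcome (for example, log odds of SCAD)
    se_outcome:    standard errors of the outcome effects
    """
    # Weighted regression through the origin of outcome effects on exposure effects.
    numerator = sum(bx * by / s ** 2
                    for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / s ** 2
                      for bx, s in zip(beta_exposure, se_outcome))
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return beta, se

# Toy instruments (hypothetical values, not taken from the study).
beta_ivw, se_ivw = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
```

In this toy example every instrument has the same ratio of outcome to exposure effect, so the IVW estimate equals that common ratio.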

Fig. 4: Mendelian randomization associations between main cardiovascular risk factors and SCAD or CAD.

a,b, Forest plots representing Mendelian randomization associations between cardiovascular risk factors and SCAD (ncases = 1,917; ncontrols = 9,292) (a) or CAD (ncases = 181,522; ncontrols = 984,168) (b). Association estimates (β; center of the error bars) obtained from Mendelian randomization analyses using the IVW method are represented on the x axis. The range of each bar represents the 95% CI. Unadjusted P values from the associations obtained by two-sided Wald test are indicated. n = 340,159 (SBP), 340,162 (DBP), 359,983 (BMI), 315,133 (HDL), 343,621 (LDL), 343,992 (triglycerides (TG)), 164,638 cases and 195,068 controls (smoking (SMK)) and 74,124 cases and 824,006 controls (type 2 diabetes (T2D)). The asterisks indicate significance after Bonferroni correction for testing nine traits (P < 5.6 × 10−3) (Supplementary Table 13).

Discussion

In this Article, we provide the largest study to date aimed at understanding the genetic basis of SCAD—an understudied cause of AMI that primarily affects women. We report novel associations and demonstrate high polygenic heritability for SCAD. We leverage integrative functional annotations to prioritize genes that are likely to be regulated in VSMCs and the fibroblasts of arteries. Insights from the biological functions of genes highlight the central role of extracellular matrix integrity and reveal impaired tissue coagulation as a novel potential mechanism for SCAD. Globally, we demonstrate the polygenic basis of SCAD to be shared with an important set of cardiovascular diseases. However, a striking directionally opposite genetic impact is found with atherosclerotic CAD, involving multiple risk loci and leading to a genome-wide negative genetic correlation. We provide evidence supporting genetically predicted higher blood pressure as an important risk factor for SCAD, but not other well-established cardiovascular factors. Our results set the stage for future investigation of novel biological pathways relevant to both SCAD and CAD and potential therapeutic and preventive strategies specifically targeting SCAD.

As an understudied condition that was previously thought to be uncommon, SCAD was initially suspected to involve rare and highly penetrant mutations. However, recent sequencing studies have suggested that only a small proportion (~3.5%) of SCAD cases are due to rare variants16,32. This is in keeping with increasing clinical recognition suggesting that this condition is not rare and occurs globally in populations of both European and non-European ancestry, with similar disease characteristics and probably similar prevalence2,4,33,34. Despite a modest sample size, we identified 16 risk loci accounting for about one-quarter of the polygenic heritability, which we estimate to be as high as 71%, therefore indicating that SCAD is predominantly a complex polygenic disease. However, we acknowledge that larger GWAS settings, including ancestrally diverse populations, will enhance the statistical power needed to provide validation through replication of the reported risk loci and estimated polygenic heritability.

This study supports the presence of genetic overlap between the risk of SCAD and other vascular diseases involving generally younger individuals and more women, such as cervical arterial dissection, migraine, subarachnoid hemorrhage and FMD. These conditions are reported to occur at increased frequency in patients with SCAD10,11,12,13, supporting shared causal biological mechanisms. Among the genes we prioritize in novel SCAD loci, we highlight the ATPase plasma membrane Ca2+ transporting 1 gene (ATP2B1), which we recently reported to associate with FMD35 and which is a well-established locus for blood pressure risk36 via its role in intracellular calcium homeostasis in VSMCs and blood pressure regulation37. Most importantly, we provide evidence for a causal genetic effect of both SBP and DBP in SCAD risk. These findings provide an important genetic basis to support observational data suggesting that control of blood pressure may be an important factor in reducing the risk of recurrence after SCAD38. However, our findings also suggest that controlling other causal risk factors for CAD, such as LDL cholesterol with statins, may confer less benefit in SCAD than in CAD.

Knowledge of the molecular mechanisms leading to SCAD has been limited. Insights from sequencing studies of rare genetic variants have shown that most are associated with genes known from hereditary connective tissue disorders such as vascular Ehlers–Danlos, Loeys–Dietz and Marfan syndromes, as well as adult polycystic kidney disease16,32. A striking finding from our study is the identification of the tissue factor gene F3, a critical component of tissue-mediated blood coagulation, as a strong candidate gene in a risk locus for SCAD. We found that genetically determined lower expression of F3 in arterial tissue was associated with a higher risk for SCAD, involving variants located in putative functional regulatory elements in the coronary artery, VSMCs and fibroblasts. Tissue factor is synthesized at the subendothelial level by VSMCs and by fibroblasts in the adventitia surrounding the arteries39. In SCAD, once an intramural hemorrhage has initiated, propagation and pressurization of the false lumen may depend, in part, on coagulation and stabilization of the hematoma. Tissue factor is also a druggable target, albeit a potentially challenging one given its known multiple physiological and pathophysiological roles ranging from hemostasis to cancer metastasis. Tissue factor is widely studied in the context of prothrombotic conditions, including atherosclerosis, although notably the genetic variants we describe here do not associate with atherosclerotic disease. This feature is an exception to the highly pleiotropic nature of the variants we describe in the remaining SCAD loci, suggesting impaired tissue-initiated coagulation as a putative specific mechanism in SCAD.

We identify regulation of the extracellular matrix of arteries as the predominant polygenic biological mechanism for SCAD. Integrative prioritization analyses revealed 13 potential causal genes with established key roles in maintaining arterial wall integrity and function. Among these, we highlight the serine protease HTRA1 and metallopeptidase inhibitor TIMP3, which are involved in matrix disassembly. TIMP3 clusters in the main network for extracellular matrix organization that includes ADAMTSL4, LRP1 and COL4A1, with connections with subnetworks of F3. This clustering is consistent with the biological function of TIMP3 as an inhibitor of matrix metalloproteinases with domains interacting with ADAMTS proteins and LRP1, involving proteins encoded by genes prioritized in SCAD loci40. Interestingly, we found a novel association signal with SCAD in the metallopeptidase thrombospondin type 1 domain containing 4 gene (THSD4) that promotes fibrillin 1 elastic fiber assembly, and confirm the previously reported associations near ADAMTSL4 and FBN1 (refs. 18,20). We showed that SCAD risk alleles were correlated with genetically decreased expression of these genes in arteries or fibroblasts. This finding suggests that a genetic predisposition to a weaker extracellular matrix may increase the vulnerability of traversing intramural microvessels to disruption, increasing the risk of initiation and propagation of a false lumen within the coronary vessel wall, leading to SCAD.

Many of the risk loci for SCAD that we report here, as well as their prioritized genes, are already known from atherosclerotic disease GWASs. However, here we provide compelling and intriguing evidence for the opposite directionality of a substantial fraction of genetic bases for SCAD versus CAD, suggesting that some key biological mechanisms involved in the two diseases are also likely to be opposite, which is consistent with the clinical observation of a lower-than-expected burden of atherosclerotic disease in patients with SCAD. For example, the association signals in the COL4A1/COL4A2 locus are in an opposite direction to their contribution to CAD41. This locus encodes α1 and α2 chains of type IV collagen, with transcripts generated through a common promoter. Type IV collagen is the main component of the basement membrane of arterial cells and plays a key role in the structural integrity and biological functions of VSMCs in the tunica media. Decreased collagen IV expression increases the risk of CAD15,42. Proposed potential mechanisms for this include a disinhibition of VSMC-intimal migration during atherogenesis or an increase in the vulnerability of atherosclerotic plaque to rupture42. In contrast with CAD, our data indicate that genetically mediated increased collagen IV expression increases the risk of SCAD. Better understanding of how these directionally opposite changes modify the risk of CAD and SCAD has considerable potential to enhance our understanding of the molecular genetic mechanisms that confer risk in both diseases.

Methods

Patients and control populations

Our meta-analysis included participants of European ancestry from eight studies: DISCO-3C, SCAD-UK I, SCAD-UK II, Mayo Clinic, DEFINE-SCAD, CanSCAD/MGI, VCCRI I and VCCRI II (Supplementary Fig. 1). Patients with SCAD presented with similar clinical characteristics (Supplementary Table 1), as well as homogeneous diagnosis, exclusion and inclusion criteria. All of the studies were approved by national and/or institutional ethical review boards. Further study-specific clinical details are provided in the Supplementary Note.

Genome-wide association meta-analysis

Details of the pre-imputation quality control steps for each study are listed in Supplementary Table 15. Briefly, genotyping was performed using commercially available arrays or genome sequencing (SCAD-UK II and VCCRI II). To increase the number of tested SNPs and the overlap of variants available for analysis between different arrays, the genotypes of all European ancestry cohorts except SCAD-UK II and VCCRI II were imputed to the Haplotype Reference Consortium version 1.1 reference panel45 on the Michigan Imputation Server46. A GWAS was conducted in each study under an additive genetic model using PLINK version 2.0 (ref. 47). For chromosome X, males and females were both coded on a 0–2 scale under the chromosome X inactivation assumption model. Models were adjusted for population structure using residues from the first five principal components and sex, except in the women-only analyses. Before meta-analysis, we removed SNPs with low minor allele frequencies (<0.01), low imputation quality (r2 < 0.8) and deviations from Hardy–Weinberg equilibrium (P < 10−5). A total of 6,691,677 variants met these criteria and were kept in the final results.
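The variant filters and the chromosome X coding described above can be sketched as follows. This is a minimal illustration, assuming per-variant summary values are already available; the function names are hypothetical, and the mapping of hemizygous male genotypes to 0 or 2 is our reading of the 0–2 scale under the X inactivation model, not a description of the PLINK options used in the study.

```python
def passes_qc(maf, imputation_r2, hwe_p):
    """Keep a variant unless it fails one of the three filters in the text."""
    return (
        maf >= 0.01              # remove low minor allele frequency (<0.01)
        and imputation_r2 >= 0.8 # remove low imputation quality (r2 < 0.8)
        and hwe_p >= 1e-5        # remove Hardy-Weinberg deviations (P < 10^-5)
    )

def chrx_dosage(alt_allele_count, is_male):
    """Code chromosome X genotypes on a 0-2 scale for both sexes.

    Assumption: males carry 0 or 1 copy and are coded 0 or 2;
    females carry 0, 1 or 2 copies and are coded as counted.
    """
    return 2 * alt_allele_count if is_male else alt_allele_count

# Hypothetical variants given as (maf, imputation r2, HWE P value).
variants = [(0.05, 0.95, 0.2), (0.005, 0.99, 0.5), (0.20, 0.70, 0.9)]
kept = [v for v in variants if passes_qc(*v)]
```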

Results from individual GWASs were combined using an inverse variance-weighted fixed-effects meta-analysis in METAL software48, with correction for genomic control. Heterogeneity was assessed using the I2 metric from the complete study-level meta-analysis. Between-study heterogeneity was tested using Cochran’s Q statistic and considered significant at P ≤ 10−3. The genome-wide significance threshold was set at the level of P = 5.0 × 10−8. LocusZoom (http://locuszoom.org/) was used to provide regional visualization of the results.
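For a single variant, the fixed-effects combination and the heterogeneity statistics named above can be sketched as follows. This is a minimal illustration of the standard formulas, not the METAL implementation; the function name is hypothetical and the genomic control correction is not included.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse variance-weighted fixed-effects meta-analysis for one variant.

    betas: per-study effect estimates (log odds ratios)
    ses:   per-study standard errors
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    # Two-sided Wald P value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations of study effects from the pooled effect.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2: percentage of variation across studies attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "q": q, "i2": i2}

# Two hypothetical studies with discordant effects and equal precision.
result = fixed_effects_meta([0.0, 0.4], [0.1, 0.1])
```

Under the null hypothesis of homogeneity, Q follows a chi-squared distribution with (number of studies − 1) degrees of freedom, which is how its P value would be obtained.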

Functional annotation

Identification of potential functional variants

To generate a list of potential functional variants, we first identified the 95% credible set of variants using the ppfunc function of the corrcoverage R package (version 1.2.1). The posterior probability of causality was evaluated from marginal z scores for all variants within 500 kilobases (kb) of the lead SNP at each locus. In the COL4A1/COL4A2 locus, where we found two association signals, these were separated by placing an equidistant border from each lead SNP for the inclusion of SNPs in the analysis. Variants with a cumulated posterior probability of up to 95% were kept for further analyses. To consider potentially poorly imputed variants in one of the individual case–control studies, we also included variants in high linkage disequilibrium (r2 > 0.7) with the lead SNP at each locus, based on information from European populations (1000 Genomes reference panel) queried using the ldproxy function of the LDlinkR package (version 1.1.2)49.
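The credible set construction can be sketched as follows, assuming Wakefield approximate Bayes factors computed from marginal z scores under a single causal variant per locus. The function names are hypothetical, the prior standard deviation of 0.2 is an assumption rather than a value stated in the text, and the sketch does not reproduce the corrcoverage package.

```python
import math

def posterior_probabilities(z_scores, variances, prior_sd=0.2):
    """Per-variant posterior probability of causality from marginal z scores.

    variances: variance (squared standard error) of each effect estimate
    prior_sd:  assumed prior standard deviation of the true effect size
    """
    w = prior_sd ** 2
    log_abf = []
    for z, v in zip(z_scores, variances):
        r = w / (v + w)  # shrinkage factor
        log_abf.append(0.5 * math.log(1.0 - r) + 0.5 * r * z * z)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_abf)
    unnormalized = [math.exp(value - m) for value in log_abf]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def credible_set(snp_ids, pps, coverage=0.95):
    """Add variants in decreasing posterior probability until coverage is reached."""
    ranked = sorted(zip(snp_ids, pps), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for snp, pp in ranked:
        kept.append(snp)
        cumulative += pp
        if cumulative >= coverage:
            break
    return kept

# Hypothetical locus with three variants of equal precision.
pps = posterior_probabilities([6.0, 1.0, 0.5], [0.01, 0.01, 0.01])
snps = credible_set(["snp_a", "snp_b", "snp_c"], pps)
```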

Enrichment of SCAD variants in regulatory regions

To calculate the enrichment of SCAD-associated SNPs among functionally annotated genomic regions, we retrieved available H3K27ac chromatin immunoprecipitation followed by sequencing (ChIP-seq) datasets (narrowPeak beds) in any tissue from ENCODE (https://www.encodeproject.org/ (ref. 50)) and single-nucleus assay for transposase-accessible chromatin with sequencing (snATAC-seq) peak files (bed format) from the Human Enhancer Atlas (http://catlas.org/humanenhancer (ref. 24)). A complete list of datasets is available in Supplementary Table 16. For H3K27ac marks, bed files corresponding to the same tissue were concatenated and sorted before combining overlapping peaks using the bedtools (version 2.29.0) merge command. Variant enrichment was calculated using the GREGOR package (version 1.4.0)43. All potential functional variants (95% credible set and linkage disequilibrium proxies as described above) were used as inputs and the parameters were adjusted so as not to pick additional linkage disequilibrium proxies (LDWINDOWSIZE = 1). P values were adjusted for multiple testing by the application of Bonferroni correction.
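Two steps of this procedure, merging overlapping peaks within a tissue and expressing enrichment as observed over expected overlaps, can be sketched as follows. The function names are hypothetical; the merge mimics the default behavior of merging overlapping or directly adjacent intervals, and the enrichment ratio follows the definition given in the Fig. 2 legend rather than the internals of GREGOR.

```python
def merge_peaks(peaks):
    """Merge overlapping or book-ended intervals given as (chrom, start, end)."""
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of opening a new one.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(peak) for peak in merged]

def fold_enrichment(observed_overlaps, matched_overlaps):
    """Ratio of observed overlaps to the mean overlap count of matched SNP sets."""
    expected = sum(matched_overlaps) / len(matched_overlaps)
    return observed_overlaps / expected

# Hypothetical peaks concatenated from two samples of the same tissue.
tissue_peaks = merge_peaks([
    ("chr1", 150, 300),
    ("chr1", 100, 200),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
])
```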

Identification of variants with potential regulatory function

We used H3K27ac peaks in coronary arteries (as described above), open chromatin regions in healthy coronary arteries (obtained as previously described35,51) and open chromatin regions from merged snATAC-seq clusters. The latter were derived from mapped snATAC-seq fragments for 25 adult tissues, which we retrieved from the Gene Expression Omnibus (GSE184462)24 in bed format. Mapped fragments from all clusters representing >1% of cells in at least one arterial tissue (T lymphocyte 1 (CD8+), endothelial general 2, endothelial general 1, macrophage general, fibroblast general, vascular smooth muscle 2 or vascular smooth muscle 1) were extracted and grouped by annotated cell type as T lymphocytes, endothelial cells, macrophages, fibroblasts and VSMCs, respectively. Genome coverage was calculated using the bedtools (version 2.29.0) coverage function. We detected peaks from bedGraph output using the MACS2 bdgpeakcall function (Galaxy Version 2.1.1.20160309.0) on the Galaxy webserver52,53. All peak files were extended 100 base pairs upstream and downstream using the bedtools (version 2.29.0) slop function. We detected overlaps of SCAD potential functional variants with relevant genomic regions using the findOverlaps function from the rtracklayer package (version 1.52.1)54. We used the Integrated Genome Browser (version 9.1.8) to visualize read density profiles and peak positions in the context of the human genome55.
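The last two processing steps, extending peaks by 100 base pairs and testing variants for overlap, can be sketched in Python as follows. This is only an illustration of the logic: the function names and variant identifiers are hypothetical, peaks are assumed to use 0-based half-open BED coordinates and variants 1-based positions, and the extension is clamped at position 0 only (chromosome ends are not checked).

```python
def slop(peaks, flank=100):
    """Extend each (chrom, start, end) peak by `flank` bp on both sides."""
    return [(chrom, max(0, start - flank), end + flank) for chrom, start, end in peaks]

def variants_in_peaks(variants, peaks):
    """Return IDs of variants falling inside any peak.

    variants: list of (variant_id, chrom, pos) with 1-based positions.
    peaks:    list of (chrom, start, end) in 0-based half-open coordinates.
    """
    hits = []
    for variant_id, chrom, pos in variants:
        # A 1-based position pos occupies the half-open interval [pos - 1, pos).
        if any(c == chrom and start < pos <= end for c, start, end in peaks):
            hits.append(variant_id)
    return hits
```

A linear scan is adequate for a few hundred candidate variants; an interval tree or sorted sweep would be preferable for genome-scale inputs.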

Gene prioritization

Genes located within 500 kb of lead variants were annotated to prioritize the most likely causal genes. To find the gene(s) closest to lead SNPs and genes overlapping with variants in the credible set of causal SNPs, gene coordinates were retrieved from Gencode release 38 and aligned to hg19 genomic coordinates (gencode.v38lift37.annotation.gff3.gz). Significant eQTL associations and all SNP–gene eQTL associations in the version 8 release of the GTEx database were retrieved from the GTEx website (www.gtexportal.org/home/datasets). Colocalization of association with SCAD and eQTLs was evaluated using the R coloc package (version 5.1.0) with default values as priors. We considered that there was evidence for colocalization if H4 coefficients were >75% or if eQTL association was significant for SCAD lead SNPs and H4 was over 25%. TWASs were performed using the FUSION R/Python package44. Gene expression models were pre-computed from GTEx data (version 8 release) and were provided by the authors. Only genes with a heritability P < 0.01 were used in the analysis. Both tools used linkage disequilibrium information from the European panel of phase 3 of the 1000 Genomes Project. Bonferroni multiple testing correction was applied using the p.adjust function in R (version 4.1.0). Significant capture Hi-C hits in aorta tissue were provided as supplementary data by Jung et al.25. Genes associated with mouse cardiovascular phenotypes (code MP:0005385) were retrieved from the Mouse Genome Informatics database (www.informatics.jax.org)56. We also queried the DisGeNET database, using the disgenet2r package (version 0.99.2), for genes with reported evidence in human cardiovascular disease (code C14) with a score of >0.2, including “ALL” databases57. In the absence of a missense variant, colocalization and TWAS criteria were given a tenfold weight compared with other criteria. At each locus, we prioritized genes fulfilling the largest number of criteria. In cases where several candidates were retained, we prioritized genes that were most likely to have a function in arterial disease (for example, expression in arterial tissues or exclusion of pseudo-genes).
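The colocalization decision rule and the weighted counting of criteria described above can be written down as a small Python sketch. The criterion names, the function names and the reading that the higher weight applies whenever the locus has no missense variant are assumptions made for illustration; they are not taken from the analysis code.

```python
HIGH_WEIGHT_CRITERIA = {"colocalization", "twas"}  # hypothetical criterion names

def has_colocalization_evidence(h4, lead_snp_is_significant_eqtl):
    """Rule from the text: H4 > 75%, or significant lead-SNP eQTL and H4 > 25%."""
    return h4 > 0.75 or (lead_snp_is_significant_eqtl and h4 > 0.25)

def prioritization_score(criteria_met, locus_has_missense, high_weight=10):
    """Count fulfilled criteria, up-weighting colocalization and TWAS
    when no missense variant is present (assumed interpretation)."""
    score = 0
    for name in criteria_met:
        if name in HIGH_WEIGHT_CRITERIA and not locus_has_missense:
            score += high_weight
        else:
            score += 1
    return score
```

Under this sketch, the gene with the highest score at a locus would be the first candidate, with ties resolved by the biological considerations given in the text.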

Druggability of prioritized genes

The druggability of the gene products identified through the GWAS was assessed by reference to the set of genes encoding druggable targets derived by Finan et al.26 using ChEMBL version 17. Targets in this set are subclassified into: (1) the efficacy targets of approved agents and clinical phase drug candidates (tier 1); (2) genes encoding targets with known bioactive drug-like small molecule binding partners and those with substantial sequence similarity to approved drug targets (tier 2); and (3) genes encoding secreted or extracellular proteins, proteins with more distant similarity to approved drug targets and members of key druggable gene families not already included in tiers 1 or 2 (tier 3). Further lookups of approved and clinical phase targets were performed against ChEMBL58 version 30 and the British National Formulary (accessed 9 April 2021). Note that identified drug targets can either be: (1) a single protein providing a 1:1 link with the causal gene nominated in a GWAS and post-GWAS analysis; (2) a protein complex where the causal gene can encode a member of the complex; or (3) a protein family with the causal gene being a member of the family.

Bayesian network query of SCAD candidate genes

Gene expression data from the aorta, coronary artery, tibial artery and cultured fibroblasts were curated from version 8 of the GTEx database (ref. 28). Gene expression data from the mouse aorta were curated from the Hybrid Mouse Diversity Panel (HMDP)27. Tissue-specific gene regulatory Bayesian networks were constructed from the GTEx and HMDP gene expression data using RIMBANET29. The Bayesian network from each dataset included only network edges that passed a probability of >30% across 1,000 generated Bayesian networks starting from different random genes. Bayesian networks were combined for the top GWAS hits query, and mouse gene symbols were converted to their human orthologs. Bayesian networks were queried for the identified top GWAS hits to identify their first-degree network connections and to determine connections between their surrounding subnetwork nodes. The directions of edges were informed by prior knowledge, such as eQTLs and previously known regulatory relationships between genes. Subnetworks were annotated by top biological pathways representative of the subnetwork genes using Enrichr with a false discovery rate of <0.05.

Colocalization with other traits and diseases

Summary statistics were retrieved from individual studies, as indicated in Supplementary Table 17. At each locus, we selected variants found in both SCAD and the other studies with a high quality of imputation (r2 > 0.9) and located within 500 kb from the SCAD lead SNP. COL4A1 and COL4A2 loci were separated by placing an equidistant border from SCAD lead SNPs for the inclusion of SNPs in the analysis. Signal colocalization was evaluated using the R coloc package (version 5.1.0) with default values as priors. We reported H4 coefficients indicating the probability of two signals sharing a common causal variant at each locus.
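To make the meaning of the H4 coefficient concrete, the following Python sketch combines per-variant log approximate Bayes factors for two traits into posterior probabilities for the five colocalization hypotheses (H0: no association; H1 and H2: association with one trait only; H3: two distinct causal variants; H4: one shared causal variant). It follows the published approximate Bayes factor formulation under a single-causal-variant assumption per trait. The function names are hypothetical, the prior values (1e-4, 1e-4 and 1e-5) are assumed to correspond to the package defaults, and the sketch is not a substitute for the coloc package.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def logdiffexp(a, b):
    """Numerically stable log(exp(a) - exp(b)); returns -inf when b >= a."""
    if b >= a:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))

def coloc_posteriors(log_abf_1, log_abf_2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for two traits.

    log_abf_1, log_abf_2: per-variant log approximate Bayes factors,
    in the same variant order for both traits.
    """
    sum1 = logsumexp(log_abf_1)
    sum2 = logsumexp(log_abf_2)
    shared = logsumexp([a + b for a, b in zip(log_abf_1, log_abf_2)])
    log_h = [
        0.0,                                                            # H0
        math.log(p1) + sum1,                                            # H1
        math.log(p2) + sum2,                                            # H2
        math.log(p1) + math.log(p2) + logdiffexp(sum1 + sum2, shared),  # H3
        math.log(p12) + shared,                                         # H4
    ]
    norm = logsumexp(log_h)
    return [math.exp(v - norm) for v in log_h]
```

When both traits peak at the same variant, the last element (H4) dominates; when they peak at different variants, the weight shifts towards H3.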

Heritability estimates and genetic correlation

We used linkage disequilibrium score regression21 implemented in the ldsc package (version 1.0.1; https://github.com/bulik/ldsc/) and SumHer22 implemented in the LDAK software (www.ldak.org) to quantify the heritability explained by common variants or SNP-based heritability (h2SNP) for SCAD and the degree of genetic correlation between SCAD and other diseases and traits. We also used SumHer to estimate the SNP-based heritability attributable to loci associated with SCAD at genome-wide statistical significance. Loci were defined as the 1 megabase region around lead SNPs in the GWAS meta-analysis. SNPs belonging to each locus were used as annotations to calculate the partitioned heritability. Two analyses were performed: one that considered separated loci and a second that aggregated all SNPs as one annotation. Summary statistics were acquired from the respective consortia and are detailed in Supplementary Table 17. For each trait, we refined the summary statistics to the subset of HapMap 3 SNPs to reduce the potential bias due to poor imputation quality. Correlation analyses were restricted to European ancestry meta-analyses summary statistics. We used the European linkage disequilibrium score files calculated from the 1000 Genomes reference panel and provided by the developers. P < 1.9 × 10−3, corresponding to adjustment for 26 independent phenotypes, was considered significant. We conditioned SCAD association on cardiometabolic trait genetic association using the mtCOJO tool from the GCTA pipeline31. The resulting summary statistics were then used to compute genetic correlations between SCAD, conditioned on cardiometabolic traits and traits of interest.

Mendelian randomization analyses

We applied a stringent selection process for instrumental variables to ensure the validity of our Mendelian randomization results. To select valid instrumental variables that respect the three key assumptions ((1) strong association with the exposure; (2) independence from potential confounders between the exposure and outcome; and (3) influence on the outcome only through the exposure), we used linkage disequilibrium clumping with a P value threshold of <5 × 10−8 and a linkage disequilibrium r2 < 0.001 within a 10,000 kb window based on the European population in the 1000 Genomes Project. We excluded candidate instrumental variables that were absent in the summary statistics data from a GWAS of our outcome (SCAD/CAD). To minimize the risk of horizontal pleiotropy, we removed candidate instrumental variables that were associated with the outcome or in high to moderate linkage disequilibrium (r2 > 0.6 within a 10,000 kb window).
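The clumping step can be illustrated with a greedy Python sketch that keeps the most significant variant, discards nearby variants in linkage disequilibrium with it, and repeats. The dictionary keys, the function name and the `r2` callback (which stands in for a lookup against a reference panel) are hypothetical; real clumping tools differ in details such as how ties and missing LD values are handled.

```python
def clump(variants, r2, p_threshold=5e-8, r2_threshold=0.001, window_kb=10_000):
    """Greedy LD clumping.

    variants: list of dicts with keys 'id', 'chrom', 'pos' and 'p'.
    r2:       function (id_a, id_b) -> pairwise r2 from a reference panel.
    Returns the IDs of retained index variants, most significant first.
    """
    candidates = sorted(
        (v for v in variants if v["p"] < p_threshold), key=lambda v: v["p"]
    )
    index_variants = []
    for v in candidates:
        clumped = any(
            v["chrom"] == k["chrom"]
            and abs(v["pos"] - k["pos"]) <= window_kb * 1000
            and r2(v["id"], k["id"]) >= r2_threshold
            for k in index_variants
        )
        if not clumped:
            index_variants.append(v)
    return [v["id"] for v in index_variants]
```

With the thresholds given in the text, only variants that are essentially uncorrelated within a 10,000 kb window survive as candidate instruments.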

We used the multiplicative random-effects IVW method59 implemented in the TwoSampleMR R package to estimate the associations of genetically predicted cardiovascular risk factors, including blood pressure (SBP and DBP), lipids (HDL, LDL and triglycerides), BMI, smoking liability and type 2 diabetes, with each of the outcomes of interest (SCAD or CAD). Estimates were scaled to a doubling in genetically predicted smoking risk, or to a one-unit increase in the genetically predicted trait for the continuous traits. We performed sensitivity analyses using the weighted median and MR-Egger methods to assess the consistency of estimates under alternative assumptions about genetic pleiotropy, as recommended59. We also performed Cochran’s Q test to assess the heterogeneity between estimates obtained using different variants. As nine risk factors were assessed, a Bonferroni-corrected significance level of 0.05/9 = 5.6 × 10−3 was used as the threshold for statistical significance in this analysis. P values between 5.6 × 10−3 and 0.05 were considered suggestively significant.
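For orientation, the IVW estimate and Cochran's Q can be computed from per-variant summary statistics as in the Python sketch below. The estimate is a weighted regression of variant-outcome effects on variant-exposure effects through the origin, with weights equal to the inverse variance of the outcome effects. The function name is hypothetical, and the choice to inflate the standard error by the residual dispersion only when it exceeds 1 is an assumption about how the multiplicative random-effects correction is applied; it is a sketch, not the TwoSampleMR implementation.

```python
import math

def ivw_multiplicative_random_effects(beta_exposure, beta_outcome, se_outcome):
    """Return (estimate, standard error, Cochran's Q) for an IVW analysis.

    Requires at least two variants.
    """
    weights = [1.0 / s ** 2 for s in se_outcome]
    sxx = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    sxy = sum(
        w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    estimate = sxy / sxx

    # Cochran's Q: weighted squared deviations from the fitted line.
    q = sum(
        w * (by - estimate * bx) ** 2
        for w, bx, by in zip(weights, beta_exposure, beta_outcome)
    )
    dof = len(beta_exposure) - 1
    # Assumption: over-dispersion inflates the SE; under-dispersion does not shrink it.
    dispersion = max(1.0, q / dof)
    se = math.sqrt(dispersion / sxx)
    return estimate, se, q
```

Under the null hypothesis of homogeneous variant-specific estimates, Q is compared against a chi-squared distribution with one fewer degrees of freedom than the number of variants.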

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.