Case/Control Cohort
The raw case/control cohort consisted of 2,885 cases and 4,227 controls, spread across 10 datasets (Fig. 1, Supplemental Table 1). All samples were genotyped on versions or derivatives of the Illumina GSA (Supplemental Table 1). We found a total of 537,278 probes shared across all datasets within this study, and utilized these probes for both quality control and CNV calling.
We isolated a subset of samples that were of high technical quality and suitable for case/control comparisons (see Methods for details). A total of 2,248 cases and 3,608 controls were included in our formal CNV analyses (Fig. 1, Supplemental Table 1). We did not see evidence of stratification of OCD case status across any of the major principal components (PCs; Supplemental Fig. 1), although Swedish and Norwegian ancestry clearly separated along these components and needed to be accounted for (Supplemental Fig. 2). Most of the variance within the data was explained by the first 5 PCs (Supplemental Fig. 3).
CNV calling and filtering
We called and analyzed large CNVs (≥ 15 probes, ≥ 30kb). A large number of these calls were present in sample-level data (1.42 calls per sample, Supplemental Tables 2–4). We retained CNVs that lay outside genomic loci prone to noisy intensity values and that were found at a frequency < 0.01 in both the cohort and the gnomAD v2.1 structural variant callset (see Methods). This filtering improved comparability across the separate datasets (0.59 calls per sample, Supplemental Tables 2–4).
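The filtering logic described above can be sketched as follows. This is a minimal illustration only: the thresholds come from the text, but the `CnvCall` record, its field names, and the helper functions are hypothetical and do not reflect the actual pipeline (see Methods). Call length is assumed to be `end - start`.

```python
from dataclasses import dataclass

# Thresholds stated in the text; everything else here is illustrative.
MIN_PROBES = 15
MIN_LENGTH_BP = 30_000
MAX_FREQUENCY = 0.01


@dataclass
class CnvCall:
    """Hypothetical record for one CNV call (field names are assumptions)."""
    sample_id: str
    chrom: str
    start: int
    end: int
    n_probes: int
    cohort_freq: float    # frequency of the call within the cohort
    gnomad_freq: float    # frequency in the gnomAD v2.1 SV callset
    in_noisy_locus: bool  # overlaps a locus prone to noisy intensity values


def is_large(call: CnvCall) -> bool:
    """Size criteria: at least 15 probes and at least 30kb."""
    return call.n_probes >= MIN_PROBES and (call.end - call.start) >= MIN_LENGTH_BP


def passes_qc(call: CnvCall) -> bool:
    """Large, outside noisy loci, and rare in both the cohort and gnomAD."""
    return (
        is_large(call)
        and not call.in_noisy_locus
        and call.cohort_freq < MAX_FREQUENCY
        and call.gnomad_freq < MAX_FREQUENCY
    )


def calls_per_sample(calls: list, n_samples: int) -> float:
    """Mean number of calls per sample, the summary reported in the text."""
    return len(calls) / n_samples
```

Applying `passes_qc` to every call and then recomputing `calls_per_sample` per dataset is one way to compare raw and filtered call rates.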
We compared cases and controls for evidence of systematic differences in raw CNV call count, LRRSD and filtered CNV call count. LRRSD metrics across datasets indicated that the data were of good quality. For ANGI controls, which underwent clustering before we received the data, mean LRRSD was higher but still within the range of the other included datasets (Supplemental Fig. 4). As noted above, raw CNV call counts differed appreciably between datasets, while QC-pass CNV call counts were well-harmonized across the data (Supplemental Figs. 5 and 6).
Out of an abundance of caution, we compared the global burden of small CNVs in cases versus controls to determine whether there were clusters of calls that piled up in a manner suggestive of batch effect. We specifically noted an elevation of small (30kb–100kb) CNV deletion signal (lambda = 1.21), which was driven exclusively by 19 small CNV calls clustering around 3 loci (see Methods). We excluded these small CNVs from further analyses, eliminating genomic inflation for small deletions (lambda = 0.98, Supplemental Fig. 7). No similar clustering suggestive of batch effect was present in CNV calls over 100kb in size (Supplemental Fig. 8).
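For readers unfamiliar with the lambda statistic, the sketch below shows one common way to compute a genomic inflation factor from a set of P values: convert each P value to a 1-degree-of-freedom chi-square statistic and divide the median by the expected median under the null. This is an assumption about the general technique, not a description of the exact procedure used here (see Methods); the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def p_to_chi2_1df(p: float) -> float:
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile, so z <= 0
    return z * z


def genomic_inflation(p_values: list) -> float:
    """Lambda: observed median chi-square over the expected null median."""
    stats = [p_to_chi2_1df(p) for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN
```

Under this definition, uniformly distributed P values give lambda close to 1, and values well above 1 indicate that test statistics are systematically inflated.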
Global CNV burden
We found that OCD cases had an excess burden of large (> 30kb) rare CNVs relative to unaffected controls (OR = 1.12, P = 1.77x10-3). More of this excess burden appears to come from deletions (OR = 1.16, P = 8.41x10-3) than from duplications (OR = 1.09, P = 0.06, Supplemental Fig. 9). Leave-one-out analyses showed that these results were not driven by any single input dataset (Supplemental Fig. 9) or by any single covariate (Supplemental Fig. 10). Risk also scaled with total CNV length: each additional 100kb of large deletion made a sample more likely to be a case (OR = 1.047 per 100kb, P = 2.31x10-3), as did each additional 100kb of duplication (OR = 1.033 per 100kb, P = 1.81x10-3). Consistent with this, OCD cases carried an excess burden of very large (> 1Mb) CNVs (OR = 2.01, P = 3.35x10-4, Supplemental Fig. 11). Ultrarare CNVs found only once in the case/control cohort conferred greater relative risk for OCD (OR = 1.21, P = 1.30x10-3, Supplemental Fig. 12), consistent with particularly penetrant, risk-conferring CNVs being subject to purifying selection. This singleton CNV signal was driven mainly by deletions (Supplemental Fig. 13).
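Because the length-based odds ratios are reported per 100kb, it can help to see how such an estimate rescales to other units. The sketch below assumes a model that is linear in CNV length on the log-odds scale (as in logistic regression); the function name is hypothetical, and extrapolating far beyond the observed range of CNV lengths is not supported by the data.

```python
import math


def rescale_odds_ratio(or_value: float, from_unit_bp: float, to_unit_bp: float) -> float:
    """Rescale an odds ratio between length units, assuming log-linearity."""
    beta_per_bp = math.log(or_value) / from_unit_bp  # log-odds per basepair
    return math.exp(beta_per_bp * to_unit_bp)


# Example: the deletion estimate of 1.047 per 100kb, expressed per 1Mb.
or_per_mb = rescale_odds_ratio(1.047, 100_000, 1_000_000)
```

Under this assumption, an OR of 1.047 per 100kb corresponds to roughly 1.58 per 1Mb of deletion, which is simply 1.047 raised to the tenth power.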
CNV burden is concentrated in protein-coding regions
OCD cases were more likely to carry CNVs that impact protein-coding regions of the genome (OR = 1.19, P = 3.07x10-4). There was no evidence for a case burden relative to controls for CNVs not overlapping any protein-coding bases (OR = 1.04, P = 0.50). Consistent with the burden of large CNVs in cases, the accumulation of CNV-impacted protein-coding genes increased OCD case risk (OR = 1.07 and P = 1.99x10-3 for each deletion-impacted gene, OR = 1.04 and P = 3.48x10-3 for each duplication-impacted gene).
Case CNV signal was concentrated within genes that are dosage sensitive. Cases carried an excess of CNVs that overlap at least one protein-coding gene that is more likely to be intolerant to loss-of-function (pLI > 0.5, OR = 1.60, P = 6.37x10-8, Fig. 2). There was no difference in burden of CNVs that do not carry at least one of these genes (OR = 1.04, P = 0.48). We also utilized recently described [25] sets of data-derived haplosensitive and triplosensitive genes and found that CNV burden was elevated in both sets (Supplemental Fig. 13).
CNV burden impacted evolutionarily constrained bases
OCD cases had a higher number of evolutionarily constrained bases impacted by CNV deletions than controls (OR = 1.03 per kbp, P = 6.34x10-3). There was no significant case/control difference in the number of constrained bases impacted by duplications (OR = 0.998 per kbp, P = 0.79). We found that CNV deletion burden preferentially loads onto bases with particularly high phyloP scores (Fig. 3), consistent with deletions impacting genomic loci that are intolerant to variation. Repeating this test on CNVs that did not impact a coding base, we did not note any significant case/control difference in constrained bases burdened by CNVs (Supplemental Fig. 14).
Gene-based tests of CNV burden
No gene-based test passed the significance threshold (988 tests, FDR-adjusted P < 0.05). There was no evidence of genomic inflation within deletion test statistics or duplication test statistics (lambda = 1.02 and lambda = 1.01, respectively, Fig. 4A and 4B). Although no individual locus was implicated, the overall CNV burden observed in OCD cases suggests that a larger cohort is likely to provide the power required to identify specific loci. In particular, cases were more likely to have CNVs where only one sample overlaps the affected area, and cases had an elevation of loci impacted by at least two deletions beyond what case/control permutation predicts (Fig. 4C). Summary statistics from these tests have been included (Supplemental Table 6), along with statistics from breakpoint-based tests (Supplemental Table 7).
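The text does not state which FDR procedure was applied. As an illustration only, the sketch below implements the widely used Benjamini-Hochberg adjustment; treating it as the method used here is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values: list) -> list:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A test would be declared significant at the threshold used in the text if its adjusted value fell below 0.05.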
Burden of neurodevelopmental CNVs
In general, we found that OCD cases carried a higher burden of neurodevelopmental CNVs than controls. Burden of neurodevelopmental CNVs as defined in Kendall et al. [26] increased OCD case risk (OR = 2.49, P = 6.04x10-3), as did burden within specific genes implicated in neurodevelopmental disorders by Fu et al. [27] (n = 664, OR = 2.54, P = 1.91x10-5). Although the deletion contribution to this result was higher, there was a discernible contribution from duplications as well (Fig. 4D).
Overlap with exome sequencing studies of OCD
We found non-random overlap between genes impacted by case-only single-gene CNVs in our study and prior OCD exome study statistics from Table S15 of [6]. We took the set of genes that were impacted by at least one single-gene case CNV and no single-gene control CNV (n = 149 genes). These genes had an elevated count of loss-of-function and damaging missense de novo mutations across 771 trios (observed = 9, expected = 3.94, one-sided Poisson P = 0.02) and an elevated count of loss-of-function variants in 476 cases versus 1,761 controls (observed = 26, expected = 17.30, one-sided Poisson P = 0.03). We set up a Transmission and De Novo Association (TADA) analysis using the same methods described previously [6] and the summary statistics from Table S15, with NORDiC case/control count statistics added. No genes beyond the already-described CHD8 passed the threshold of Q < 0.3 for being classified as a probable risk gene (Supplemental Table 8), though the gene that comes closest, ZMYM2 (Q = 0.32), has been implicated in neuropsychiatric phenotypes across multiple publications [27–29].
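The one-sided Poisson P values above can be reproduced from the observed and expected counts alone. The sketch below assumes the test is the upper-tail probability P(X ≥ observed) for a Poisson variable with mean equal to the expected count; the function name is hypothetical.

```python
import math


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    cdf_below = 0.0
    for k in range(observed):
        cdf_below += term            # add P(X = k)
        term *= expected / (k + 1)   # advance to P(X = k + 1)
    return 1.0 - cdf_below


p_de_novo = poisson_upper_tail(9, 3.94)         # de novo mutations in trios
p_case_control = poisson_upper_tail(26, 17.30)  # loss-of-function variants
```

Under this assumption the two calls give approximately 0.02 and 0.03, in line with the values reported in the text.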
OCD polygenic risk in deleterious CNV carriers
We hypothesized that individuals carrying deleterious (pLI > 0.995, neurodevelopmental as in Kendall et al., or neurodevelopmental as in Fu et al.) CNVs were more likely to have lower neuropsychiatric polygenic risk. This would be consistent with higher-powered studies of other neuropsychiatric conditions [30]. To test this, we utilized polygenic risk scores (PRS) computed from three different GWAS summary statistics: standing height (Pan-UKB, https://pan.ukbb.broadinstitute.org) (N = 360,388, as a negative control), OCD [4] (2,688 cases, 7,037 controls), and a cross-disorder study of psychiatric conditions [31] (162,151 cases, 276,846 controls). We tested for an association between deleterious CNV burden and normalized PRS, using the same covariates as those in global CNV burden analyses, and performing separate tests for deletions and duplications.
Of the six tests we performed (Supplemental Table 9), we identified one significant (P < 0.05) association, between deleterious CNV deletions and the cross-disorder psychiatric PRS. In this comparison, deleterious CNV deletion carriers in our case cohort had lower normalized psychiatric PRS than non-carriers (estimate = -0.45, P = 3.35x10-3). While this PRS is not OCD-specific, the summary statistics underlying it do include OCD cases, and given its much larger sample size, it likely captures pleiotropic common risk variants that increase risk for multiple psychiatric conditions at once.
Clinical features of carriers of deleterious CNVs
We performed an analysis of clinical features of case carriers of these deleterious CNVs versus non-carrier cases (see Supplemental Table 10 for carrier status per sample). We focused on the Swedish subset of the case cohort (n = 1,612) where we had access to detailed clinical information on each participant.
We first explored the association between deleterious deletions and duplications and the presence of key psychiatric comorbidities (ASD, ADHD, TS/chronic tic disorder, schizophrenia, and bipolar disorder) through contingency tables and chi-square statistics (or Fisher exact tests, when relevant). We found that 6 (4.1%) of the 147 individuals with comorbid ASD had neurodevelopmental duplications, compared to 6 (0.4%) of 1,465 individuals without comorbid ASD (chi-square = 24.3, df = 1, P < 0.001). The remaining psychiatric disorders were not significantly associated with neurodevelopmental duplications. No significant associations emerged for neurodevelopmental deletions.
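The ASD comparison can be checked directly from the counts given above. The sketch below computes an uncorrected Pearson chi-square statistic for a 2x2 table, with the P value taken from the 1-degree-of-freedom chi-square distribution; the assumption that no continuity correction was applied is ours, and the function name is hypothetical.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square and P value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: comorbid ASD (6 carriers, 141 non-carriers),
#       no comorbid ASD (6 carriers, 1,459 non-carriers).
chi2, p_value = pearson_chi2_2x2(6, 141, 6, 1459)
```

With these counts the statistic is about 24.4 and the P value is well below 0.001, in line with the reported result. The expected count of duplication carriers with comorbid ASD is close to 1, so an exact test may be the more conservative choice for this table.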
We further explored whether the presence of duplications or deletions was associated with treatment outcomes in a sub-cohort of Swedish individuals with complete treatment data (n = 846). We found that individuals with deletions (but not duplications) in specific neurodevelopmental disorder genes improved by 16% on average on the YBOCS, whereas individuals without such deletions improved by 47%, a statistically significant difference (independent samples t-test; t = -3.03, df = 854, 2-sided P = 0.02).