Introduction

Narcolepsy with cataplexy (Type 1 narcolepsy) affects 1 in 3000 individuals and is caused by the loss of around 70 000 hypothalamic neurons that produce hypocretin (hcrt, also known as orexin). The disease is strongly associated with HLA-DQB1*06:02 and DQA1*01:02,1 the T-cell receptor alpha locus2 and polymorphisms in other immune-related genes.3 The loss of hcrt-producing cells is hypothesized to have an autoimmune basis.4

One of the genetic associations in narcolepsy is rs2305795, a polymorphism located in the purinergic receptor gene P2RY11.5, 6 The disease-associated allele of rs2305795 decreases P2RY11 expression and increases the sensitivity of CD8+ T cells to ATP-induced cell death. Interestingly, we also recently found that missense mutations in exon 21 of the DNA methyltransferase 1 (DNMT1) gene cause a rare hereditary disorder combining narcolepsy with deafness, cerebellar ataxia and, eventually, dementia (autosomal dominant cerebellar ataxia, deafness and narcolepsy, ADCA-DN).7 Because the P2RY11 and DNMT1 genes lie in close proximity (within 18 kb) on chromosome 19, we hypothesized that the two signals could be related at the pathophysiological level. In support of this hypothesis, Kornum et al.5 observed a correlation between P2RY11 and DNMT1 expression in peripheral blood mononuclear cells (PBMCs) of both patients and healthy controls. These findings raised the question of whether the association with P2RY11 in spontaneous narcolepsy could be secondary to linkage disequilibrium (LD) with DNMT1. In the original study, Kornum et al.5 fine-mapped the locus using seven single-nucleotide polymorphisms (SNPs) and reported an association between DNMT1 and narcolepsy in individuals of European ancestry but not in Asians. In Asians, the LD pattern pointed towards an association with P2RY11.5

One of the biggest challenges of association studies is to identify which genetic variant is causal within a given locus. Increased sample size, functional studies and transethnic mapping may all help to address this challenge. In transethnic mapping, differences in LD patterns across populations at the disease-associated locus are exploited to narrow down the associated genetic region. This approach has previously proven very powerful in narcolepsy and other diseases.1, 8, 9 In this study, we reinvestigated SNPs in the P2RY11-DNMT1 region in narcolepsy across three different ethnic backgrounds. Based on genome-wide association study (GWAS) data available in Chinese and Caucasians, we observed that the narcolepsy association signal drops sharply between P2RY11 and DNMT1 in both cohorts. Based on LD patterns, we next selected four SNPs for genotyping in a cohort of African Americans. This genotyping identified a novel SNP in EIF3G, rs3826784, that was the best-associated marker for narcolepsy across all three cohorts. EIF3G is located between P2RY11 and DNMT1. To further explore the association between these genes and narcolepsy, we finally examined gene expression patterns across the entire region.

Materials and methods

Subjects

Narcolepsy with cataplexy cases were selected based on documented hypocretin deficiency or clear-cut cataplexy and HLA-DQB1*06:02.2, 6 The initial European ancestry GWA data set comprised 807 cases and 1074 DQB1*06:02-positive controls: 415 cases and 753 controls were recruited from the United States and Canada, and 392 cases and 321 controls from European centers. The Chinese GWA data set comprised 1078 cases and 1903 controls. Analysis of the GWA data (549 596 SNPs in Europeans and 603 382 SNPs in Chinese) was performed as described.2, 6 The African-American sample contained 1297 individuals (249 cases, 1048 controls). The controls were recruited from the National Institute of Mental Health (NIMH).

For the expression quantitative trait locus (eQTL) analysis, we used a large case–control study of major depression, for which RNA-sequencing data from whole blood together with genotype data were available for 922 individuals of European ancestry.

Imputation of Chinese and European cohorts

Imputation for the PPAN-DNMT1 region (from rs7250025 to rs2043305) was performed separately in the European and Chinese cohorts, using Beagle (v.3.2.1)10 against reference genotypes from the 1000 Genomes integrated data set. Chinese genotypes, originally obtained with the Affymetrix Axiom CHB array, were imputed against the CHB reference population. European ancestry genotypes from the Affymetrix 6.0 array were imputed using four European populations as a reference (CEU, TSI, GBR, IBS). In the study by Kornum et al.,5 SNPs rs2305795 and rs12460842 were directly genotyped in many of the Caucasian samples, and rs2305795 was also re-genotyped in the Chinese cohort;6 all these results were consistent with the imputed genotypes. SNP markers with poor imputation quality in either Chinese or Europeans (r2<0.8) were excluded from all further analysis. The imputation quality score for rs3826784 was consistently above 0.8, so its association on pooled imputed genotypes is considered valid.
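The exclusion rule above (drop any SNP whose imputation r2 is below 0.8 in either cohort) can be expressed as a short filter. This is a minimal sketch assuming per-SNP quality scores have already been extracted into dictionaries; it does not parse Beagle output, and the function name and the scores in the usage example are hypothetical.

```python
# Hypothetical sketch of the imputation-quality filter applied across cohorts.
R2_THRESHOLD = 0.8  # minimum imputation r2 required in every cohort


def passes_imputation_qc(r2_by_cohort, threshold=R2_THRESHOLD):
    """Return SNP IDs whose imputation r2 is >= threshold in every cohort.

    r2_by_cohort maps a cohort name to {snp_id: imputation r2}.
    A SNP absent from any cohort is treated as failing QC.
    """
    cohorts = list(r2_by_cohort.values())
    shared = set(cohorts[0]).intersection(*cohorts[1:])
    return sorted(
        snp for snp in shared
        if all(scores[snp] >= threshold for scores in cohorts)
    )


# Hypothetical scores for illustration only (not values from this study).
scores = {
    "Chinese": {"rsA": 0.95, "rsB": 0.75, "rsC": 0.88},
    "European": {"rsA": 0.91, "rsB": 0.97, "rsC": 0.81},
}
print(passes_imputation_qc(scores))  # ['rsA', 'rsC']
```

Requiring the threshold in both cohorts, rather than on average, mirrors the "in either Chinese or Europeans" wording: one poorly imputed cohort is enough to exclude a SNP.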

The data from this study have been submitted to the ClinVar database (http://www.ncbi.nlm.nih.gov/clinvar/; accession numbers: SCV000196702−SCV000196705).

Fine mapping of the chromosome 19 locus, and replication of published SNPs

We genotyped 1297 African-American individuals (249 cases, 1048 controls) in the region of association. Based on LD (r2) data from the International HapMap Project (Hapmap.org), we observed that the LD pattern between the SNPs identified in the GWAS differed in African Americans.

The selected SNPs were rs1551570: NM_020230.5:c.190−151C>T; rs12460842: NM_002566.4:c.−203A>G; rs2305795: NM_002566.4:c.*638G>A; and rs3826784: NM_003755.3:c.596−260A>G. We used predesigned TaqMan SNP genotyping assays (Applied Biosystems, Carlsbad, CA, USA) to validate, in a cohort of African-American narcolepsy patients and controls, the LD pattern observed in the International HapMap Project. Genotyping was performed at Stanford University. SNPs rs1551570, rs12460842 and rs3826784 are located in non-coding regions of PPAN, P2RY11 and EIF3G, respectively, and SNP rs2305795 is located in the 3′-UTR of P2RY11. Genotype results were analyzed (allelic associations and LD calculations), and standard r2 plots were generated, using Haploview (v.4.2).11
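For reference, the pairwise r2 shown in LD plots is the squared allelic correlation between two loci, r2 = D2/[pA(1−pA)pB(1−pB)], where D = pAB − pApB. The sketch below computes it from phased two-locus haplotypes. Taking phased haplotypes as input is an assumption for illustration: Haploview estimates haplotype frequencies from unphased genotypes (by expectation-maximization). The function name is hypothetical.

```python
def ld_r2(haplotypes):
    """Pairwise LD r2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (allele at SNP 1, allele at SNP 2) coded 0/1.
    Uses D = pAB - pA*pB and r2 = D^2 / (pA(1-pA) pB(1-pB)).
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Alleles at the two loci always travel together: complete LD.
print(ld_r2([(0, 0), (1, 1)] * 50))  # 1.0
# All four haplotypes equally frequent: no LD.
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)] * 25))  # 0.0
```

Lower r2 between two SNPs in one population than in another is what makes a marker pair informative for transethnic mapping.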

RNA sequencing and genotyping of 922 individuals of European ancestry

The complete procedure for whole blood processing, RNA sequencing and genotyping is described in Battle et al.12 Briefly, whole blood was collected in PAXgene tubes for RNA and in acid-citrate-dextrose tubes for DNA. Tubes were stored at −80 °C. Total RNA was extracted using the GLOBINclear Kit (Invitrogen, Carlsbad, CA, USA). RNA-sequencing libraries were prepared using the Illumina TruSeq protocol. DNA was genotyped on the Illumina HumanOmni1-Quad BeadChip (Illumina, San Diego, CA, USA).

Statistical analysis

Genome-wide association analyses of the European and Chinese ancestry cohorts have been described previously.2, 6 Statistical analyses of genotyped and imputed data were performed using the PLINK software package (v.1.07) (http://pngu.mgh.harvard.edu/purcell/plink/)13 and Haploview (v.4.2).11

Associations within ethnic groups were tested with basic allelic χ2 tests. For associations combining multiple ethnic groups, the Mantel–Haenszel test was used together with the Breslow–Day test of homogeneity of the odds ratio (both implemented in PLINK).
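A basic allelic test compares risk- and non-risk-allele counts between cases and controls in a 2×2 table. The sketch below is a minimal illustration assuming the uncorrected 1-df Pearson χ2 statistic and the allelic odds ratio; it is not PLINK's code, whose details may differ, and the genotype counts in the usage example are hypothetical.

```python
import math


def allele_counts(hom_risk, het, hom_other):
    """Convert genotype counts to (risk, other) allele counts."""
    return 2 * hom_risk + het, 2 * hom_other + het


def allelic_test(case_alleles, control_alleles):
    """Uncorrected 1-df allelic chi-square test and allelic odds ratio.

    case_alleles and control_alleles are (risk, other) allele counts.
    """
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical genotype counts (risk/risk, risk/other, other/other).
cases = allele_counts(5, 20, 25)      # (30, 70)
controls = allele_counts(2, 16, 32)   # (20, 80)
chi2, p, odds = allelic_test(cases, controls)
print(f"chi2={chi2:.3f} P={p:.3f} OR={odds:.2f}")
```

The P-value uses the identity that a 1-df χ2 variable is a squared standard normal, so its upper tail equals erfc(√(χ2/2)).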

The details of the cis-eQTL analysis are described in Battle et al.12 Briefly, expression data were first normalized to remove the effects of technical factors, population structure and ‘hidden’ factors (which have been shown to reduce the statistical power to identify cis-eQTLs14). Such normalization can substantially increase power to detect cis-eQTLs; however, it also removes broader expression patterns that may reflect trans-regulation (a trans-acting factor may affect the expression of multiple target genes) or coregulation. Previous work has therefore shown that separate normalizations of gene expression data may be appropriate for detecting cis versus trans associations.12, 14 Accordingly, we quantified coexpression between genes using the normalization optimized for detecting trans-eQTLs. For each SNP, a P-value for association with the expression level of each nearby gene (within 1 Mb of the transcription start site) was computed from Spearman's correlation coefficient. Associations were considered significant at a false discovery rate (FDR) of 0.05.
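The per-pair cis-eQTL test described above can be sketched as Spearman's ρ between genotype dosage and expression, with average ranks for ties (frequent for 0/1/2 dosages), followed by FDR control across all SNP–gene tests. Two assumptions are made here: the text does not name the FDR procedure, so Benjamini–Hochberg is used, and the P-value uses the large-sample approximation z = ρ√(n−1) rather than the exact method of Battle et al. Function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_p(rho, n):
    """Two-sided P from the large-sample approximation z = rho * sqrt(n - 1)."""
    z = abs(rho) * math.sqrt(n - 1)
    return math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical genotype dosages and expression values for 8 individuals.
dosage = [0, 0, 1, 1, 1, 2, 2, 2]
expression = [5.1, 4.8, 5.6, 5.9, 5.4, 6.3, 6.0, 6.8]
rho = spearman(dosage, expression)
print(round(rho, 3), spearman_p(rho, len(dosage)))
# Only adjusted values <= 0.05 would be reported at FDR 0.05.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```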

PPAN: NM_020230.4; P2RY11: NM_002566.4; EIF3G: NM_003755.3; DNMT1: NM_001130823.1.


SNPs reported in Table 1: rs1551570: NM_020230.5:c.190−151C>T; rs12460842: NM_002566.4:c.−203A>G; rs2010353: NM_002566.4:c.19+142G>T; rs3826784: NM_003755.3:c.596−260T>C; rs2305795: NM_002566.4:c.*638G>A; rs12462506: NM_002566.4:c.20−178T>G; rs3826785: NM_003755.3:c.595+327G>A; rs73011220: NM_002566.4:c.19+880G>A; rs112647895: NM_002566.4:c.19+446G>A; rs3745601: NM_002566.4:c.259G>A; rs2305789: NM_020230.5:c.291+50A>G; rs6511570: NM_001130823.1:c.2895−855C>T; rs2288349: NM_001130823.1:c.2721−45C>T; rs35693490: NM_001130823.1:c.2721−959G>A; rs8101626: NM_001130823.1:c.4773+383C>T; rs4804122: NC_000019.10:g.10131268C>T; rs2114724: NM_001130823.1:c.1832+14A>G; rs2290684: NM_001130823.1:c.3394+34T>C; rs8112801: NM_001130823.1:c.3117−203A>C; rs11880553: NM_001130823.1:c.3117−660G>A; rs11880388: NM_001130823.1:c.3117−677C>T; rs11085587: NC_000019.10:g.10124423G>C; rs35374357: NC_000019.10:g.10129267T>C; rs7710: NM_003755.3:c.846T>C; rs870612: NC_000019.10:g.10120814A>C; rs4555265: NC_000019.10:g.10121508G>A; rs11667630: NM_003755.3:c.300+47G>T; rs2290687: NM_003755.3:c.240+23G>A; rs55752217: NM_020230.5:c.292−46C>T; rs2305792: NM_020230.5:c.902−10C>G; rs1037686: NM_002566.4:c.19+419T>A; rs3745600: NM_002566.4:c.237C>T; rs34484805: NM_002566.4:c.20−882A>G; rs7401: NM_002566.4:c.*363A>G; rs11666402: NM_020230.5:c.513+266T>C; rs10414661: NC_000019.10:g.10131561T>A; rs12462004: NC_000019.10:g.10127588T>G; rs10404209: NC_000019.10:g.10126751T>C; rs9305012: NM_001130823.1:c.2266−225A>G; rs10407514: NM_001130823.1:c.2895−449G>C; rs10418707: NM_001130823.1:c.2894+810C>T; rs12611113: NM_001130823.1:c.2721−212A>G; rs60565702: NM_001130823.1:c.2720+999C>T; rs72620548: NM_001130823.1:c.4490−306C>T; rs10854076: NM_001130823.1:c.4293+238C>G; rs4804489: NM_001130823.1:c.2265+418T>C; rs57366074: NM_001130823.1:c.2586+99G>T; rs10414537: NC_000019.10:g.10126048C>T.

Table 1 SNP markers located in the PPAN-P2RY11-EIF3G-DNMT1 locus and their association with Type 1 narcolepsy

Results

Association signal within the risk locus

Using two GWA data sets from narcolepsy patients and controls of European and Chinese ancestry, we investigated polymorphisms in the P2RY11-DNMT1 region. The cohorts included 1881 individuals of European ancestry (807 cases and 1074 controls) and 2981 Chinese (1078 cases and 1903 controls). SNPs in the region of interest were imputed in each GWA data set and ranked by statistical significance in the Chinese cohort (Table 1). The most significant SNPs in both GWA data sets are in high LD with rs2305795 and are located in the PPAN-P2RY11-EIF3G haplotype block, not in DNMT1 (Table 1). r2 plots clearly illustrate that SNPs in high LD with rs2305795 are restricted to the PPAN-P2RY11-EIF3G region and do not extend to DNMT1 (Figure 1), as was also observed by Han et al.6 Based on this analysis, five SNP markers (rs1551570 (PPAN), rs12460842 (P2RY11), rs2010353 (P2RY11), rs2305795 (P2RY11) and rs3826784 (EIF3G)) were significantly associated with narcolepsy in both cohorts and were selected for further analysis. Owing to a failure of the genotyping assay, however, rs2010353 was excluded, leaving four SNPs for all subsequent analyses.

Figure 1

Risk locus on Chr. 19q13.2. Gene organization and LD in the region of interest (10 209 865–10 276 167), compared between European (CEU) and Chinese (CHB) ancestry. Top: r2-based LD plot using data from the GWA data set of European ancestry (CEU). Bottom: r2-based LD plot using data from the GWA data set of Chinese ancestry (CHB). Darker colors indicate higher levels of LD. Dashed and dotted lines indicate the positions of rs2305795 and rs3826784, respectively.

Based on data from the International HapMap Project, we next explored LD patterns for the four markers across ethnic groups. Interestingly, two SNPs, rs12460842 (P2RY11) and rs3826784 (EIF3G), had lower LD with rs1551570 (PPAN) and rs2305795 (P2RY11) in African Americans (ASW; Figure 2), making them candidates for transethnic mapping. Genotyping of these four markers in 1297 African Americans (249 cases and 1048 controls) showed that, surprisingly, rs3826784, and not the previously reported rs2305795, had the strongest narcolepsy association based on odds ratio (OR) values, although none of the four SNPs reached statistical significance in this data set. In a meta-analysis across all ethnic groups, we replicated the finding by Kornum et al.5 of an increased OR with the major allele rs2305795A (Table 2 shows data for the minor allele). The meta-analysis further revealed that rs3826784 had the strongest association with narcolepsy across all ethnic groups (Table 2), with a slightly higher OR for the rs3826784G risk allele and a lower P-value. The ORs of the four SNPs did not differ significantly.
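The cross-ethnic meta-analysis combines one 2×2 allele-count table per ethnic group. The minimal sketch below computes the Mantel–Haenszel pooled OR, the Cochran–Mantel–Haenszel χ2 (1 df, no continuity correction) and the Breslow–Day statistic for homogeneity of the OR (K−1 df for K strata). It is written from the standard textbook formulas and is not PLINK's code, whose details (for example, continuity corrections) may differ; the tables in the usage example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:  # closed form for even df
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))  # odd df: start from the 1-df tail
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def _fitted_a(n1, n0, m1, psi):
    """Expected case risk-allele count in a stratum whose OR equals psi."""
    if abs(psi - 1) < 1e-12:
        return n1 * m1 / (n1 + n0)
    # Solve a(n0 - m1 + a) = psi (n1 - a)(m1 - a), a quadratic in a.
    qa, qb, qc = 1 - psi, n0 - m1 + psi * (n1 + m1), -psi * n1 * m1
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    lo, hi = max(0, m1 - n0), min(n1, m1)
    for root in ((-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)):
        if lo - 1e-9 <= root <= hi + 1e-9:
            return root
    raise ValueError("no admissible fitted count")


def mantel_haenszel(strata):
    """Pooled OR, CMH chi-square and Breslow-Day test over 2x2 strata.

    Each stratum is ((case risk, case other), (control risk, control other)).
    """
    num = den = diff = var_sum = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        n1, n0, m1, m0 = a + b, c + d, a + c, b + d
        num += a * d / n
        den += b * c / n
        diff += a - n1 * m1 / n  # observed minus expected case risk alleles
        var_sum += n1 * n0 * m1 * m0 / (n * n * (n - 1))
    or_mh = num / den
    cmh = diff ** 2 / var_sum
    bd = 0.0
    for (a, b), (c, d) in strata:
        n1, n0, m1 = a + b, c + d, a + c
        fa = _fitted_a(n1, n0, m1, or_mh)
        var = 1 / (1 / fa + 1 / (n1 - fa) + 1 / (m1 - fa) + 1 / (n0 - m1 + fa))
        bd += (a - fa) ** 2 / var
    df = max(len(strata) - 1, 1)
    return {"or_mh": or_mh, "cmh": cmh, "p_cmh": chi2_sf(cmh, 1),
            "breslow_day": bd, "p_bd": chi2_sf(bd, df)}


# Hypothetical allele-count tables for three ancestry groups.
tables = [((30, 70), (20, 80)), ((45, 55), (35, 65)), ((12, 38), (9, 41))]
print(mantel_haenszel(tables))
```

A non-significant Breslow–Day P supports pooling the strata into one OR; a significant one would indicate that the effect size differs between ancestry groups.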

Figure 2

r2-based LD plots for individuals of CEU, CHB and ASW ancestry: Utah residents with Northern and Western European ancestry from the CEPH collection (CEU), Han Chinese in Beijing, China (CHB) and African ancestry in Southwest USA (ASW). Curly braces indicate LD values between the candidate SNPs rs1551570, rs12460842, rs2305795 and rs3826784. Darker colors indicate higher levels of LD. Chromosome 19q13.2 (10 209 865–10 230 599). See Figure 1 for the entire locus.

Table 2 Association of SNP markers with Type 1 narcolepsy

Expression quantitative trait loci analysis

To determine whether rs3826784 regulates gene expression, we next performed an eQTL analysis of the genes in the risk locus. For this analysis, we used a large case–control study of major depression, for which RNA-sequencing data from whole blood together with genotype data were available for 922 individuals of European ancestry.12 After appropriate normalization, case–control status did not influence the eQTL analysis;12 we therefore performed all analyses in the combined sample.

Using this data set, we found that EIF3G expression increased and DNMT1 expression decreased with the risk allele (rs3826784G), whereas the expression of PPAN and P2RY11 did not change (Figure 3). Furthermore, the expression of EIF3G and DNMT1, but not of PPAN and P2RY11, correlated with rs3826784, the strongest association being with EIF3G (Table 3). The remaining SNPs in our analysis showed equally strong associations with EIF3G, consistent with the high LD between these four SNPs in Caucasians (Figure 2). DNMT1 expression was most strongly correlated with rs2290684, an eQTL SNP for DNMT1 (highlighted in Table 1). The effect of rs3826784 on DNMT1 expression is therefore likely explained by LD between rs3826784 and rs2290684 (r2=0.35). In line with this, conditional analysis showed that rs3826784 does not independently affect DNMT1 expression when the effect of rs2290684 is controlled for (P>0.1).
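One way to express such a conditional analysis is a partial correlation: regress both DNMT1 expression and rs3826784 dosage on rs2290684 dosage, then correlate the residuals. The text does not specify the exact model, so this is a swapped-in illustration assuming a linear model on normalized expression; a residual correlation near zero would indicate no effect of rs3826784 beyond that of rs2290684. Function names and data are hypothetical.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, z):
    """Residuals of y after least-squares regression on z (with intercept)."""
    mz, my = sum(z) / len(z), sum(y) / len(y)
    szz = sum((v - mz) ** 2 for v in z)
    beta = sum((v - mz) * (w - my) for v, w in zip(z, y)) / szz
    return [w - my - beta * (v - mz) for v, w in zip(z, y)]


def conditional_corr(expression, test_snp, conditioning_snp):
    """Partial correlation of expression with test_snp given conditioning_snp."""
    return pearson(residuals(expression, conditioning_snp),
                   residuals(test_snp, conditioning_snp))


# Hypothetical dosages and normalized expression for 9 individuals.
cond = [0, 0, 0, 1, 1, 1, 2, 2, 2]   # conditioning SNP (stand-in for rs2290684)
test = [0, 1, 0, 1, 1, 2, 2, 1, 2]   # tested SNP (stand-in for rs3826784)
expr = [1.0, 1.2, 0.9, 2.1, 1.8, 2.0, 3.1, 2.9, 3.0]
print(pearson(expr, test), conditional_corr(expr, test, cond))
```

The marginal correlation can be sizeable simply because the tested SNP is in LD with the conditioning SNP; the partial correlation removes that shared component.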

Figure 3

Gene expression in whole blood by rs3826784 genotype. (a and b) The expression of PPAN and P2RY11 did not change with the risk allele (rs3826784G). (c) EIF3G expression increased with the risk allele (rs3826784GG). (d) DNMT1 expression decreased with the risk allele (rs3826784GG). Data are reported as normalized effect size, which equals Spearman's correlation coefficient (ρ), used as the measure of change in expression level on the y axis.

Table 3 Association between SNPs and eQTLs in the Type 1 narcolepsy risk locus (negative log10 P-values for association between genotype and expression levels)

We also re-examined the effect of rs2305795 on P2RY11. In our data, rs2305795 was not an eQTL for P2RY11 (see Supplementary Table S1). Because rs2305795 lies in the 3′-UTR of P2RY11, it is present in P2RY11 transcripts, which allowed us to test for allele-specific expression. Testing for association between rs2305795 and the allelic expression ratio of P2RY11, we did not observe a significant association in the human whole blood data set, in contrast to earlier reports in PBMCs.5
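Allele-specific expression can be probed in rs2305795 heterozygotes by counting RNA-seq reads carrying each allele and asking whether the allelic ratio departs from 1:1. The sketch below pools counts across heterozygotes and applies an exact two-sided binomial test. This is an assumption for illustration, not necessarily the test used here; pooling ignores between-individual variation, which is why beta-binomial models are often preferred. Read counts and function names are hypothetical.

```python
import math


def binom_two_sided_p(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    lower = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n      # P(X <= k)
    upper = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n  # P(X >= k)
    # For the symmetric p = 0.5 null, doubling the smaller tail is standard.
    return min(1.0, 2 * min(lower, upper))


def allelic_imbalance(ref_reads, alt_reads):
    """Pool per-heterozygote allele read counts; return (ref fraction, P)."""
    ref, alt = sum(ref_reads), sum(alt_reads)
    return ref / (ref + alt), binom_two_sided_p(ref, ref + alt)


# Hypothetical read counts at rs2305795 in four heterozygous individuals.
ref_reads = [12, 9, 15, 11]
alt_reads = [10, 11, 13, 12]
print(allelic_imbalance(ref_reads, alt_reads))
```

`math.comb` requires Python 3.8 or later.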

Gene expression correlation

We finally investigated the possibility of coregulation and/or regulatory relationships between the four genes in our locus by analyzing correlations between the expression of PPAN, P2RY11, EIF3G and DNMT1. PPAN and P2RY11 were expressed at very low levels overall in whole blood cells, consistent with their expression in the brain (Supplementary Tables S2 and S3). Expression of EIF3G correlated with that of PPAN in data normalized for detecting trans-eQTLs, but not in data normalized for detecting cis-eQTLs (for more information see Materials and methods). Finally, EIF3G expression was not correlated with DNMT1 expression (Table 4). We also did not observe a significant correlation between P2RY11 and DNMT1 in data normalized for detecting either cis- or trans-eQTLs, in contrast to earlier reports in PBMCs.5

Table 4 Spearman's correlation coefficients between the expression levels of each gene pair

Discussion

In this study, we found that rs3826784, a SNP located within an intron of the EIF3G gene, is the most significantly associated SNP in the region. Transethnic mapping further suggested that rs3826784 could be a better marker for the genetic association with narcolepsy than the previously identified rs2305795, although the magnitude of the association is very similar to that of rs2305795.5 The new SNP, rs3826784, is in high LD with SNPs located in PPAN and P2RY11, but not in DNMT1. Genetic LD between P2RY11 and DNMT1 is weak in both narcolepsy and control cohorts, confirming the observation by Han et al.6 Importantly, however, elements located within the P2RY11 gene may still regulate DNMT1 expression. We also found that EIF3G expression increased and DNMT1 expression decreased with the risk allele, whereas rs3826784G had no effect on PPAN or P2RY11 mRNA levels in whole blood. These results suggest that the narcolepsy-associated SNP in this locus affects EIF3G rather than P2RY11 expression.

To gain insight into gene regulation across the entire risk locus, we investigated gene expression correlations. In whole blood, expression of EIF3G strongly correlated with the expression of PPAN and P2RY11, and less with DNMT1. These results were consistent with the strong LD among SNPs located in the EIF3G, P2RY11 and PPAN region, which does not extend to DNMT1. Surprisingly, however, expression of P2RY11 and DNMT1 did not correlate, in contrast to earlier findings in PBMCs.5 PPAN and P2RY11 were expressed at very low levels in whole blood cells. The discrepancy in P2RY11 expression is most likely caused by the different input material in the two studies. Kornum et al.5 showed that P2RY11 is expressed predominantly in CD8+ cells, which account for ~20% of PBMCs. Whole blood, however, also contains granulocytes, which account for 65% of all cells and may thus mask P2RY11 expression in CD8+ cells. In agreement with our results, EIF3G has previously been reported to be highly expressed in blood cells.15 These findings illustrate the complexity of interpreting eQTL studies in whole blood, given that regulatory regions may affect gene expression differently in various white blood cell subsets. Indeed, cell-specific regulatory mechanisms are likely present in the PPAN-P2RY11-EIF3G region. Future studies will need to address the regulation of the PPAN-P2RY11-EIF3G narcolepsy risk locus within distinct subpopulations of white blood cells. That cell-specific effects are present is not surprising, considering the importance of apoptosis regulation in various immune cell types in response to changes in the environmental milieu.

How could EIF3G be involved in narcolepsy? Eukaryotic initiation factors (EIFs) have an important role in the initiation phase of eukaryotic translation: EIF3 binds the 40S ribosomal subunit and promotes recruitment of the initiator Met-tRNAi, forming the 43S preinitiation complex. This complex is recruited to the 5′ cap structure of the mRNA, promotes ribosomal scanning of the mRNA and regulates recognition of the AUG initiation codon.16, 17 In mammals, EIF3 is the largest scaffolding initiation factor (750 kDa), comprising 13 subunits (A–M). These subunits cluster in subcomplexes, in which EIF3G connects to EIF3B and EIF3I. EIF3B and EIF3G contain RNA recognition motifs that are involved in RNA binding. Interestingly, EIF3B:G:I also interacts directly with certain viral mRNAs to promote the translation of viral proteins.18 EIF4G, which interacts with EIF3,19 contains a binding domain for the non-structural influenza protein NS1,20 and is recruited to the viral mRNA to initiate translation of influenza virus. This is particularly interesting because recent findings have implicated the 2009 H1N1 pandemic virus as a trigger for narcolepsy development.6 It has been suggested that disruption of EIF pathways in childhood H1N1/09 influenza increases disease severity,21 so polymorphisms in this locus could indirectly alter immune responses to influenza and, thereby, narcolepsy risk.

Another possibility could involve regulation of cell death, a function that could be shared with P2RY11 and DNMT1; coordinated regulation of these loci could then explain the conservation of synteny. The N terminus of EIF3G interacts with the C-terminal region of apoptosis-inducing factor (AIF), a ubiquitous FAD-binding flavoprotein with an important role in caspase-independent apoptosis. Through its interaction with EIF3G, mature AIF inhibits protein synthesis. Moreover, overexpression of mature AIF specifically activates caspase-7, which cleaves EIF3G and thereby amplifies the inhibition of protein synthesis.22, 23 EIF3G is also expressed in the brain, including the lateral hypothalamus (Allen Brain Atlas: www.brain-map.org).15 Higher EIF3G expression with the narcolepsy risk allele could dysregulate cell death, which could have an important role in white blood cells in the context of autoimmunity, or in the brain in the context of hypocretin cell loss.

In conclusion, using transethnic mapping, we found a novel genetic association of Type 1 narcolepsy with rs3826784, a SNP located in EIF3G. The new SNP is in high LD with the previously reported P2RY11 SNP rs2305795 in individuals of European and Chinese ancestry, but displays lower LD in African Americans. As the association signal from rs3826784 is nominally stronger than that of rs2305795 in all three cohorts, rs3826784 may be a better marker for the genetic association with narcolepsy, although this conclusion is tentative given the sample size of the African-American cohort. Further confirmation is needed through extended GWA studies that combine multiple ethnic groups, and through local resequencing. A complex correlation between the expression of three genes in the region suggests that a shared regulatory mechanism exists and might be affected by the polymorphism.