Introduction

Hirschsprung’s disease (HSCR; aganglionic megacolon) is a congenital disorder characterised by the absence of enteric ganglia along a variable length of hindgut. There is significant ethnic variation in the incidence of the disease, and it is most often found among Asians (2.8 per 10,000 live births; Amiel and Lyonnet 2001; Torfs 1998). Non-familial HSCR has a complex pattern of inheritance and manifests with low, sex-dependent penetrance and variability in the length of the aganglionic segment, according to which patients are classified into short segment (S-HSCR; 80%), long segment (L-HSCR; 15%), and total colonic aganglionosis (TCA 5%). The male:female ratio (M:F) is ≈4:1 among S-HSCR patients and ≈1:1 among L-HSCR patients. HSCR presents mostly sporadically although it can be familial (5–20% of cases), where the recurrence risks to sibs vary from 1.5 to 33% depending on the gender and the length of the aganglionic segment in the proband, and the gender of the sibling (Badner et al. 1990).

The RET gene, encoding a tyrosine-kinase receptor, is the major HSCR causing gene (Edery et al. 1994; Romeo et al. 1994) and its expression is crucial for the development of the enteric ganglia. Mutations in the coding sequence (CDS) of RET account for up to 50% of the familial cases and between 15 and 20% of the sporadic cases indicating that additional HSCR-causing mutations exist (Hofstra et al. 2000). Other HSCR genes identified so far mainly code for protein members of interrelated signalling pathways involved in the development of enteric ganglia: RET, endothelin receptor B (EDNRB), and the transcriptional regulator SOX10. Yet, mutations in genes other than RET (GDNF, GFRA1, NRTN, PHOX2B, NKX2.1, SOX10, NRG1, EDNRB, EDN3, ECE-1, KIAA1279, ZFXH1B, NTRK3, and L1CAM) account for only 7% (Amiel et al. 2003; Angrist et al. 1996; Brooks et al. 2005; Fernandez et al. 2009; Hofstra et al. 1996, 1999; Pingault et al. 1998; Puffenberger et al. 1994; Wakamatsu et al. 2001). Failure to identify RET CDS mutations in some of the RET-linked families suggests that noncoding RET mutations, mutations in regulatory regions of RET could also contribute significantly to the disease. Indeed, common RET single nucleotide polymorphisms (SNPs) are strongly associated with HSCR, with the largest contribution to risk made by a functional SNP (rs2435357) lying in an enhancer-like sequence in intron 1 (Burzynski et al. 2004, 2005; Fernandez et al. 2005; Fitze et al. 2002, 2003a, b; Garcia-Barcelo et al. 2003, 2005; Sancandi et al. 2003). We have recently reported the genetic interaction of RET intron 1 SNP (rs2435357) with HSCR-associated NRG1 SNPs, whereby the risk to HSCR is increased by 2.3-fold for the RET risk genotype (Garcia-Barcelo et al. 2009).

Reduced penetrance of RET and other HSCR gene mutations and variable expression of the HSCR phenotype indicate that the disease may result from the combined effect of several genes, whereby the outcome would be altered RET expression (Cantrell et al. 2004; Carrasquillo et al. 2002; Lang et al. 2000; Lang and Epstein 2003; McCallion et al. 2003).

RET modifiers have been mapped to the following chromosomal regions: (i) 3p21; (ii) 19q12 (Gabriel et al. 2002); (iii) 4q31.3-q32.3 (Brooks et al. 2006), and (iv) 9q31. Despite these important findings, the genes in these loci are yet to be identified. The 9q31 locus segregated in families of European descent harbouring no or hypomorphic RET mutations, yet the families still showed linkage to 10q12 (RET locus) (Bolk et al. 2000b).

To identify the HSCR locus in the 9q31 candidate region, we have conducted dense genotyping of the region of a Dutch patient cohort and evaluated the same region in Chinese.

Materials and methods

The overall study was approved by the institutional review board of The University of Hong Kong together with the Hospital Authority (IRB: UW 07-292).

Dutch (Groningen) family-based association study

The Dutch patient cohort consisted of 140 HSCR trios that were genotyped using Illumina GoldenGate platform for 370 SNPs spanning the 7 Mb (from 108.5 to 115.5 Mb) of the 9q31 HSCR-associated region as described by Bolk et al. Most of these 370 SNPs were tag SNPs. Quality control based on identity-by-descent (IBD) revealed 3 duplicated trios. SNPs with missing genotypes (>5%) and with minor allele frequency (MAF) <5%, and/or violating Hardy–Weinberg equilibrium (p < 0.001) were excluded, leaving a total of 301 SNPs for association study. PLINK (Purcell et al. 2007) was then used to conduct the TDT analysis in the resulting 137 Dutch trios. As much as 16 HSCR probands carried RET CDS mutations.

Chinese case–control analysis

Subjects

For analysis of the 9q31 region in the Chinese sample, we revised the genotype data of the candidate 9q31.3 region obtained from a previous genome-wide association study (GWAS) conducted on a total of 181 HSCR Chinese patients and 386 healthy individuals (discovery group) using Affymetrix GeneChip®Human Mapping 500K Array as previously described (Garcia-Barcelo et al. 2009). To that data, we included 92 additional controls recently genotyped (see below).

For the replication of the association found when patients were stratified according to the RET mutation status, we included 21 independent HSCR cases bearing RET CDS mutations and 71 control individuals from Mainland China (replication group).

Characteristics of the patients are depicted in supplementary Tables 1 and 2.

Revision of the Affymetrix 500K data

Following the recent recommendations on quality control standards (Sklar et al. 2008), we imposed a more stringent filtering criterion based on heterozygosity to the GWAS described above. Heterozygosity screening aims at the identification of individuals who have more heterozygous genotype calls than the average of the sample set, since the excess of heterozygosity is likely due to cross-contamination of samples. Hence, outliers should be excluded from the analysis. Prior to evaluating the heterozygosity of the samples, we pruned the SNP dataset to avoid bias that could have been introduced by large quantities of SNPs in strong LD. In brief, we pruned the dataset (by removing 1 of a pair of SNPs with r 2 > 0.25 within a window of 200 SNPs and shifts of 50 SNPs forward at each step) using PLINK. In total, 56,000 SNPs in weak LD with each other remained. The heterozygosity screening excluded outliers with more heterozygous calls than the average (estimated inbreeding coefficient lower than 3 standard deviations from the average).

After this filtering criteria, a total of 173 HSCR cases (31 patients bearing RET CDS mutations) and 436 controls were included in the analysis. Restricting by the region analysed by the Dutch group, 988 SNPs falling within the 9q31 7 Mb candidate region (from 108.5 to 115.5 Mb) were selected. The mean call rate was 99.09%. After filtering by MAF, genotyping rate and Hardy–Weinberg equilibrium, 981 SNPs were left for analysis.

Correction for population structure and association analysis

We used the EIGENSOFT package on the whole genome data pruned for LD as described above to detect population substructure. The Tracy–Widom test (implemented in EIGENSOFT) nominated only the first two axes of variation as significant (p < 0.05). The first axis of variation corresponded to the ancestral differences between Northern and Southern Chinese as seen in supplementary Fig. 1. To correct for these ancestral differences, we applied EIGENSTRAT (Price et al. 2006) (also implemented in EIGENSOFT) on the candidate regions based on the two axes of variation nominated by the Tracy–Widom test. The Cochran-Armitage trend test was then used to assess the levels of association of the SNPs with HSCR.

Replication stage

EIGENSTRAT could not be applied to correct the replication sample for population stratification as no genealogy information can be inferred when only two SNPs are genotyped. Logistic regression was carried out instead to assess the association in the combined sample (discovery + replication samples). We included a covariate for the sample ancestral origin (northern and southern Chinese ancestry) and a covariate to distinguish GWAS and replication samples, thereby controlling for any possible allele frequency differences between known populations and between stages of association analysis. We also tested the interaction terms between each covariate and the genotype, which were not found to be significant (p > 0.5 for both, indicating that the effect is not significantly different between subpopulations or stages). Thus, only the two covariates were included in the final model for combined analysis.

Results

Family-based association test on Dutch trios identifies two associated regions on 9q31

Through TDT analysis of 301 SNPs on 137 Dutch trios, we identified 17 SNPs (supplementary Table 3) (p < 0.05) showing asymmetrical transmission from parents. Of these, 10 SNPs could be grouped into 2 moderately associated peaks of 5 SNPs each, 9q31A (110.7–111.1 Mb) and B (112.1–112.7 Mb).

The 9q31A region encompasses a gene-rich spot which contains 4 genes IKBKAP, C9orf6, CTNNAL1, and C9orf5 within a single LD block that spans about 100 kb. Among the 5 associated SNPs, the strongest evidence for association with HSCR was found for rs12351693 (p = 0.023) and rs10979637 (p = 0.023) located within the intronic region of CTNNAL1 (Table 1; Fig. 1a).

Table 1 9q31A and B SNPs associated with HSCR in Dutch trios samples
Fig. 1
figure 1

Association analysis on 9q31A (110,500–111,000 kb) (left panel) and 9q31B (112,000–112,500 kb) (right panel). a TDT analysis on 137 Dutch HSCR trios and b Case–control study on Chinese using 500 k. Associated SNPs listed in Tables 1 and 2 are represented as blue diamonds. Recombination rate and genes in the region are also shown (green “+” strand; red “−” strand)

CTNNAL1 encodes alpha-catulin, a protein that modulates the Rho signalling pathway by providing a scaffold for the ARHGEF1 (Park et al. 2002). The expression of its mouse homologue, Ctnnal1, was found to be severely reduced in intestines of Ret deficient mice (Retk/k) when compared to normal mice (Ret+/+) and consequently, the gene has been considered a putative RET modifier (Heanue and Pachnis 2006).

However, the strongest associations lied in 9q31B and mapped upstream the TXNDC8 gene (p = 4.8 × 10−3 for rs7038415), and in the intronic region of SVEP1 (p = 2.9 × 10−3 for rs10816998) (Table 1). Other associated SNPs within the 9q31B region, mapped to the MUSK gene. MUSK is required for neuromuscular junction formation in vivo and for the survival and development of discrete neuronal subpopulations (Kiba et al. 2009). The associations described above were confirmed by performing 1 million permutations (Table 1; Fig. 1).

SVEP1 is associated with HSCR in Dutch patients without RET mutations

As it was reported that the 9q31 HSCR susceptibility locus segregated exclusively in RET-linked families depleted of RET CDS mutations (Bolk et al. 2000b), we stratified the cases according to their RET CDS mutation status. We defined RET CDS mutations as DNA changes in the coding region or exon/intron boundaries of RET that lead to frame-shift, non-sense or missense mutations in the RET protein and that have not been found in at least 400 control chromosomes. PolyPhen (Sunyaev et al. 2001) was used to predict the effect of the missense mutations on the RET protein. Truncating mutations were considered highly deleterious.

The 301 SNPs in the 9q31 region were checked again for association. The rs10816998 and rs7038415 SNPs within SVEP1 were strongly associated with HSCR in families without RET mutations. After stratification of the patients, the OR attained a significance level of p = 5.33 × 10−5 with OR = 2.379 (95% CI:1.542, 3.671) for rs10816998 and p = 7.50 × 10−5 with OR = 2.345 (95% CI:1.518, 3.622) for rs7038415. These association values survived the conservative Bonferroni correction for multiple testing (≈300 SNPs). This finding in the Dutch population is in line with the reported linkage of 9q31 with HSCR in patients of European descent bearing no or hypomorphic RET mutations (Bolk et al. 2000a).

The SVEP1 SNPs, rs10816998 and rs7038415, were then genotyped in 107 independent Dutch patients without RET CDS mutations and 183 controls. Unfortunately, the p value of the replication did not reach statistical significance.

Revision of the population-based GWAS on Chinese identifies IKBKAP as a HSCR candidate gene

We used the GWAS data (above described) to investigate the association observed in the Dutch. After EIGENSTRAT correction for stratification (see Sect. Materials and methods and supplementary material), genotypes of 981 SNPs spanning the entire 9q31 region were tested for association on 173 HSCR patients (31 with RET CDS mutations) and 436 controls.

In the Chinese dataset, 6 SNPs within the 9q31A region showed association with HSCR. No association was found in 9q31B. The loci displaying the strongest statistical evidence for association were rs10979596 and rs10979597 (in intron 25 of the IKBKAP gene) which are in complete LD with each other (r 2 = 1) (Fig. 1b; Table 2). Four additional SNPs within IKBKAP also showed marginally significant association (p < 0.03) with HSCR. These include the IKBKAP I816L missense polymorphism (rs2230793, p = 0.024) and 3 intronic SNPs (Table 2). Mutations in IKBKAP (inhibitor of kappa light polypeptide gene enhancer in B cells kinase complex-associated protein, involved in transcription elongation) are associated with a neurodevelopmental disease, the Riley-Day syndrome, also known as familial dysautonomia (FD) (Anderson et al. 2001; Slaugenhaupt et al. 2001). FD is an autosomal recessive disorder that occurs almost exclusively in persons of Ashkenazi Jewish descent. This neuropathy results from depletion of subsets of autonomic and sensory neurons (Slaugenhaupt and Gusella 2002). FD patients are born with fewer neurons in their dorsal root and sympathetic ganglia than their peers, and over time, neurons become more scarce (Mezey et al. 2003). Most importantly, some patients with FD also suffer from gastrointestinal dysfunction shortly after birth and interestingly, the co-occurrence of both FD and HSCR has been reported (Azizi et al. 1984). Interestingly, the region encompassing our 6 IKBKAP associated SNPs includes the region where FD-causing mutations lie. Ikbkap is expressed in the rat developing gut and IKBKAP in the human colon, supporting a biological role for this gene in the development of enteric nervous system (Bar-Shai et al. 2004; Mezey et al. 2003).

Table 2 9q31A SNPs associated with HSCR in Chinese

The association of IKBKAP is stronger in Chinese HSCR patients with RET mutations

The Chinese HSCR patients were also stratified according to the RET CDS mutation status. We then tested again the 981 SNPs in the 9q31 region for association by independently comparing RET CDS mutation carriers (31 Chinese HSCR patients; discovery group) against 436 controls as well as non-carriers (142 HSCR patients) versus the same control set.

As much as 12 SNPs were found associated with HSCR in RET mutation carriers (p < 0.01). Importantly, 6 of these SNPs mapped to the IKBKAP loci in the 9q31A and had been found associated with HSCR before the stratification of the patients according to the mutation status (Fig. 2). This data suggested that the initial association detected was mainly driven by this subset of RET mutation carriers and that the original marginal association had been masked prior stratification because of the small number of carriers. Among the 12 SNPs segregating with RET mutations, 4 IKBKAP SNPs survived the conservative Bonferroni correction for multiple testing (≈1,000 SNPs) (Table 3). The frequency of the risk SNP alleles tended to increase with the severity of the RET mutations, yet the small sample does not allow to draw any conclusion (Table 3).

Fig. 2
figure 2

Association analysis on RET Chinese CDS mutation carriers (left) and Chinese non-carriers (right). Blue diamonds represent associated SNPs before stratification by mutation status. Dark blue diamonds indicate the 4 SNPs that survived the Bonferroni correction multiple for testing (only 3 dark blue diamonds can be observed as there is overlapping between two SNPs)

Table 3 IKBKAP SNPs associated (survived Bonferroni correction for multiple testing) with HSCR patients with RET CDS mutations

To validate the association of the IKBKAP SNPs with HSCR patients carrying RET CDS mutations, we genotyped (by PCR followed by direct sequencing) the 2 perfectly linked SNPs (rs10979596 and rs10979597) in an independent set of 21 HSCR patients (replication group) bearing RET mutations and 71 controls collected from mainland China. These SNPs were chosen for replication because they fall in the same amplicon and on the basis of genetic homogeneity between Northern and Southern Chinese. We successfully detected the association of rs10979596 and rs10979597 in the replication sample, with logistic regression association value of p = 9.45 × 10−3 and OR = 3.38 (95% CI: 1.35, 8.49) (Table 4). Logistic regression was also used for the combined analysis of HSCR samples bearing RET mutations (discovery—31 patients and replication—21 patients). This yielded a highly significant association of p = 2.71 × 10−6 and OR = 3.15 (95% CI: 1.95, 5.09) for the heterozygous genotype under an additive model. Considering the possible population stratification, we employed the same logistic regression approach conditioning on the strata defined by sample origins (Northern and Southern Chinese), obtaining a significant level of association of p = 5.75 × 10−6 with OR = 3.29 (95% CI: 1.97, 5.51) (Table 4). Results of the latter approach were very similar, indicating that samples collected inside and outside Hong Kong (discovery and replication phases) were homogeneous, with little stratification.

Table 4 Summary of the statistics for the IKBKAP rs10979596 and rs10979597 markers in RET mutation carriers

Discussion

This study aims at the identification of a HSCR susceptibility locus in the 9q31 region. As the association of this region was initially reported in families of European descent, we first explored the region in a Dutch cohort. As the evaluation of an association in two populations of different ethnic origins would definitively increase confidence in the finding, we also explored the 9q31 region in the Chinese. We reasoned that this strategy would allow us to determine (i) whether the 9q31 locus exerted an effect in patients of European descent other than those in whom the locus was initially identified, (ii) assess whether the association of the 9q31 locus with HSCR is population specific, and if not (iii) use population differences in LD to fine-map the gene of interest.

Our data show an initial strong association of SVEP1 SNPs with Dutch HSCR patients bearing no RET CDS mutations. This finding ratifies the reported linkage to the 9q31 region in HSCR families with no RET CDS mutations. The case–control replication of the SVEP1 association in the Dutch did not reach statistical significance indicating that the initial association was spurious or that there is a population stratification issue in the samples used for replication. This point would need further exploration in a much larger sample. The role of SVEP1 is not well characterised. Study of its possible implications in the development of the nervous system should follow.

Intriguingly, the analysis of the region in the Chinese population demonstrated the existence of a HSCR-associated locus (IKBKAP) that, although within 9q31, is not correlated at all with the SVEP1 locus identified in the Dutch. The association of IKBKAP SNPs with HSCR in Chinese became much more prominent in those patients with RET CDS mutations which is at odds with the initial findings described by Bolk and our own findings in the Dutch. Importantly, the IKBKAP association with HSCR bearing RET CDS mutations was replicated in the same Chinese population. No IKBKAP association was detected in the Dutch (supplementary Table 4).

We investigated whether the lack of cross-population replication of the associations of SVEP1 and IKBKAP SNPs with HSCR could be due to population LD and allele frequencies differences, in which case, some true associations may not be replicated, regardless of the sample size of the study. Close examination of the SVEP1 and IKBKAP regions showed no major LD differences between CEU and CHB HapMap populations indicating that lack of replication across populations was not due to differences in LD structure (supplementary Figs. 2 and 3). Lack of replication could also be attributed to a false positive result on the Chinese sample although we think this is unlikely since the initial association found in Chinese was replicated in an independent sample.

In view of these results, we would conclude that the 9q31 locus that appears associated with HSCR in families bearing no RET CDS mutations is indeed population specific, as the findings in Chinese point to a different locus with a completely different genetic behaviour than that identified in the Dutch. As 9q31 locus was reported to segregate exclusively with RET-linked families of Caucasian origin whose members had weak or hypomorphic RET CDS mutations, the investigation of SVEP1 rare variants in these types of patients is warranted.

The association of IKBKAP in Chinese implies that RET mutation carriers may have an additional risk. As yet, no genotype–phenotype correlation could be found which, after all, is in line with the genetic complexity inherent to HSCR. Had parental DNA been available, we might have been able to provide an argument for the role of IKBKAP SNPs on the penetrance of inherited RET CDS mutations. SNPs may modulate the penetrance or phenotype of a particular mutation through epistasis as exemplified by the exclusive action of the FGFR2 and MAP3K1 SNPs in BRCA2 mutation carriers (Antoniou et al. 2008).