Introduction

As one type of the glutamate receptors in the glutamatergic excitatory neurotransmitter system, N-methyl-D-aspartate (NMDA) receptors play crucial roles in excitatory synaptic transmission, plasticity and excitoxicity. Functional NMDA receptors are mainly composed of an NMDA receptor NR1 subunit (GRIN1) and one of the four NMDA receptor NR2 subunits (GRIN2A, GRIN2B, GRIN2C and GRIN2D).1, 2, 3 For several years, there has been considerable interest in the possible role of the NMDA receptors in schizophrenia following the discovery that several noncompetitive antagonists of the NMDA receptors, such as phencyclidine (PCP), ketamine and MK-801, were found to cause psychotic conditions mimicking schizophrenia in humans and laboratory animal models.4, 5, 6, 7, 8 The involvement of the NMDA receptors is also supported by genetic animal models: for instance, knockdown mice expressing only 5% of normal levels of the essential GRIN1 subunit survive to adulthood but display behavioral abnormalities, such as elevated motor activity and stereotypy and deficiencies in social and sexual interactions, which can be ameliorated by treatment with haloperidol or clozapine.9, 10, 11, 12, 13, 14, 15

Mutation screening and association analysis for the NMDA receptors have been performed by several groups focusing mainly on the GRIN1 and GRIN2B genes.16, 17, 18, 19, 20, 21, 22, 23 The association of the variants in GRIN1 and GRIN2B with schizophrenia has been established in several reports,18, 19 but not replicated in others.16, 22 Until now, GRIN1 and GRIN2B have always been studied individually. However, the functional NMDA receptors are heterodimer complexes that involve gene–gene interactions arising from interactions between receptor subunits and probably have a synergistic effect on the pathogenesis of schizophrenia. There has been accumulating evidence supporting the hypothesis that gene to gene interactions play an important role in susceptibility to common human diseases.24 In the light of this evidence, the present study has focused on the GRIN1 and GRIN2B genes as a pilot study of the receptor.

DNA microarray technology has been advocated as one of the most powerful approaches for highly parallel, large-scale SNP analysis. Several approaches for SNP genotyping with DNA microarray have been designed.25, 26, 27, 28, 29 Recently, the universal DNA microarray method mediated by ligase detection reaction (LDR) has been demonstrated to be a highly specific and sensitive assay for SNP detection.29, 30, 31, 32 We combined this approach with the slide self-activating method28, 33 to establish a high-discrimination and cost-effective technology with significant promise for future high-throughput SNP genotyping.

In the present study, we first examined the SNPs identified in previous studies across the GRIN1 and GRIN2B genes in the Chinese population, and then performed SNP genotyping using the universal microarray method to evaluate the single locus and haplotype associations. The gene–gene interactions were examined with the multifactor-dimensionality reduction method (MDR), which can identify evidence for high-order gene to gene interactions in the absence of any statistically significant independent main effects with relatively small samples.34, 35, 36

Materials and methods

Subjects

All subjects were Han Chinese in origin. The 253 patients with schizophrenia (mean age of onset 25.45 years, mean age at present 38.43 years, SD=10.9) were diagnosed according to DSM-IV by two independent clinicians. The 140 unrelated healthy Han Chinese (mean age 43.37 years, SD=12.05) served as controls. The participants recruited in this study all provided written informed consent.

DNA amplification and sequencing for SNP examination

We examined 11 SNPs (C112T, C113T, G716A, A750G, G1001C, A290G, G301A, G308A, A1970G, G2108A and G6435A) in the GRIN1 gene and five SNPs (G366C, C2664T, C3538T, T4197C and T5988C) in the GRIN2B gene, which had been identified in previous studies.16, 18, 20 The primer sets were designed to amplify the SNP-containing regions. The NCBI number of the SNP sites, corresponding primer sequences and the sizes of the amplicons are presented in Supplementary material 1. The PCRs were carried out on the PTC-100 DNA engine (MJ Research, USA) in a total volume of 15 μl containing 10 ng of genomic DNA, 10 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5–3.0 mM MgCl2, 200 μ M dNTP, 1 μ M of each primer and 0.25 U Taq DNA polymerase. The cycling protocol consisted of denaturation at 95°C for 1 min, followed by 30–35 cycles at 95°C for 30 s, 50–65°C for 1 min, 72°C for 1 min, and a final extension at 72°C for 10 min. Subsequently, the PCR products were sequenced for SNP examination using ABI PRISM 3100 DNA Sequencer (Applied Biosystems, USA). For each SNP, 40 subjects were examined in both sense and antisense orientations.

Genotyping with microarray procedure

In the universal microarray approach, five probes including two discriminating probes, one common probe and two corresponding zip-codes, which are coupled to the slides at known locations, were designed for each SNP as described in a previous study.29 In addition, the positive and negative marker probes were designed for result evaluation. The designed probes are summarized in Supplementary material 1. The discriminating probe contains a zip-code complement sequence on the 5′-end and the discriminating base on the 3′-end. The common probe is phosphorylated on the 5′-end and contains a fluorescent label (Cy3) on the 3′-end. The two probes are hybridized adjacent to one another on the template DNA and the nick between the two probes can be sealed by the ligase only if there is perfect complementarity at the junction. The ligated products are hybridized to the zip-codes array. Only the spots corresponding to the SNP type are seen to give stronger fluorescent signals (Figure 1).

Figure 1
figure 1

Strategy for the universal microarray mediated by ligase detection reaction. The discriminating probe containing zip-code complement sequence (czipa or czipb) and the common probe containing a fluorescent label can be ligated only if there is perfect complementarity at the junction. When hybridized to the universal array containing the zip-codes (zipa and zipb), the correspondent spots can be seen to give a stronger signal.

The standard microscope glass slides were first precleaned with 1 M NaOH overnight and washed several times with distilled water. After a neutralization step in 1% HCl and an extensive washing step in distilled water, the pretreated glass slides were immersed in 2% 3-aminopropyl-trimethoxysilane (Sigma, USA) for 15 min followed by five washing steps in acetone. Subsequently, slides were baked at 110°C for 45 min, and then the slide surface was activated with 1,4-phenylene diisothiocyanate (Sigma, USA) according to Guo et al33 and Beier and Hoheisel.37 The 5′-terminal amino-modified zip-codes were dissolved in a 400 mM sodium carbonate buffer (pH 9.0) to a final concentration of 20 μ M and spotted using a GMS 417 Arrayer (Genetic MicroSystems, USA) under a designed program. The slides were deactivated in 10% ammonium hydroxide solution. Finally, they were rinsed five times in distilled water and centrifuged for 1 min at 1000 r.p.m.

The target DNA sequences were amplified using a multiplex PCR method. After the completion of the amplification, 1 μl of Proteinase K (20 mg/ml) was added, then heated at 70°C for 10 min and quenched at 94°C for 15 min. The ligation reaction for each subject was carried out in a final volume of 20 μl containing 20 mM Tris-HCl (pH 7.6), 25 mM potassium acetate, 10 mM magnesium acetate, 10 mM DTT, 1 mM NAD, 0.1% Triton X-100, 10 μl of Multi-PCR product, 1 pmol of each discriminating oligo, 1 pmol of each common probe and 0.5 μl of 40 U/μl Taq DNA ligase (New England Biolabs, USA). The LDR was performed with 40 cycles of 94°C for 30 s and 63°C for 4 min.

Following the LDR reaction, the solution was diluted to obtain 65 μl of hybridization mixture containing 5 × SSC and 0.1 mg/ml salmon sperm DNA. The mixture was then quenched at 94°C for 2 min and chilled on ice immediately. The area of the slide containing the oligo array was sealed with Gene Frame slide chambers (ABgene, UK) and the mixture was then introduced. Hybridization was carried out at 65°C for 3 h. After removal of the chamber, the slide was washed for 2 min in 1 × SSC, 0.1% SDS and then for 1 min in 0.05 × SSC by gently shaking at room temperature. Finally, the slide was spun at 1000 r.p.m. for 4 min and imaged on a GMS418 scanner (Genetic MicroSystems, USA). 10 μm resolution of 16-bit TIFF images was analyzed using the Genepix 2.0 software (Axon Instruments, USA). The average signals after subtraction of local background were used for calculating the signal ratios of spot pairs corresponding to different alleles.

Statistical analyses

Haplotypes were generated using the program Arlequin (http://anthropologie.unige.ch/arlequin). Haplotype frequencies, linkage disequilibrium and Hardy–Weinberg equilibrium were performed by Arlequin. Linkage disequilibrium was measured between each of two locus and D′ was the normalized linkage disequilibrim statistics.38 Differences in allelic and separate haplotypic distributions were estimated by χ2 test using EpiInfo 6.0 and the phasing of case and control haplotype was performed separately. The limit of significance was set to 0.05. The odds ratios (OR) and their 95% confidence intervals were estimated by EpiInfo 6.0 for the effects of alleles. Difference in overall haplotype distribution was assessed by χ2 test using CLUMP1.639 with 300 000 simulations.

The gene–gene interactions were analyzed by MDR software. There are six general steps involved in implementing the MDR method for case–control study. The first step of MDR involves partitioning the data into some number of equal parts for cross-validation. With 10-fold cross-validation in our study, the data are randomly divided into 10 equal parts, and the MDR model is developed on 9/10 of the data (training set) and then tested on 1/10 of the remaining data (testing set). In step 2, a set of n genetic factors is selected from the pool of all factors. In step 3, the n factors and their multifactor classes or cells are represented in n-dimensional space. For example, for two polymorphisms, each with three genotypes, there are nine two-locus genotype combinations. Then, the ratio of the number of cases to the number of controls is evaluated within each multifactor cell. In step 4, each multifactor cell in n-dimensional space is labelled as high risk if the ratio of cases to controls meets or exceeds the threshold (T) of 1.0 and low risk if the threshold is not exceeded. In one recent report, it has been observed in simulation studies that a threshold of 1.0 is optimal at this stage of MDR modelling for unbalanced data sets.40 In this way, a model for cases and controls is formed by pooling those cells labelled high risk into one group and those cells labelled low risk into another group. This reduces the n-dimensional model to one dimension. In step 5, all possible combinations of n factors are evaluated sequentially for their ability to classify affected and unaffected individuals in the training set and the best n-factor model is selected. In step 6, the independent test set from the cross-validation is used to estimate the prediction error of the best model selected in step five. The prediction error is the proportion of subjects for whom an incorrect prediction was made. Steps 1 to 6 are repeated 10 times with the data split into 10 different training and testing sets. The number of times a particular set of factors is identified in each possible 9/10 of the subjects is defined as cross-validation consistency.

Once MDR identifies the best combination of factors, the final step is to determine which multifactor levels (eg genotypes) are high risk and which are low risk using the entire data set. This final evaluation is conducted with a threshold ratio that is determined by the ratio of the number of affected individuals in the dataset divided by the number of unaffected individuals in the dataset. The previous user-defined threshold ratio (T) is not used in this final evaluation. Also, the MDR program can display the number of cases and controls for each multilocus–genotype combination. To make it clear, we showed this as the percentage of it in the total subjects. Since the MDR program randomly shuffles the order of the observations in the dataset using a random seed supplied by the user before dividing the data into a training set and a testing set, we ran the analysis 10 times using 10 different random number seeds and the results were averaged in order to avoid spurious results due to chance divisions of the data.

This MDR procedure can be carried out for each possible model size. In this study, we considered two-locus interactions through four-locus interactions. Single best models are selected from among each of the two-factor, three-factor and four-factor combinations. Among this set of best multifactor models, the combination of factors that minimizes the prediction error and maximizes the cross-validation consistency is selected as the final best one. Finally, the statistical significance of the result was determined by comparing the average prediction error from the observed data with the distribution of average prediction errors under the null hypothesis of no associations derived empirically from 1000 permutations. Our permutation testing strategy was to randomize the case and control labels in the original dataset 1000 times to create a set of permuted datasets. MDR can then be run on each permuted data set and the maximum cross-validation consistency and minimum prediction error identified for each were saved and used to create an empirical distribution for estimation of a P-value The null hypothesis was rejected when the upper-tail Monte Carlo P-value derived from the permutation test was ⩽0.05.

Results

After the genotyping procedure, it was possible to validate differences in sensitivity and specificity by direct sequencing. We randomly selected 40 samples to be genotyped using both methods. The result showed that the two methods were completely consistent. It is clear that this microarray procedure demonstrates high specificity in relation to genotype (Figure 2). Moreover, some regions that were difficult to sequence could be genotyped easily by the microarray procedure. The DNA targets quality required in this microarray procedure was not so strict as in the sequencing method. As shown in Figure 2, the same quality of the DNA target, which was genotyped only approximately by sequencing, could be discriminated clearly by the microarray procedure.

Figure 2
figure 2

Characteristics evaluation of the DNA microarray procedure. Each column of the array corresponds to a polymorphism allele and contains five-spot replicates. Deposition scheme: 1 – 1001G; 2 – 1001C; 3 – 366G; 4 – 366C; 5 – 2664C; 6 – 2664T; 7 – 4197T; 8 – 4197C; 9 – 5988T; 10 – 5988C; 11 – negative marker; 12 – positive marker. (a), (b) and (c) are the scanning results that, respectively, indicate that the five polymorphisms are all heterozygous, homogeneous and miscellaneous. (d) and (e) are the genotyping result comparisons between the microarrray and the sequencing method with the same quality template. The marked part of the picture is the genotyping result of the C2664T locus of the same sample.

In the present study, 16 reported polymorphisms (11 in the GRIN1 gene and 5 in the GRIN2B gene) were selected for examination in the Chinese population containing 20 patients and 20 controls. One variant in the GRIN1 gene (G1001C) and four variants in the GRIN2B gene (G366C, C2664T, T4197C and T5988C) were observed. Three of them (G366C, C2664T and T4197C) were in coding sequence, but were synonymous mutations. The five identified SNPs were genotyped in 253 Chinese patients with schizophrenia and 140 Chinese control subjects. The distribution of these five SNPs was in Hardy–Weinberg equilibrium (data not shown). The data for allele frequencies are shown in Table 1. No significant differences in the allele frequencies were observed between the patients and controls. Linkage disequilibrium between each pair of GRIN2B SNPs is shown in Table 2. Three of the alleles in strong LD (C2664T, T4197C and T5988C) were used to generate haplotypes. The haplotype frequencies of the GRIN2B gene are shown in Table 3. We found that the frequency of the CCT haplotype (the C allele of the C2664T, the C allele of the T4197C, and the T allele of the T5988C) was higher in controls (0.089) than in patients (0.022) (P=0.000146 after correction by the number of tests performed, χ2=18.79, OR=0.23). The difference of CCC was nominally significant estimated by the χ2 test (P=0.02, χ2=5.40, OR=1.54) using EpiInfo 6.0 and was not significant after the correction by the number of tests performed (P=0.2). The overall haplotype distribution was significantly different between cases and controls (P=0.000031, χ2=23.60). With the MDR method, the gene–gene interactions were examined, and the results for each number of factors considered are summarized in Table 4. All the models had the same cross-validation consistency of 10.0 and the permutation testing showed that the prediction errors of the two-locus and three-locus models are both significant at the 0.001 level. Thus, under the null hypothesis of no association, it is highly unlikely that a prediction error ⩾30.772% will be observed for the two-locus model and a prediction error ⩾30.754% will be observed for the three-locus model. As the MDR procedure prefers the best model that has the minimum prediction error, the three-locus model, which included the G1001C polymorphism in the GRIN1 gene and the T4197C and T5899C polymorphisms in the GRIN2B gene, was regarded as the best model. Table 5 summarizes the three-locus–genotype combinations associated with high risk and with low risk, along with the corresponding distribution of cases and of controls, for each multilocus–genotype combination.

Table 1 Allele distribution of the five polymorphisms in the GRIN1 and GRIN2B genes
Table 2 Estimation of linkage disequilibrium of the four GRIN2B SNPs
Table 3 Frequencies of estimated haplotypes composed of the three polymorphisms of the GRIN2B gene
Table 4 Results of the MDR analysis of the dataset
Table 5 Distribution of high-risk and low-risk genotypes in the best three-locus model

Discussion

The microarray procedure used in this study was demonstrated to be reliable by comparing with sequencing method in the examination of 40 subjects. The universal microarray platform can be used for any group of SNPs and only the LDR primers require to be designed. Compared with other gene-specific arrays, the microarray procedure gives researchers freedom to detect specified polymorphic loci based on the same universal array and avoids the need to optimize the operation condition each time. Furthermore, this approach can readily distinguish the polymorphism types of insertions and deletions in repeat sequence.29

On examination of the SNPs, only one of the 11 reported SNPs existed in the GRIN1 gene in our sample, which is perhaps due to ethnic differences. The lower rate of nonsynonymous SNPs in this gene, which has also been reported by other groups,16, 17 may be mainly due to selection acting against molecular diversity during human evolution. Of all the identified variants in both GRIN1 and GRIN2B genes, three are located in protein coding sequences, but only as synonymous mutations. Moreover, one polymorphism G1001C seemed to have functional significance. It is located in the promoter region of GRIN1 and could alter a consensus sequence for the p50 subunit of the transcription factor NF-κB.19 The association analysis of G1001C polymorphism in the GRIN1 gene has been done by several groups,19, 21, 22, 23 but only Begni et al19 detected a significant result in the Italian population. In the previous association study for the four polymorphisms (G366C, C2664T, T4197C and T5988C) in the GRIN2B gene, Ohtsuki et al18 observed a significant difference in the allele frequency of the G366C polymorphism between patients and control individuals in the Japanese population. Also, their haplotype analysis showed a borderline significant result for the overall haplotype distribution.18 However, the significant result of G366C was not replicated in the analysis on a sample of Caucasian patients.16 In our study, no significant frequency differences of the five alleles (G1001C, G366C, C2664T, T4197C, and T5988C) between schizophrenic patients and controls were observed. However, in the low-frequency haplotype (CCT) and the overall haplotype distribution of the GRIN2B gene, we did find evidence of association with schizophrenia. Moreover, all the three SNPs contained in the haplotype of our study are also included in the significant haplotype of the previous study by Ohtsuki et al, which further supported possible association of the haplotype in GRIN2B gene with schizophrenia. Since this association has been found in a relatively small sample size (253 cases and 140 controls) involving a low-frequency haplotype, it is susceptible to a false-positive result and further study with a large sample size is required.

There is a growing awareness that the complex disorders may be genetically attributable to complex interactions among multiple genes. Many studies have observed this combination effect in a number of complex disorders.34, 40, 41, 42, 43, 44 The two subunits of the NMDA receptors have different functions. The GRIN1 subunit possesses the characteristic ion channel properties and the specificity of the NMDA receptors arises from the GRIN2 subunits, which display different regional and temporal expression patterns.1, 2, 3 We hypothesized that gene–gene interactions across the two subunit genes probably contributes to the pathogenesis of schizophrenia although the specific SNPs alone do not show any association with the disorder. Using the MDR method, we were able to detect interactions in the absence of main effects, where logistic regression and other approaches may not be able to do so.34 Application of MDR to case–control data sets has routinely yielded statistically significant interactions in various common human diseases, such as sporadic breast cancer, essential hypertension, atrial fibrillation and type II diabetes mellitus.34, 40, 43, 44 In our MDR analysis, although the prediction errors of the two-locus and three-locus models had the same level of significance, the three-locus model had a lower prediction error (30.754%) than that in the two-locus model. Furthermore, the three loci in the three-locus model were located within the two different subunit genes, which may display the gene to gene interactions associated with schizophrenia and the two-locus model are included in the three-locus model. Thus, we selected the three-locus model as the best model. However, the MDR program could not statistically compare the two models to confirm that the three-locus model is more significantly associated than the two-locus model, which is a limitation of the present study. The best model showed a genetic interaction between the G1001C in the GRIN1 gene and the T4197C and T5988C polymorphisms in the GRIN2B gene. To our knowledge, this is the first report of the interaction between the two subunit genes of the NMDA receptors associated with schizophrenia.

In conclusion, using the universal microarray procedure mediated by LDR combined with effective MDR statistical analysis, our study provided the possibility that the combinational effects of the polymorphisms in the GRIN1 and GRIN2B genes are associated with schizophrenia in the Chinese population. However, as case–control studies are susceptible to positive and negative artefacts from unknown population stratification, it is necessary to validate or replicate our association results in other ethnic groups.