Introduction

Genome-wide association studies (GWASs) have identified novel susceptibility genes for type 2 diabetes mellitus in Caucasians.1, 2, 3, 4, 5 TCF7L2, CDKAL1, CDKN2A/B, IGF2BP2, SLC30A8 and HHEX have been widely replicated as susceptibility genes for type 2 diabetes in Asian populations6, 7, 8, 9, 10, 11, 12 as well as in populations of European ancestry.13, 14 We recently identified KCNQ1 as a novel susceptibility gene, as well as seven other candidate susceptibility loci in a multistage GWAS for type 2 diabetes in the Japanese population, in which a total of 1612 cases and 1424 controls and 100 000 single nucleotide polymorphisms (SNPs) were included.15 KCNQ1 was found to confer risk of type 2 diabetes with a relatively large effect size in Asian populations (odds ratio (OR) for Japanese, Chinese and Korean individuals of 1.42),15 which was similar to that demonstrated earlier for TCF7L2 in the Japanese population.6

Follow-up of GWASs includes analysis of second-tier genes, meta-analysis for specific populations, as well as analysis of gene–gene or gene–environment interactions. A large-scale meta-analysis16 and an analysis of gene–gene interaction for susceptibility genes17 have been performed for type 2 diabetes in populations of European ancestry.

In this study, we attempted to confirm in independent subject panels of Japanese and Hong Kong Chinese individuals the associations of the seven candidate susceptibility loci that we identified in addition to KCNQ1 in our GWAS of type 2 diabetes.15 However, as described in this article, we failed to replicate the associations of the seven loci with diabetes. We then attempted to extract genes with strong evidence of the associations with diabetes, and selected 11 genes, including KCNQ1. As we did not detect any gene–gene interaction between the 11 genes, we then attempted to construct a prediction model for this disease by using the data from the 11 genes, as well as age, gender and body mass index (BMI) as independent variables to obtain a comprehensive understanding of the genetic background of diabetes in the Japanese population.

Materials and methods

Validation of the results from a multistage GWAS in the Japanese population

Study subjects

We assembled two independent subject panels for our replication study: replication-Japanese and replication-Chinese. The 1000 cases and 1000 controls for the replication-Japanese panel were recruited by the Study Group of the Millennium Genome Project for Diabetes Mellitus. The inclusion criteria for diabetic patients were (i) an age at disease onset of 30–60 years and (ii) the absence of antibodies to GAD. Types of diabetes other than type 2 were excluded on the basis of clinical data. The criteria for controls included (i) an age of >50 years, (ii) no past history of a diagnosis of diabetes and (iii) an HbA1c content of <5.8%.

For the replication-Chinese panel, subjects of southern Han Chinese ancestry, who resided in Hong Kong, were recruited. The cases consisted of 1416 individuals with type 2 diabetes selected from the Prince of Wales Hospital Diabetes Registry;5, 18 626 of these subjects had early-onset diabetes (age at diagnosis of <40 years) and a positive family history, whereas the remaining 790 patients were randomly selected from the registry. Patients with classic type 1 diabetes with acute ketotic presentation or a continuous requirement for insulin within 1 year of diagnosis were excluded. The controls consisted of 1577 subjects with normal glucose tolerance (fasting plasma glucose concentration of <6.1 mmol l−1); 596 of these individuals were recruited either from the general population participating in a community-based screening program for cardiovascular risk or from hospital staff, whereas the remaining 981 subjects were recruited from a population-based screening program for cardiovascular risk in adolescents.19 The clinical characteristics of the subjects in each panel are summarized in Supplementary Table 1A. The study protocol was approved by the local ethics committee of each institution. Written informed consent was obtained from each subject.

Study design and statistical analysis

For the validation of the results from our earlier multistage GWAS,15 seven SNPs (rs2250402, rs2307027, rs3741872, rs574628, rs2233647, rs3785233 and rs2075931) were genotyped in the two panels either by sequence-specific primer–PCR analysis followed by fluorescence correlation spectroscopy20 or by real-time PCR analysis with TaqMan probes (Applied Biosystems, Foster City, CA, USA). Differences in allele frequency between cases and controls for each SNP were evaluated by χ2 with one degree of freedom. Meta-analysis was performed by the Mantel–Haenszel method (fixed-effects models) with the ‘meta’ package of the R-Project (http://www.r-project.org). A P-value of <0.05 was considered statistically significant.
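The two calculations named above can be sketched as follows. This is a minimal illustration, not the 'meta' package itself: the function names and the allele counts are hypothetical, and the pooled estimate shown is only the Mantel–Haenszel point estimate of the odds ratio, without its confidence interval.

```python
import math

def allele_chi2(case_risk, case_other, ctrl_risk, ctrl_other):
    """Pearson chi-square (1 degree of freedom) on a 2x2 table of allele counts."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def mantel_haenszel_or(tables):
    """Fixed-effects (Mantel-Haenszel) pooled odds ratio over 2x2 tables (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# Hypothetical allele counts: (case risk, case other, control risk, control other)
panel_1 = (460, 1540, 400, 1600)
panel_2 = (600, 2232, 640, 2514)

chi2, p_value = allele_chi2(*panel_1)
pooled_or = mantel_haenszel_or([panel_1, panel_2])
print(f"panel 1: chi2={chi2:.2f}, P={p_value:.3f}; pooled OR={pooled_or:.2f}")
```

With a single panel, the pooled estimate reduces to the ordinary cross-product odds ratio of that panel.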

Examination of gene–gene interaction and construction of a prediction model

Study subjects

In total, 2424 cases and 2424 controls of the Japanese population obtained by combining the second and third screening panels in our original study15 and the replication-Japanese panel of this study were included in this analysis (analysis-panel). The criteria for the second and third screening panels were described in the earlier report.15 The clinical characteristics of the subjects are summarized in Supplementary Table 1B.

Selection of the loci included in this study

Prediction of the phenotypes on the basis of genetic polymorphisms should include the genetic data from the loci with strong evidence of the association. Starting from 15 genes described in earlier reports, we selected 11 genes with strong evidence of the association on the basis of the data in the literature and on the results of the replication experiments in this study. The selection process for the 11 genes is described in detail in Results.

Statistical methods

Multiplicative gene–gene interaction was evaluated for each pair of the 11 genes using an interaction term in addition to the terms for the pair of the genes in the logistic regression model. The genotypes for each locus were coded by 0, 1 and 2. Correction for multiple testing was performed by Bonferroni's method.
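One way to write the model described above is given below. The symbols are assumptions introduced here for illustration: g_A and g_B are the genotypes at the two loci of a pair, coded 0, 1 and 2, and beta_AB is the coefficient of the interaction term.

```latex
\operatorname{logit} P(\text{case})
  = \beta_0 + \beta_A\, g_A + \beta_B\, g_B + \beta_{AB}\, g_A g_B ,
\qquad g_A, g_B \in \{0, 1, 2\},
\qquad H_0 : \beta_{AB} = 0
```

Under the null hypothesis, the two loci act additively on the log-odds scale, which corresponds to multiplicative effects on the odds.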

As there was no evidence for the presence of gene–gene interactions, we attempted to construct a phenotype prediction model by incorporating the number of risk alleles for the 11 loci as an independent variable in addition to age, sex and BMI. The Cochran–Armitage test was used to examine the trend of the increase in the odds with an increasing number of risk alleles. To construct a prediction model, the log of odds was expressed as a linear combination of the independent variables. Coefficients for the variables were estimated by logistic regression analysis with disease status (case or control) as the dependent variable. Using the coefficients estimated by the logistic regression analysis, we constructed a phenotype prediction model. To evaluate the prediction model, receiver operating characteristic (ROC) curves21 for the sensitivity and specificity of the prediction model with or without adjustment for age, sex and BMI were generated, and the area under the curve (AUC) was calculated from the ROC curve.
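The AUC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The sketch below computes it directly from that definition; the function name and the scores (here, risk-allele counts) are hypothetical and are not data from this study.

```python
def roc_auc(case_scores, control_scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical numbers of risk alleles carried by five cases and five controls
cases = [12, 11, 10, 9, 13]
controls = [9, 8, 10, 7, 11]

auc = roc_auc(cases, controls)
print(f"AUC = {auc:.2f}")
```

This pairwise form is equivalent to the area under the empirical ROC curve and needs no explicit threshold sweep; it is quadratic in the number of subjects, so a rank-based formula is preferable for large panels.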

Results

Validation of the results from a multistage GWAS in the Japanese population

We earlier identified 10 loci associated with type 2 diabetes in a three-stage GWAS starting from 100 000 SNPs. Among the 10 loci, 3 SNPs were located in an intron of KCNQ1, and the association of this gene with diabetes was confirmed.15 To validate the association of the other seven loci with type 2 diabetes, we analyzed them in two independent replication panels of Japanese and Han-Chinese individuals (Table 1, Supplementary Table 2). Only one SNP, rs2250402, which is located in EIF2AK4, was found to be significantly associated in the replication-Japanese panel (P=0.039, OR=1.17, 95% CI=1.01–1.36). However, neither this SNP (P=0.41, OR=1.05) nor any of the other six SNPs showed such an association in the replication-Chinese panel. Meta-analyses for these SNPs showed that rs2307027 in KRT4 and rs3785233 in A2BP1 yielded P-values of <0.05 and ORs between 1.12 and 1.13 (Table 1). When the original second and third screening panels were included in the meta-analyses, these two loci, as well as the SNPs in EIF2AK4 (rs2250402) and FAM60A (rs3741872), gave P-values of <0.001 and ORs between 1.15 and 1.18 (Supplementary Table 3). However, the P-values did not reach the proposed genome-wide significance level (P=5 × 10–7).

Table 1 Association study for the candidate susceptibility genes for type 2 diabetes selected by multistage screening in the Japanese population

Selection of polymorphisms for the prediction model

To construct a reliable prediction model for diabetes, polymorphisms with strong evidence of association should be used. From the previous literature, we selected 15 genes (including one intergenic marker), that is, SLC30A8, HHEX, LOC387761, EXT2, CDKN2A/B, GCKR, IGF2BP2, CDKAL1, FTO,1, 2, 3, 4, 5 TCF7L2,22 KCNJ11,23 PPARG,24 WFS1,25 HNF1B26 and KCNQ1,15 as candidate genes to be included in both gene–gene interaction analysis and construction of a prediction model. Starting from 23 SNPs in these 15 genes, we selected 11 SNPs in 11 genes according to the following process. There is sufficient evidence of the associations of KCNQ1 and TCF7L2 genes with diabetes as supported by replication studies in the Japanese population.6, 15, 27 In addition, SLC30A8, HHEX, CDKN2A/B, IGF2BP2 and CDKAL1 associated with the disease in the European population were found in our earlier study to be associated with the disease in the Japanese population as well.7, 8, 9

To further extract genes with strong evidence of the association with diabetes, we attempted to replicate the associations reported earlier using our own data (analysis panel with 2424 cases and 2424 controls). For the 19 SNPs in SLC30A8, HHEX, LOC387761, EXT2, CDKN2A/B, GCKR, IGF2BP2, CDKAL1, FTO, TCF7L2, KCNJ11, PPARG and KCNQ1, we extracted genotyping data from our earlier studies6, 7, 8, 9, 15, 27, 28, 29 and, if necessary, genotyped additional subjects to obtain a data set for 2424 cases and 2424 controls of the Japanese population (analysis panel). The SNPs in WFS1 (rs6446482, rs734312) and HNF1B (rs7501939, rs4430796) were genotyped for this study in the same individuals. SNPs with P-values for the test of deviation from the Hardy–Weinberg equilibrium of <0.01 were excluded for further analysis. When two SNPs were located in the same genomic region, the one with the lower P-value for the association test was selected for further analysis. GCKR, for which we earlier reported the marginal association with type 2 diabetes,7 was found to be associated with the disease in this enlarged Japanese panel (P=1.7 × 10−5; Supplementary Table 4). KCNJ11 and PPARG, which have been included in the genes associated with diabetes in Caucasians, showed marginal associations (P=0.066 and P=0.075, respectively; Supplementary Table 4) in our panel. Two SNPs in WFS1 and two SNPs in HNF1B were newly genotyped in the analysis panel. Although no association was apparent between WFS1 and type 2 diabetes, both SNPs in HNF1B exhibited P-values of <0.05 (Supplementary Table 4). From these data, we included 11 SNPs in 11 genes as described above for the source of genotype data to be analyzed in both the examination of gene–gene interaction and the prediction of phenotypes.

Gene–gene interaction

We evaluated multiplicative gene–gene interaction for each pair of the 11 loci as described in Materials and methods. Two combinations, rs1801282 (PPARG) × rs1470579 (IGF2BP2) (nominal P=0.0025) and rs1801282 × rs3802177 (SLC30A8) (nominal P=0.018), showed P-values of less than 0.05 (Supplementary Figure 1). However, these P-values were not significant when Bonferroni's correction for multiple testing was applied (significance level, 0.05/55=9.1 × 10–4). Although PPARG and IGF2BP2 are located on the same chromosome (3p25 and 3q28, respectively), it is unlikely that loci on different arms of the same chromosome show significant linkage disequilibrium. SLC30A8 is located on a different chromosome (8q24.11) from PPARG. The nominal P-values of less than 0.05 for these two combinations may be attributable to the low minor allele frequency of rs1801282.
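The Bonferroni threshold quoted above follows from the number of unordered pairs among 11 loci. A short check, using the gene names and the two nominal P-values given in the text (the variable names are illustrative):

```python
from itertools import combinations

genes = ["SLC30A8", "HHEX", "CDKN2A/B", "GCKR", "IGF2BP2", "CDKAL1",
         "TCF7L2", "KCNJ11", "PPARG", "KCNQ1", "HNF1B"]

# Every unordered pair of loci is tested once
pairs = list(combinations(genes, 2))
threshold = 0.05 / len(pairs)

# Nominal P-values reported in the text for the two pairs with P < 0.05
nominal = {("PPARG", "IGF2BP2"): 0.0025, ("PPARG", "SLC30A8"): 0.018}
significant = {pair: p for pair, p in nominal.items() if p < threshold}

print(len(pairs), f"{threshold:.1e}", significant)
```

Neither pair passes the corrected threshold, so the dictionary of significant pairs is empty.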

Cumulative risk assessment for type 2 diabetes on the basis of susceptibility genes

As there was no evidence of gene–gene interaction between 11 SNPs of 11 genes, SLC30A8, HHEX, CDKN2A/B, GCKR, IGF2BP2, CDKAL1, TCF7L2, KCNJ11, PPARG, KCNQ1 and HNF1B, they were included in the prediction model as independent variables with the additive effect (additive effect in the liability and multiplicative effect in the odds) without interaction terms. Effective numbers of cases and controls whose genotypes for the 11 loci were successfully obtained were 2316 and 2370, respectively. The Cochran–Armitage trend test gave a P-value of 4.7 × 10–56 for the trend in the increase in the odds for cases relative to controls with an increasing number of risk alleles for the 11 susceptibility loci (Supplementary Table 5). We then estimated ORs for type 2 diabetes in subjects with different numbers of risk alleles on the basis of the multiplicative model by logistic regression analysis with adjustment for age, sex and BMI. The ORs for type 2 diabetes in subjects with 7–18 risk alleles in comparison with those harboring 0–6 risk alleles are shown in Figure 1. An increase of one risk allele resulted in an average increase in the odds of 1.29 (95% CI=1.25–1.33, P=5.4 × 10–53, logistic regression analysis).
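Under the multiplicative model, the per-allele estimate compounds with each additional risk allele. The sketch below applies the reported per-allele value (1.29, 95% CI 1.25–1.33) to a difference of k risk alleles; the function name is hypothetical, and the result is a model-based extrapolation rather than one of the grouped ORs plotted in Figure 1.

```python
PER_ALLELE_OR = 1.29            # reported average increase in odds per risk allele
PER_ALLELE_CI = (1.25, 1.33)    # reported 95% confidence interval

def odds_ratio_for_extra_alleles(k, per_allele=PER_ALLELE_OR):
    """Odds ratio implied by the multiplicative model for k additional risk alleles."""
    return per_allele ** k

for k in (1, 3, 5):
    low = odds_ratio_for_extra_alleles(k, PER_ALLELE_CI[0])
    high = odds_ratio_for_extra_alleles(k, PER_ALLELE_CI[1])
    print(k, round(odds_ratio_for_extra_alleles(k), 2), (round(low, 2), round(high, 2)))
```

Raising the confidence limits to the same power gives only a rough range, because it ignores how uncertainty behaves away from the average allele count.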

Figure 1

Odds ratios for subjects with different numbers of risk alleles for 11 susceptibility loci for type 2 diabetes. The cumulative effect of the 11 loci on type 2 diabetes was tested by counting the number of risk alleles associated with type 2 diabetes with a logistic regression model with adjustment for age, sex and BMI. The ORs for subjects with each number of risk alleles are expressed relative to individuals with 0–6 risk alleles.

To predict disease status for type 2 diabetes in a given individual, we constructed a prediction model on the basis of the number of risk alleles or the liability value calculated from the number of risk alleles as well as age, sex and BMI. The coefficients to calculate the liability value were estimated with the logistic regression model. To estimate the predictive power of the model, we generated ROC curves as described in Materials and methods. The AUC was 0.63 when only the number of risk alleles was used for the prediction. When age, sex and BMI were also included, the AUC increased to 0.72 (Figure 2). Meanwhile, the AUC value for the ROC curve based on only age, sex and BMI was 0.68, which was better than that based on only the number of risk alleles (data not shown). The model incorporating age, sex and BMI as well as the number of risk alleles thus showed moderate power for the prediction of type 2 diabetes. The best accuracy was 0.66 at the threshold between non-diabetic and diabetic status of 0.52 (non-diabetic status=0, diabetic status=1), for which the specificity and the sensitivity were 0.71 and 0.61, respectively.
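The reported accuracy can be related to the sensitivity and specificity as a weighted average over cases and controls. The check below assumes that the model was evaluated on the 2316 cases and 2370 controls with complete genotypes for the 11 loci; the source does not state the exact numbers used for the ROC analysis, so this is an assumption.

```python
# Assumed evaluation set: subjects with complete genotypes for the 11 loci
n_cases, n_controls = 2316, 2370

# Values reported at the threshold of 0.52
sensitivity, specificity = 0.61, 0.71

# Accuracy = correctly classified subjects / all subjects
correct = sensitivity * n_cases + specificity * n_controls
accuracy = correct / (n_cases + n_controls)
print(f"accuracy = {accuracy:.2f}")
```

Because the two groups are of nearly equal size, the accuracy lies close to the midpoint of the sensitivity and specificity.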

Figure 2

ROC curves for the prediction model on the basis of the number of risk alleles for 11 susceptibility loci for type 2 diabetes. The prediction model for type 2 diabetes was constructed using the logistic regression model, and ROC curves for the model were generated. In model 1, the number of risk alleles was used as an independent variable together with age, sex and BMI as covariates, whereas only the number of risk alleles was used as an independent variable in model 2.

Discussion

In the validation of the results from our multistage GWAS, we detected only marginal associations of EIF2AK4, KRT4 and A2BP1 with type 2 diabetes in meta-analyses with two subject panels of Japanese or Chinese individuals. Relations of KRT4 (keratin 4 gene) and A2BP1 (ataxin-2-binding protein 1 gene, also known as FOX1) to glucose or lipid metabolism are unknown. Deletion of EIF2AK4 (eukaryotic translation initiation factor 2 alpha kinase 4 gene, also known as GCN2) in mice resulted in liver steatosis during leucine deprivation as a result of unrepressed expression of lipogenic genes.30 The functionally related gene, EIF2AK3 (also known as PERK or PEK), has been shown to cause diabetes mellitus both in humans (Wolcott–Rallison syndrome, OMIM604032) and in rodent models.31, 32 Taken together, EIF2AK4 may be a good candidate diabetes susceptibility gene. The sample size required for a statistical power of 0.80 with equal numbers of cases and controls is 10 505 when the frequency of the risk allele, OR and type I error probability are assumed to be 0.20, 1.10 (the value for EIF2AK4 in the meta-analysis in Table 1) and 0.05, respectively. Further studies of these genes in other Asian populations as well as in other ethnic groups are needed for confirmation of their association with type 2 diabetes. Given this uncertainty, we did not include these genes in the assessments of cumulative risk and gene–gene interaction.
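A sample size of this order can be illustrated with a standard two-proportion calculation on allele frequencies. The sketch below is an assumption-laden illustration, not the procedure used in this study: the formula, the two-sided test and the function name are choices made here, and the source does not state which method was used or whether its figure counts alleles or individuals.

```python
import math
from statistics import NormalDist

def alleles_per_group(p_ctrl, odds_ratio, alpha=0.05, power=0.80):
    """Alleles needed per group to detect an allelic odds ratio (two-sided test)."""
    # Risk-allele frequency in cases implied by the control frequency and the OR
    odds_case = odds_ratio * p_ctrl / (1.0 - p_ctrl)
    p_case = odds_case / (1.0 + odds_case)
    p_bar = (p_case + p_ctrl) / 2.0

    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)

    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(p_ctrl * (1.0 - p_ctrl) + p_case * (1.0 - p_case))
    ) ** 2
    return numerator / (p_case - p_ctrl) ** 2

# Assumptions stated in the text: risk-allele frequency 0.20, OR 1.10, alpha 0.05, power 0.80
n_alleles = alleles_per_group(0.20, 1.10)
print(math.ceil(n_alleles))
```

Under these assumptions the result is on the order of 10 500 alleles per group, which shows why an effect of OR=1.10 is difficult to confirm in panels of one to two thousand subjects.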

Among tens of type 2 diabetes susceptibility genes identified by recent GWASs in Caucasians, the associations of six genes, that is, TCF7L2, CDKAL1, CDKN2A/B, IGF2BP2, SLC30A8 and HHEX, have been replicated in Asian populations as well as in populations of European ancestry. A recent meta-analysis in Japanese subjects also supported the associations.12 In this study, we performed a replication study, and, on the basis of the results, we added five more genes, that is, KCNJ11, PPARG, GCKR, KCNQ1 and HNF1B, for the cumulative risk assessment for type 2 diabetes. Thus, the SNPs of HNF1B, which were earlier associated with type 2 diabetes in Chinese as well as in Caucasians,26 showed the association with the disease in the Japanese population in this study. In addition, the C allele of rs780094 in GCKR was associated with increased risk of type 2 diabetes in this study, which is consistent with a recent study in Caucasians.33 The associations of KCNJ11 and PPARG with diabetes were marginal in this study; however, they were included in the prediction model, as the associations were replicated in some studies of Caucasians.

Our gene–gene interaction analysis showed no significant interaction for any of the 55 possible pairs of genes when corrected for multiple testing. When the significance level was set at 0.05, two pairs were judged to be significant. However, such gene–gene interactions were not supported from the functional point of view. A large-scale study may provide more convincing evidence for such interactions.

As no confirmatory evidence for gene–gene interaction was observed, we treated the 11 genes as independent variables in the prediction model. The addition of one risk allele was estimated to increase the odds by an average of 1.29 according to the multiplicative model. This value is similar to that (1.24) estimated for type 2 diabetes in Caucasians.17 Two earlier cumulative risk assessments for type 2 diabetes in Asian populations with relatively small numbers of associated loci yielded values of 1.17 and 1.24 for the fold increase in risk for each additional risk allele.11, 34 In our prediction model for type 2 diabetes, the AUC for the ROC curve was lower than that in the earlier study17 based on 15 loci in Caucasians (0.72 and 0.86, respectively). However, the number of loci in our study (11 loci) was lower than that in the study for Caucasians. The inclusion of additional loci in our model should improve its ability to predict type 2 diabetes in Asian populations. Several reports of the prediction of type 2 diabetes using 18 loci were recently described for populations of European ancestry.35, 36, 37, 38 A prediction based on 18 loci gave an AUC value of 0.80 for the ROC curve,35 whereas the corresponding values for a population-based prospective study were 0.68,36 0.61537 and 0.75.38 They concluded that genetic variations associated with diabetes had a small effect on the ability to predict the development of type 2 diabetes as compared with clinical characteristics alone. In fact, the AUC value (0.72) based on both the genetic variations and the clinical characteristics was slightly better than that based on only the clinical characteristics (0.68). We admit that the evidence of the association with diabetes is a little weaker for KCNJ11 and PPARG in the Japanese population than for the other nine genes. If KCNJ11 and PPARG were excluded from the analysis, the AUC for the ROC curve in the prediction model incorporating age, sex and BMI remained unchanged at 0.72, probably because of the relatively large effects of KCNQ1 and TCF7L2.

Finally, our prediction model for type 2 diabetes achieved limited success even though it has some value. Given that GWASs for diabetes in Asians have not been as extensive as those in Caucasians, many risk loci for diabetes in Asians most likely remain undiscovered. Considering that the average increase in OR conferred by each additional risk allele was similar between Caucasians and Japanese, incorporation of data from additional risk loci is likely to increase the predictive power.