Introduction

Type 2 diabetes mellitus (DM) affects more than one hundred million individuals worldwide (Zimmet et al. 2001). Its pathogenesis appears to result from insulin resistance in peripheral tissues combined with dysfunction of β cells in pancreatic islets, although the precise mechanism remains poorly understood (Kahn 1998; Saltiel 2001).

That genetic factors contribute to the onset and progression of DM is beyond doubt, and several genes responsible for specific forms of the disease, such as maturity-onset diabetes of the young (MODY) and mitochondrial diabetes, have been identified (Fajans et al. 2001; Kadowaki et al. 1994). However, genetic alterations associated with these specific forms of diabetes account for only a small subset of cases; the gene or genes conferring susceptibility to type 2 diabetes in most patients remain to be identified.

Worldwide efforts to sequence the entire human genome have produced a nearly complete blueprint (International Human Genome Sequencing Consortium 2001), providing a large body of information on genes, whether or not their functions are already known. Single nucleotide polymorphisms (SNPs), the most frequent type of genetic variation in the sequenced genome, have become useful markers for identifying genes involved in common diseases such as DM. We developed a high-throughput SNP genotyping system that combines the Invader assay with multiplex polymerase chain reaction (PCR) (Ohnishi et al. 2001) and undertook genome-wide association studies using SNPs to discover loci involved in susceptibility to common diseases.

In the study presented here, we report the results of a large-scale case-control study using nearly 60,000 gene-based SNPs as genetic markers and provide evidence that the gene encoding TFAP2B (transcription factor AP-2 beta) on chromosome 6p12 might be a novel candidate conferring susceptibility to type 2 diabetes.

Subjects and methods

Subjects and DNA preparations

DNA samples were obtained from patients with type 2 diabetes who regularly attended the outpatient clinics of Shiga University of Medical Science, Tokyo Women’s Medical University, Juntendo University, Kawasaki Medical School, Keio University School of Medicine or Iwate Medical University. Control individuals comprised 470 members of the general population (control 1) and a second general-population set (control 2, n=889) recruited through several medical institutes in Japan. For the final analysis, we also used a third set of control subjects with normal plasma glucose levels (control 3, n=598; HbA1c < 5.5% or fasting plasma glucose < 100 mg/dl, and no family history of diabetes). Written informed consent was obtained from each patient, and DNA was extracted using a standard phenol-chloroform procedure. The UK samples comprised 590 cases with type 2 diabetes enriched for positive family history (probands from the Diabetes UK Warren 2 repository) (Wiltshire et al. 2001) and 549 UK population controls (the ECACC-HRC collection) (Groves et al. 2003).

Genotyping for gene-based SNPs

The SNPs for genotyping were selected randomly from the IMS-JST Japanese SNPs database (http://snp.ims.u-tokyo.ac.jp) (Hirakawa et al. 2002; Haga et al. 2002). The genotype at each SNP locus was determined with the Invader assay, as previously described (Ohnishi et al. 2001). We first screened 188 diabetic patients and compared their genotype and/or allele frequencies with those of the general population. After statistical evaluation using 2×3 or 2×2 contingency tables, SNPs that showed significant differences in genotype or allele frequencies between diabetic patients and the general population were examined further in a second, larger set of diabetic patients (n=631). The protocol was approved by the ethics committee of the Institute of Physical and Chemical Research.
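The screening step compares each SNP's genotype counts between cases and controls in 2×2 and 2×3 contingency tables, using the four comparisons listed under Statistical analysis and Pearson's chi-square test without continuity correction. The sketch below is a minimal, dependency-free illustration, assuming genotypes are tallied as (major homozygote, heterozygote, minor homozygote) counts. The function names are hypothetical and this is not the authors' software. The P values for 1 and 2 degrees of freedom use exact closed forms, so no statistics library is needed.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail P value of a chi-square statistic (exact closed forms for df = 1 or 2)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only 1 or 2 degrees of freedom occur in 2x2 and 2x3 tables")


def pearson_chi2(table):
    """Pearson chi-square (no continuity correction) for a 2 x k table of counts.

    Columns whose total is zero are dropped, so df reflects the observed categories.
    Both rows (cases, controls) are assumed to be non-empty. Returns (statistic, P).
    """
    cols = [col for col in zip(*table) if sum(col) > 0]
    if len(cols) < 2:
        return 0.0, 1.0  # nothing to compare
    row_tot = [sum(col[i] for col in cols) for i in range(2)]
    n = sum(row_tot)
    stat = 0.0
    for col in cols:
        col_tot = sum(col)
        for i in range(2):
            expected = row_tot[i] * col_tot / n
            stat += (col[i] - expected) ** 2 / expected
    return stat, chi2_sf(stat, len(cols) - 1)


def four_tests(case, control):
    """P values of the four case-control comparisons for one SNP.

    case, control: genotype counts (n_AA, n_Aa, n_aa), where a is the minor allele.
    """
    def tables(g):
        n_AA, n_Aa, n_aa = g
        return {
            "allele": [2 * n_AA + n_Aa, n_Aa + 2 * n_aa],  # (1) allele counts
            "minor_carrier": [n_AA, n_Aa + n_aa],           # (2) subjects with >= 1 minor allele
            "major_carrier": [n_AA + n_Aa, n_aa],           # (3) subjects with >= 1 major allele
            "genotype": [n_AA, n_Aa, n_aa],                 # (4) 2 x 3 genotype table
        }
    t_case, t_ctrl = tables(case), tables(control)
    return {name: pearson_chi2([t_case[name], t_ctrl[name]])[1] for name in t_case}
```

Under the first-stage rule reported in Results, a SNP would be carried forward when `min(four_tests(case_counts, control_counts).values()) < 0.01`, where `case_counts` and `control_counts` are placeholders for observed genotype tallies.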

Discovery of SNPs in the TFAP2B gene, and genotyping

Based on GenBank sequence information for the genomic region containing the TFAP2B gene (accession number: NT_007592), we designed PCR primers to amplify appropriate fragments of genomic DNA. Repetitive elements were excluded from the search using the REPEAT MASKER program, as described previously (Seki et al. 2000). PCR and DNA sequencing were carried out as previously described (Saito et al. 2001). The SNPs in this region were genotyped with the Invader assay (Japanese samples) or the Amplifluor assay (UK samples) (Bengra et al. 2002), and allele sizes at variable number of tandem repeat (VNTR) loci were determined using the Applied Biosystems ABI PRISM 3700 Automated DNA Sequencer and GeneScan software (GenoTyper program).

Reverse transcription and polymerase chain reactions

First-strand cDNA was prepared by reverse transcription (RT) of total RNA extracted from murine 3T3-L1 cells, using oligo-dT priming and Superscript II reverse transcriptase (Invitrogen). Human cDNAs from multiple tissues were obtained from CLONTECH Inc. (Palo Alto, CA, USA). The first-strand or double-stranded cDNAs were amplified by PCR using primers mAP2RT-F (5′-GCG TCC TCA GAA GAG CCA AAT C-3′) and mAP2RT-R (5′-GTG CGT GAT GAG ACT GAA GTG C-3′) for murine TFAP2B and hAP2RT-F (5′-CCA AAT CTG TGA CTT CTC TAA TGA-3′) and hAP2RT-R (5′-GTA ACG TGA CAT TTG CTG CTT TG-3′) for human TFAP2B. Real-time quantitative RT-PCR for human TFAP2B was performed by TaqMan assay using primers hAP2BTM-F (5′-TTG AAC CGG CAG CAC ACA-3′) and hAP2BTM-R (5′-CTT GGT GGC CAA CAG CAT ATT-3′) and probe hAP2BTM-P (5′(FAM)-CCG AGT GAC CTG CAC TCC CGA AA-(TAMRA)3′).

Statistical analysis

Statistical methods for testing associations, estimating haplotype frequencies and calculating linkage disequilibrium (LD) coefficients (Δ) have been described previously (Yamada et al. 2001). Haplotype structure was analyzed by estimating haplotype phase with the EM algorithm (Excoffier and Slatkin 1995) and by constructing haplotype blocks, as previously described (Ozaki et al. 2002; Daly et al. 2001).
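For two biallelic SNPs, EM phasing reduces to one unknown: how the double heterozygotes split between the AB/ab and Ab/aB phases. The sketch below is a minimal two-locus version of that idea, not the multi-locus implementation cited above, and its function names are hypothetical. It also returns D, D′ and the correlation coefficient r. The text does not define Δ here, so reading Δ as r, which is common in related SNP studies, is an assumption.

```python
import math


def em_two_locus(genotypes, max_iter=1000, tol=1e-12):
    """Estimate haplotype frequencies for two biallelic SNPs from unphased genotypes.

    genotypes: list of (g1, g2), each the count (0, 1, 2) of allele A at SNP 1 and
    allele B at SNP 2. Returns a dict of frequencies for haplotypes AB, Ab, aB, ab.
    """
    fixed = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}  # phase-known haplotype counts
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase AB/ab or Ab/aB is unknown; resolved by EM
            continue
        alleles1 = {2: "AA", 1: "Aa", 0: "aa"}[g1]
        alleles2 = {2: "BB", 1: "Bb", 0: "bb"}[g2]
        for a1, a2 in zip(alleles1, alleles2):  # at most one locus is heterozygous
            fixed[a1 + a2] += 1
    n_hap = 2 * len(genotypes)
    freq = {h: 0.25 for h in fixed}
    for _ in range(max_iter):
        # E-step: expected fraction of double heterozygotes carrying AB/ab
        cis = freq["AB"] * freq["ab"]
        trans = freq["Ab"] * freq["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: frequencies from phase-known counts plus expected counts
        new = {
            "AB": (fixed["AB"] + n_double_het * w) / n_hap,
            "ab": (fixed["ab"] + n_double_het * w) / n_hap,
            "Ab": (fixed["Ab"] + n_double_het * (1 - w)) / n_hap,
            "aB": (fixed["aB"] + n_double_het * (1 - w)) / n_hap,
        }
        converged = max(abs(new[h] - freq[h]) for h in freq) < tol
        freq = new
        if converged:
            break
    return freq


def ld_measures(freq):
    """Return (D, D', r) from two-locus haplotype frequencies."""
    p_a = freq["AB"] + freq["Ab"]  # frequency of allele A
    p_b = freq["AB"] + freq["aB"]  # frequency of allele B
    d = freq["AB"] * freq["ab"] - freq["Ab"] * freq["aB"]
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r = d / math.sqrt(denom) if denom > 0 else 0.0
    return d, d_prime, r
```

Starting from equal frequencies is a simple, common choice. With many loci the number of phase configurations grows quickly, which is why the cited multi-locus EM is used in practice.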

To determine the actual type I error rate of our study design, we used the Monte-Carlo method to simulate exactly the process of the first three tests (first test: case 1 versus control 1; second test: case 2 versus control 1; third test: case 2 versus control 2). Because the two sets of cases (cases 1 and 2) and the two sets of controls (controls 1 and 2) were all collected independently, two alleles were drawn independently for each subject, assuming the same minor allele frequency in all four groups; the simulations were therefore performed under the null hypothesis. The minor allele frequency was varied from 0.02 to 0.5. After the genotypes of all subjects were assigned, cases and controls were compared in four different ways (first and second tests): (1) allele frequencies were compared using 2×2 allele contingency tables; (2) the frequencies of subjects carrying the minor allele were compared using 2×2 contingency tables; (3) the frequencies of subjects carrying the major allele were compared in the same way; and (4) the frequencies of the three genotypes were compared using 2×3 contingency tables. All tests used Pearson's chi-square test. The Monte-Carlo simulation used the Mersenne Twister uniform pseudo-random number generator (Matsumoto and Nishimura 1998).
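The key feature of this simulation is that control 1 is reused in the first two tests and case 2 in the last two, so the groups must be drawn once and shared. The sketch below illustrates that structure under stated assumptions. A SNP is taken to "pass" a test when the smallest of the four P values falls below that test's threshold. Only the first-test threshold (0.01) is given in Results, so the other entries of `alphas` are hypothetical placeholders. Consistency of effect direction across tests is not modeled. Group sizes follow Subjects and methods. Python's `random` module happens to use the Mersenne Twister generator cited above. All function names are hypothetical.

```python
import math
import random  # CPython's random module uses the Mersenne Twister (MT19937) generator


def chi2_pvalue(rows):
    """Pearson chi-square P value for a 2 x k table (k = 2 or 3); empty columns dropped."""
    cols = [c for c in zip(*rows) if sum(c) > 0]
    if len(cols) < 2:
        return 1.0
    row_tot = [sum(c[i] for c in cols) for i in range(2)]
    n = sum(row_tot)
    stat = sum((c[i] - row_tot[i] * sum(c) / n) ** 2 / (row_tot[i] * sum(c) / n)
               for c in cols for i in range(2))
    return math.erfc(math.sqrt(stat / 2)) if len(cols) == 2 else math.exp(-stat / 2)


def min_p_four_tests(case, ctrl):
    """Smallest P value over the four comparisons; genotypes are (n_AA, n_Aa, n_aa)."""
    def tables(g):
        AA, Aa, aa = g
        return ([2 * AA + Aa, Aa + 2 * aa], [AA, Aa + aa], [AA + Aa, aa], [AA, Aa, aa])
    return min(chi2_pvalue([x, y]) for x, y in zip(tables(case), tables(ctrl)))


def draw_group(n, q, rng):
    """Genotype counts for n subjects, each carrying two independent alleles (minor freq q)."""
    counts = [0, 0, 0]
    for _ in range(n):
        counts[(rng.random() < q) + (rng.random() < q)] += 1
    return tuple(counts)


def empirical_type1(q, n_iter, alphas=(0.01, 0.01, 0.05),
                    sizes=(188, 470, 631, 889), seed=2024):
    """Fraction of null-simulated SNPs that pass all three dependent tests.

    alphas: per-test thresholds (HYPOTHETICAL except the first-test 0.01 from Results).
    sizes: case 1, control 1, case 2, control 2.
    """
    rng = random.Random(seed)
    n_case1, n_ctrl1, n_case2, n_ctrl2 = sizes
    passed = 0
    for _ in range(n_iter):
        # Groups are independent, so drawing later groups only when needed does not bias the rate.
        ctrl1 = draw_group(n_ctrl1, q, rng)  # shared by tests 1 and 2
        if min_p_four_tests(draw_group(n_case1, q, rng), ctrl1) >= alphas[0]:
            continue
        case2 = draw_group(n_case2, q, rng)  # shared by tests 2 and 3
        if min_p_four_tests(case2, ctrl1) >= alphas[1]:
            continue
        if min_p_four_tests(case2, draw_group(n_ctrl2, q, rng)) < alphas[2]:
            passed += 1
    return passed / n_iter
```

A pure-Python loop like this is adequate for checking logic but far too slow for two hundred million iterations per allele frequency; a vectorized or compiled implementation would be needed at that scale.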

For stepwise logistic regression analysis, the probability Pc that an individual is a case rather than a control was assumed to depend on a set of SNPs according to a logistic model, for example logit(Pc) = a0 + a1x1 + a2x2 for a single SNP. Here we used the coding x1 = −1, 0, 1 and x2 = −0.5, 0.5, −0.5 for genotypes 1/1, 1/2 and 2/2, respectively, so that x1 represents an additive effect and x2 a dominance/recessive effect (Cordell and Clayton 2002). The weights were estimated by maximum likelihood and tested against the null model logit(Pc) = a0 (constant). For multiple SNPs, interaction terms were added to the main effects of the additional SNPs, and each term was tested stepwise for significance. The tests were performed using R. We applied both forward selection (starting from one SNP) and backward selection (starting from all SNPs) until the most significant SNP set was obtained (Cordell and Clayton 2002).
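The authors ran these tests in R. The sketch below re-expresses only the single-SNP case in Python with NumPy to show how the x1/x2 coding enters a maximum-likelihood fit. It uses Newton-Raphson and a likelihood-ratio test against the intercept-only model. The coding arrays follow the text, while the function names are hypothetical. With one SNP, the model with x1 and x2 has three parameters for three genotype classes, so it is saturated. Its likelihood-ratio statistic therefore equals the G statistic of the 2×3 genotype table, which makes a convenient check.

```python
import numpy as np

# Coding for genotypes 1/1, 1/2, 2/2 (indices 0, 1, 2), as described in the text
ADDITIVE = np.array([-1.0, 0.0, 1.0])    # x1
DOMINANCE = np.array([-0.5, 0.5, -0.5])  # x2


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Returns (coefficients, log-likelihood). Assumes the MLE exists, i.e. no
    genotype class consists entirely of cases or entirely of controls.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)                        # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])     # Fisher information
        step = np.linalg.solve(info, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, loglik


def single_snp_lrt(genotype_index, is_case):
    """Likelihood-ratio test of logit(Pc) = a0 + a1*x1 + a2*x2 versus logit(Pc) = a0."""
    g = np.asarray(genotype_index, dtype=int)
    y = np.asarray(is_case, dtype=float)
    ones = np.ones(len(y))
    X_full = np.column_stack([ones, ADDITIVE[g], DOMINANCE[g]])
    coef, ll_full = fit_logistic(X_full, y)
    _, ll_null = fit_logistic(ones[:, None], y)
    stat = 2.0 * (ll_full - ll_null)
    p_value = float(np.exp(-stat / 2.0))  # chi-square upper tail with 2 df
    return coef, stat, p_value
```

Multi-SNP and interaction models would add further columns to `X_full`. Forward or backward selection then compares nested models with the same likelihood-ratio statistic, using degrees of freedom equal to the number of added terms.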

Results

Association study

We first genotyped 188 Japanese patients with type 2 diabetes (case 1) at 58,266 SNP loci and compared their allele or genotype frequencies at these loci with those in the general population (control 1) (first test). At each locus, we tested the differences between the two populations in the four ways described in the Methods section; 1,496 SNP loci showed a P value of <0.01 in at least one of the four tests (Table 1). We then analyzed these 1,496 loci in the second test using a second, larger group of patients (case 2). When case 2 and control 1 were compared in the four ways described in the Methods section, the genotype distribution at a landmark SNP locus in the second intron of the TFAP2B gene on chromosome 6p12 was the most strongly associated with type 2 diabetes (GG versus GC+CC: χ²=15.9, P=0.00007, odds ratio=1.65, 95% CI 1.29–2.11; G versus C: χ²=11.3, P=0.0007, odds ratio=1.38, 95% CI 1.14–1.66; Tables 2 and 3). Furthermore, we compared the allele frequencies at this locus in type 2 diabetic subjects (case 2) with those in a different set of controls (control 2) (third test) and again identified a significant association with type 2 diabetes (GG versus GC+CC: P=0.04, odds ratio=1.21, 95% CI 1.00–1.48; G versus C: P=0.03, odds ratio=1.18, 95% CI 1.01–1.37).
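The odds ratios above come from 2×2 tables such as GG versus GC+CC in cases and controls. The paper does not state how its 95% CIs were computed. The sketch below assumes Woolf's logit method, a standard choice, and uses a hypothetical function name. Any counts passed to it here are illustrative and not taken from Table 3.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959963984540054):
    """Odds ratio and Woolf (logit) 95% CI for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: risk genotype (e.g. GG), other genotypes (GC+CC).
    All four cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi
```

For example, the hypothetical table `odds_ratio_ci(20, 10, 10, 20)` gives an odds ratio of 4.0. The interval is symmetric on the log scale, so its two limits multiply to the square of the odds ratio.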

Table 1 Distribution of P values in the first test
Table 2 Distribution of P values in the second test
Table 3 Association of landmark SNP in TFAP2B (intron 2+58 G/C) during genome-wide screening

We further examined the difference in allele frequencies at this locus using 349 cases (case 3) and 598 controls (control 3), neither of which had been used in the first three tests (fourth test), and confirmed a significant difference (P=0.03).

Because the first three tests shared samples, their nominal P values are not independent and cannot be taken at face value. We therefore calculated the overall empirical type I error rate of these three tests by simulation, as described in the Methods section. The simulation was iterated two hundred million times for each given minor allele frequency. As shown in Fig. 1, the type I error rate increased with minor allele frequency from 0.02 to 0.5. Under all conditions tested, the upper limit of the 95% CI of the type I error rate was lower than 1.61×10⁻⁵ (Fig. 1). This empirical P value was multiplied by the P value for the fourth test (P=0.03), because the fourth test was independent of the first three. The resulting P value of <4.67×10⁻⁷ is the probability that a SNP with no true association passes all four tests. When 58,266 SNPs are tested in this way, the probability of judging at least one of them significant by chance is P<0.028, as calculated by Bonferroni correction for multiple comparisons.
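This paragraph chains three steps. The first is an upper 95% confidence bound on the simulated type I error rate of the three dependent tests. The second is multiplication by the P value of the independent fourth test. The third is a Bonferroni (union) bound over all 58,266 SNPs. The sketch below reproduces the last step from the reported combined bound. The paper does not say how the 95% CI in Fig. 1 was computed, so the Wilson score interval in `wilson_upper` is an assumption. It would be applied to the number of simulated SNPs passing all three tests out of the number of iterations. Function names are hypothetical.

```python
import math


def wilson_upper(successes, trials, z=1.959963984540054):
    """Upper limit of the Wilson score 95% interval for a binomial proportion."""
    p_hat = successes / trials
    z2 = z * z
    centre = p_hat + z2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return (centre + margin) / (1 + z2 / trials)


def bonferroni(p, n_tests):
    """Union bound on the probability that at least one of n_tests null SNPs passes."""
    return min(1.0, p * n_tests)


# Figures reported in the text
combined_bound = 4.67e-7  # simulated bound for tests 1-3 times the fourth-test P value
n_snps = 58_266
genome_wide = bonferroni(combined_bound, n_snps)  # about 0.0272, i.e. below 0.028
```

The Wilson interval is used here instead of the simple normal approximation because it stays informative when few or no simulated SNPs pass, which is the regime of interest for rates near 10⁻⁵.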

Fig. 1

Empirical type I error rate of the first through third tests. These case-control comparisons were simulated by the Monte-Carlo method, as described in the Methods section. The simulation was iterated two hundred million times for each given minor allele frequency; the mean and 95% CI are shown for each minor allele frequency.

Thus, the difference in minor allele frequency at the selected SNP between cases and controls is significant at a genome-wide type I error rate below 0.05.

Subsequent LD mapping of this region using 26 SNPs around the landmark SNP in the TFAP2B gene revealed that LD extended over approximately 300 kb (200 kb upstream and 100 kb downstream of the landmark SNP). We therefore reasoned that the critical region for susceptibility to type 2 diabetes lay within this 300-kb block, which contains three genes (two confirmed and one predicted) in addition to TFAP2B. We genotyped 188 patients for 33 additional SNP loci within these three genes but found no significant association between any of the 33 SNPs and type 2 diabetes (P>0.05, data not shown), suggesting that TFAP2B itself is the most likely candidate for susceptibility to type 2 diabetes.

SNP discovery in TFAP2B gene and genotyping

We screened the entire TFAP2B gene region, excluding repetitive sequences, for genetic polymorphisms and found 40 additional variations, including 28 SNPs, eight insertion/deletion polymorphisms and four tandem-repeat polymorphisms; no SNP was found in the coding region of TFAP2B. We then genotyped these polymorphisms in 349 patients (case 3; lean subjects with BMI<25 were excluded) and 598 healthy controls (control 3). The clinical characteristics of these patients are shown in Table 4. Several of these variations also showed a significant association with type 2 diabetes. The strongest associations were observed at a variable number of tandem repeat (VNTR) locus (χ²=10.9, P=0.0009; odds ratio=1.57, 95% CI 1.20–2.06) and at two SNPs in the first intron (χ²=11.6, P=0.0006; odds ratio=1.60, 95% CI 1.22–2.09; and χ²=12.2, P=0.0004; odds ratio=1.61, 95% CI 1.23–2.11) (Table 5). We also analyzed haplotype structure using the EM algorithm and found that 12 SNPs with allele frequencies >0.15 in the TFAP2B gene constituted one haplotype block, in which five common haplotypes accounted for more than 90% of the population (Fig. 2). Subsequent association analysis of each haplotype identified a significant association of haplotype 4 with type 2 diabetes; however, this association was not stronger than that observed at single loci. We also applied stepwise logistic regression analysis to the SNPs in the block to identify the subset of SNPs most significantly associated with the disease. The analysis was based on a full genotype model that included additive and dominance/recessive effects and interactions between SNPs (see Methods). Applying both forward and backward selection, we found that the original SNP itself (the SNP in intron 1) was most significantly associated with the disease; other combinations of SNPs were less significant, and adding the effects of other SNPs to this SNP did not yield a significant improvement. Therefore, the variations in intron 1 (the VNTR and SNPs) appear to explain most of the association of TFAP2B with type 2 diabetes.

Table 4 Clinical characteristics of the patients
Table 5 Association of polymorphisms in the TFAP2B gene with type 2 diabetes in the Japanese population (case 3 versus control 3)
Fig. 2

Analysis of haplotype structure and estimated haplotype frequencies in the TFAP2B gene. Thirteen common variations constituted one haplotype block: 1, 5′-flanking −512; 2, intron 1+774; 3, VNTR (*X: 10 or 8 repeats; Y: 9 repeats); 4, intron 1+1697; 5, intron 1+2093; 6, intron 1+2491; 7, intron 2+58; 8, intron 2+2093; 9, intron 3+242; 10, intron 3+514; 11, intron 3+2134; 12, intron 4+528; 13, intron 6+1710. P values for comparing haplotype frequencies between case and control groups were calculated by the chi-square test using 2×2 contingency tables of the count of each haplotype versus the sum of all other haplotypes.

We next examined the association of this gene with type 2 diabetes in a different ethnic group. As shown in Table 6, the association of TFAP2B with type 2 diabetes was also observed in the UK population. The T allele of the SNP at intron 1+774 was the risk allele, consistent with the result in the Japanese population, although the allele frequency and the pattern of LD within this region appeared to differ somewhat between the two populations.

Table 6 Association of SNPs in the TFAP2B gene with type 2 diabetes in the UK population

Reverse transcription polymerase chain reactions

To investigate the possible biological mechanism of TFAP2B involvement in this disease, we examined the expression pattern of this gene by RT-PCR using RNAs from various human tissues and found a pattern similar to that reported previously (Moser et al. 1995). In addition, we detected a high level of TFAP2B expression in adipose tissue, which had not been examined in previous studies (Fig. 3a, b). Furthermore, expression of murine TFAP2B in 3T3-L1 cells increased with the degree of differentiation (Fig. 3c).

Fig. 3

Expression profiles of the TFAP2B gene. a The reverse transcription polymerase chain reaction (RT-PCR) using RNAs from multiple human tissues. b Results of quantitative real-time PCR. c Expression of murine TFAP2B in 3T3-L1 cells measured by RT-PCR on the indicated days after induction of differentiation.

Discussion

In the report presented here, we performed a genome-wide, case-control association study using gene-based SNPs and identified the TFAP2B gene as a candidate gene conferring susceptibility to type 2 diabetes.

The contribution of genetic factors to the pathogenesis of type 2 diabetes is well accepted, but so far only a few genes have been shown to play significant roles in susceptibility to type 2 diabetes (Horikawa et al. 2000; Ong et al. 1999; Altshuler et al. 2000). Alleles responsible for common diseases are difficult to identify because the effects of individual genes in a complex genetic and environmental background are often too small to be detected with classical approaches. Our results, together with recent reports of susceptibility genes for myocardial infarction (Ozaki et al. 2002) and rheumatoid arthritis (Suzuki et al. 2003), provide solid evidence that a genome-wide approach using SNPs as genetic markers is a useful and powerful tool for identifying genes conferring susceptibility to common diseases such as DM.

In a large-scale, genome-wide association study such as the present one, the type I error should be minimized. The design used to select a diabetes-associated SNP was complicated because the same controls or cases were used in the first (case 1 versus control 1), second (case 2 versus control 1) and third (case 2 versus control 2) tests. Hence, ordinary statistical tests could not be applied directly: because these three tests were not independent of one another, their significance could easily be overestimated. To estimate the probability that a SNP would pass the three tests under the conditions of this study, we simulated exactly the process of the three tests using the Monte-Carlo method; combining the result with the independent fourth test gave a final P value of <4.67×10⁻⁷ for a SNP passing all four tests (P<0.028 after Bonferroni correction).

To evaluate further the association of TFAP2B with type 2 diabetes, we examined it in a different ethnic group. The SNP in the first intron of TFAP2B (intron 1+774) was also significantly associated with type 2 diabetes in the UK population (genotype frequencies GG 0.791, GT 0.186, TT 0.023 in type 2 DM and GG 0.824, GT 0.174, TT 0.002 in controls; P=0.002; Table 6). Subsequent haplotype analysis revealed a significant difference in haplotype frequencies between type 2 DM and controls (P=0.007, Table 6), with an increase in the TCC haplotype in cases. This result in the UK population was largely consistent with that in the Japanese population, further supporting a positive association of the TFAP2B gene with type 2 diabetes, although the allele frequency and the pattern of LD in this region appeared to differ somewhat between the two populations.

TFAP2B is a well-known transcription factor that has been reported to play an important role in embryonic development. In mice, expression of murine TFAP2B decreases significantly after birth (Moser et al. 1995). Mice lacking TFAP2B die within 1 or 2 days after birth from renal failure due to polycystic kidney disease (Moser et al. 1997). In humans, mutation of TFAP2B causes Char syndrome, a condition characterized by patent ductus arteriosus and variable degrees of facial dysmorphism and hand abnormalities (Satoda et al. 2000); these features suggest that TFAP2B plays an important role in the embryonic development of various tissues. However, no evidence has so far suggested a role for TFAP2B in the pathogenesis of type 2 diabetes. To investigate its possible roles in this disease, we examined the expression pattern of this gene by RT-PCR using RNAs from various human tissues.

Our report is the first to show that TFAP2B is expressed in differentiated adipocytes, which are well known as targets of insulin and as cells associated with insulin resistance. Differentiated adipocytes also function as endocrine cells, secreting several cytokines, called “adipokines,” which include TNF-α, IL-6, leptin, adiponectin and others (Spiegelman and Flier 1996; Matsuzawa et al. 1999). The promoters of these genes have been found to contain TFAP2 binding sites (Kroeger and Abraham 1996; Isse et al. 1995; Takahashi et al. 2000). Given these observations, we suggest that TFAP2B plays a key role in the pathogenesis of type 2 diabetes by affecting insulin responsiveness through the transcriptional regulation of genes involved in the insulin response of differentiated adipocytes.

In summary, using a large-scale, gene-based SNP approach, we have identified TFAP2B as a novel susceptibility gene for type 2 diabetes. These results suggest that TFAP2B itself, as well as molecules acting upstream or downstream of it, might represent novel targets for the treatment or prevention of this common disorder.