Introduction

Systemic lupus erythematosus (SLE) (OMIM #152700) is a complex, heterogeneous, and relatively prevalent autoimmune disease in Japan. Glucocorticoids, together with immunosuppressive agents, have been widely used to treat SLE and have remarkably improved clinical outcomes (Trager and Ward 2001). However, most patients with SLE need to take these drugs for life after diagnosis and often suffer adverse events such as opportunistic infections, avascular necrosis, or myocardial infarction. Even in patients whose disease activity is successfully controlled at an early stage, complications such as renal insufficiency and neurological involvement may develop later in the course. Hence, it is crucial to elucidate the detailed pathogenesis of SLE and to develop novel therapeutic modalities that will improve patients’ prognoses.

Although the molecular mechanisms of SLE have not been fully clarified, both environmental and genetic factors have been implicated in its etiology. The concordance rate of SLE in monozygotic twins (24–58%) is significantly higher than that in dizygotic twins (less than 10%), and the sibling recurrence risk ratio (λs) of the disease has been estimated to be 20 (Croker and Kimberly 2005). Therefore, multiple genetic factors are suspected to contribute to its pathogenesis. To investigate SLE susceptibility genes, 11 genome-wide linkage studies were conducted, and eight loci were shown to have significant linkage to SLE (Tsao 2003). A combined analysis of these studies indicated three loci at 6p21.1-q15, 16p13-q12.2 and 20p11-q13.13 to be candidate regions for SLE susceptibility (Forabosco et al. 2006).

In recent years, a database of very high-density single nucleotide polymorphisms (SNPs) across the genome has been constructed, and advances in genotyping technology now permit high-throughput SNP analysis at low cost. Hence, the SNP-based genome-wide approach is now widely used to investigate the genetic factors involved in diseases with high prevalence and low penetrance. By applying this approach, our group has identified susceptibility genes for various diseases, such as myocardial infarction, rheumatoid arthritis, osteoarthritis, and cerebral infarction (Ozaki et al. 2002; Suzuki et al. 2003; Seki et al. 2005; Kubo et al. 2007).

In the study described here, we performed a genome-wide case–control association study to explore SLE susceptibility genes and found that an SNP in the TNXB gene within the major histocompatibility complex (MHC) region of chromosome 6p21 was strongly associated with SLE. Our genome-wide association study thus identified an SLE susceptibility locus in the MHC region in the Japanese population.

Materials and methods

DNA samples and genotyping

A total of 178 individuals with SLE were diagnosed at the Department of Nephrology, Tokyo Women’s Medical University (case 1 and case 2, Supplementary Table 1). Additional DNA samples from 203 individuals with SLE (case 3) were obtained from the Department of Rheumatology. All patients were diagnosed as having SLE on the basis of the revised criteria of the American College of Rheumatology (Hochberg 1997). Two sets of 538 (control 1) and 365 (control 2) control subjects were recruited independently from various medical institutes in Japan. We obtained written informed consent from each subject, and this study was approved by the ethics committee at the Institute of Medical Sciences, University of Tokyo, as well as that at Tokyo Women’s Medical University. We extracted genomic DNA from peripheral blood leukocytes using standard protocols. The third control cohort (control 3) was a gender- and age-matched control group consisting of 294 healthy volunteers collected by the Pharma SNP Consortium (Ikari et al. 2006). We genotyped SNPs using the multiplex polymerase chain reaction (PCR)-based Invader assay (Third Wave Technologies). The detailed method has been described elsewhere (Ohnishi et al. 2001). We sequenced the TNXB gene in 48 healthy Japanese control subjects to search for additional genetic variations. Typing for the HLA-DRB1 locus was performed by the PCR-microtiter plate hybridization (MPH) method using commercial typing kits (WAK Flow, Wakunaga, Hiroshima, Japan), as described previously (Kawai et al. 1996).

Copy number analysis of C4 gene

The C4 gene has two isoforms, C4A and C4B, which differ by only four amino acids at positions 1,101, 1,102, 1,105, and 1,106 (Supplementary Fig. 1). Two further isoforms, C4 long (C4L) and C4 short (C4S), are distinguished by the presence (C4L) or absence (C4S) of an insertion of the endogenous retrovirus sequence HERV-K (C4) in intron 9 (Dangel et al. 1995). The TaqMan probes used to evaluate copy numbers of the C4A and C4B genes have been described previously (Szilagyi et al. 2006). We designed primers and a probe for C4S using Primer Express software, version 1.5 (Applied Biosystems). Probes and primer sets used in this assay are listed in Supplementary Table 2. We used 5′ VIC-labeled RNase P control reagents (Applied Biosystems) as a reference gene probe. We prepared 10 μl of reaction mixture containing 20 ng of genomic DNA, 900 nM of primers, 250 nM of probes, 0.5 μl of RNase P control reagent, and 5 μl of TaqMan Universal Master Mix (Applied Biosystems). Thermal cycling conditions were as follows: an initial 2-min incubation at 50°C and a 10-min incubation at 95°C, followed by 40 cycles of 95°C for 15 s and 60°C for 1 min. The PCR was performed in a 384-well reaction plate using an ABI 7900HT (Applied Biosystems). A genomic fragment of the RNase P gene was co-amplified with the target fragment in the same tube so that the copy number of the RNase P gene could be used for normalization. Each sample was assayed at least in triplicate. The real-time amplification data were analyzed with Sequence Detection System software (version 2.1; Applied Biosystems). Relative gene copy numbers were determined by the comparative CT method (User Bulletin #2; Applied Biosystems). Outliers within triplicates were excluded on the basis of the Smirnov–Grubbs test. We verified the accuracy of this quantification method by comparing the results of quantitative polymerase chain reaction (qPCR) analysis with those of Southern blotting analysis (Supplementary Fig. 2).
The total gene copy number of C4 equals the sum of C4A and C4B, as well as the sum of C4S and C4L. Thus, we determined the copy numbers of total C4 and C4L from those of C4A, C4B, and C4S.
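The copy-number arithmetic described above can be illustrated with a minimal Python sketch. It assumes the usual form of the comparative CT calculation, in which a sample is compared with a calibrator sample of known copy number; the calibrator, its default of two copies, the CT values, and the function names are hypothetical and are not taken from this study.

```python
def relative_copy_number(ct_target, ct_reference,
                         cal_ct_target, cal_ct_reference,
                         calibrator_copies=2):
    """Comparative CT estimate of target copy number.

    ct_reference is the CT of the co-amplified reference gene (RNase P here).
    The calibrator sample and its copy number are assumptions of this sketch.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = cal_ct_target - cal_ct_reference
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    # One extra cycle to reach threshold corresponds to half the template.
    return calibrator_copies * 2.0 ** (-delta_delta_ct)


def derive_total_and_c4l(c4a, c4b, c4s):
    """Total C4 = C4A + C4B = C4S + C4L, so C4L = C4A + C4B - C4S."""
    total = c4a + c4b
    c4l = total - c4s
    if c4l < 0:
        raise ValueError("C4S cannot exceed the total C4 copy number")
    return total, c4l


# Hypothetical CT values: the sample needs one more cycle than the calibrator.
estimate = relative_copy_number(26.0, 24.0, 25.0, 24.0)
c4b_copies = round(estimate)  # round the estimate to an integer copy number
total_c4, c4l_copies = derive_total_and_c4l(c4a=2, c4b=c4b_copies, c4s=1)
print(estimate, total_c4, c4l_copies)
```

Rounding to the nearest integer is a simplification; in practice the replicate spread would be checked before a copy number is assigned.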

Statistical analysis

A genome-wide association study of SNPs in SLE patients and control subjects had already been carried out. To increase the statistical power, we adopted a two-step approach (Saito and Kamatani 2002). In the first stage, we performed an association study of 94 SLE patients (case 1) and 538 control subjects (control 1) using 52,608 gene-based SNPs across the genome (Tsunoda et al. 2004). We then chose SNPs that showed a P value of 0.01 or less in any of the three genetic models and genotyped them in a further 84 SLE patients (case 2) and 365 independent control subjects (control 2). In each stage, SNPs showing a P value of 0.01 or lower in the Hardy–Weinberg equilibrium test for control samples were excluded. Fisher’s exact test was applied to two-by-two contingency tables to calculate statistical significance under three genetic models: an allele count model, a dominant effect model, and a recessive effect model. We used Haploview software, version 3.32 (Barrett et al. 2004) to draw a linkage disequilibrium (LD) map around the marker SNP. In the replication study, 203 independent patients (case 3) and 294 controls (control 3) were genotyped for SNP rs3130342, and the analyses were carried out by the same statistical method. The DNA from a total of 178 patients (cases 1 and 2) and 365 controls (control 2) was also analyzed for variations in gene copy number. To investigate differences in copy number between the case and control groups, we used the Wilcoxon rank sum test. Differences in the distribution of copy number among the three genotype groups were analyzed by the Kruskal–Wallis test. Multivariate logistic regression modeling was used to assess whether SNP rs3130342 and the copy numbers of the C4A and C4B genes were independently associated with SLE susceptibility. Fisher’s exact test was also used to test the association of HLA-DRB1*1501 with SLE under the allele count model.
For statistical analysis, we used the R statistical software, version 2.4.1.
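As an illustration of how genotype counts are collapsed into two-by-two tables under the three genetic models, the following Python sketch implements a two-sided Fisher's exact test with the standard library only. The analyses in this study were run in R; the function names and genotype counts below are hypothetical, and the two-sided P value is taken as the sum of the probabilities of all tables no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-7))
    return min(1.0, total)


def three_model_tables(case, control):
    """Collapse genotype counts (n11, n12, n22) into 2x2 tables.

    Allele 1 is treated as the risk allele (an assumption of this sketch).
    """
    def allele(g):
        return (2 * g[0] + g[1], g[1] + 2 * g[2])

    def dominant(g):
        return (g[0] + g[1], g[2])

    def recessive(g):
        return (g[0], g[1] + g[2])

    return {name: (f(case), f(control))
            for name, f in (("allele", allele),
                            ("dominant", dominant),
                            ("recessive", recessive))}


# Hypothetical genotype counts, for illustration only.
tables = three_model_tables(case=(10, 5, 1), control=(4, 8, 4))
for model, ((a, b), (c, d)) in tables.items():
    print(model, fisher_exact_two_sided(a, b, c, d))
```

Testing three models per SNP increases the number of comparisons, which is one reason a second stage and a replication cohort are useful.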

Results

To screen for SLE susceptibility genes, we performed a genome-wide case–control association analysis using 52,608 gene-based SNPs in 94 SLE patients (case 1) and 538 controls (control 1). Among the 50,464 SNPs successfully genotyped, 1,310 showed P values of 0.01 or smaller. Subsequently, we genotyped these 1,310 SNPs in the additional 84 SLE patients (case 2) and 365 controls (control 2) and found that an SNP, rs1009382, was strongly associated with SLE, with a P value of 0.00894 in the second stage; the combined samples under the allele count model gave a P value of 0.00000518 (Supplementary Table 3). This is the second most significant P value obtained by this approach, following an SNP in the ITPR3 gene identified recently (Ohishi et al., data not shown).

Next, we constructed an LD map around this marker SNP on the basis of the HapMap-JPT genotyping data [release 21, based on National Center for Biotechnology Information (NCBI) build 35 and single nucleotide polymorphism database (dbSNP) build 125] using SNPs with a minor allele frequency of 20% or higher. As shown in Fig. 1b, two LD blocks were identified in the genomic region that includes SNP rs1009382. We selected 45 tag SNPs from the genomic region covering these LD blocks and genotyped them in 178 patients (cases 1 and 2) and 365 controls (control 2). The statistical analysis indicated that the association still peaked at the marker SNP rs1009382 in exon 23 of the TNXB gene and weakened gradually in both directions (Fig. 1a), implying that TNXB was the most likely candidate SLE susceptibility gene.
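For reference, the pairwise LD measures used in this kind of analysis (r² and D′) can be computed from the four haplotype counts of two biallelic SNPs, as in the Python sketch below. It assumes phased haplotype counts are available; the function name and counts are hypothetical, and the LD map in this study was drawn with Haploview.

```python
def ld_measures(n_ab, n_aB, n_Ab, n_AB):
    """Return (r_squared, d_prime) from counts of haplotypes ab, aB, Ab, AB."""
    n = n_ab + n_aB + n_Ab + n_AB
    p_a_major = (n_Ab + n_AB) / n   # frequency of allele A at the first SNP
    p_b_major = (n_aB + n_AB) / n   # frequency of allele B at the second SNP
    if not (0 < p_a_major < 1 and 0 < p_b_major < 1):
        raise ValueError("both SNPs must be polymorphic")
    # D is the deviation of the AB haplotype frequency from independence.
    d = n_AB / n - p_a_major * p_b_major
    r_squared = d * d / (p_a_major * (1 - p_a_major)
                         * p_b_major * (1 - p_b_major))
    if d > 0:
        d_max = min(p_a_major * (1 - p_b_major), (1 - p_a_major) * p_b_major)
    else:
        d_max = min(p_a_major * p_b_major, (1 - p_a_major) * (1 - p_b_major))
    d_prime = abs(d) / d_max
    return r_squared, d_prime


# Hypothetical haplotype counts, for illustration only.
print(ld_measures(50, 0, 0, 50))    # two SNPs in complete LD
print(ld_measures(25, 25, 25, 25))  # two SNPs in linkage equilibrium
```

D′ reaches 1 whenever at most three of the four haplotypes are present, whereas r² reaches 1 only when the allele frequencies also match, so tag SNP selection usually relies on r².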

Fig. 1 Association of SNPs in the TNXB gene with SLE. a Single-point associations between SLE and tag SNPs around the C4 CNV region. The Y axis is in units of −log10(P). The lowest P value was detected at the SNP rs1009382 locus. b An LD map of this region constructed from the HapMap-JPT data. The r² value between each pair of SNPs (minor allele frequency of 0.2 or more) is plotted at the apex of an isosceles right triangle whose base connects the two SNPs. Two LD blocks are shown by bold-edged triangles

Fig. 2 Distributions of copy numbers of C4 and its isoforms in the three genotype groups for SNP rs3130342. The numbers of subjects were as follows: GG (n = 244), GT (n = 109), and TT (n = 10) at SNP rs3130342. a For C4A, two copies were prevalent in the GG genotype group, two and three copies were almost equally frequent in the GT genotype group, and three copies were prevalent in the TT genotype group. b For C4B, more than 80% of the subjects with the GG genotype had two copies, and those with the GT genotype predominantly had one copy. Loss of the C4B gene was observed specifically in the group with the TT genotype. c Total number of copies of the C4 gene. Five or more copies were observed more frequently in the group with the GG genotype, and three copies were predominantly detected in the group with the TT genotype. d For C4L, none of the individuals with the TT genotype was judged to have two copies. e For C4S, the most significant difference was observed. All individuals in the group with the TT genotype were judged to have no copy of C4S, while the remaining two genotype groups predominantly possessed one or more copies. The GG genotype at SNP rs3130342 is a risk genotype for SLE

To define the region of interest further, we screened for SNPs in the genomic region corresponding to the TNXB gene and identified 60 SNPs. Since the CNV region was located adjacent to this locus and contained a part of the TNXB gene, we selected and genotyped 33 SNPs mapped to a single-copy part of the human genome and found that SNP rs3130342, located in the 5′ flanking region of the TNXB gene, showed a stronger association [P value of 0.000000930, with odds ratio (OR) of 3.11 and 95% confidence interval (95%CI) of 1.89–5.28] than did the marker SNP (Table 1). Although the ITPR3 gene, for which we had previously revealed a significant association with SLE, is located in the same chromosomal region as the TNXB gene, the distance between the two genes is 1.5 Mb, and the pairwise D′ value between them is very small. Hence, the association of the SNP in the TNXB gene with SLE was considered to be independent of that in the ITPR3 gene.
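An odds ratio and an approximate 95% confidence interval can be obtained from a two-by-two table as sketched below in Python. The sketch uses the Woolf (logit) interval as an assumption; the text does not state which interval method was used, and the asymmetric interval reported above may come from a different (for example, exact) method. The counts are hypothetical and are not those of Table 1.

```python
from math import exp, log, sqrt


def odds_ratio_woolf(a, b, c, d, z=1.959964):
    """Odds ratio with a Woolf (logit) confidence interval.

    Table layout: a = cases with the risk genotype, b = cases without,
    c = controls with the risk genotype, d = controls without.
    z = 1.959964 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * standard_error)
    upper = exp(log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(odds_ratio_woolf(20, 10, 10, 20))
```

Because the interval is symmetric on the log scale, the product of its limits equals the square of the odds ratio, which gives a quick way to check whether a published interval is of this type.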

Table 1 Summary of association of SNPs on TNXB gene with SLE

Systemic lupus erythematosus is known to occur predominantly in female patients; the female/male ratio in our study population was approximately 9:1. Since the control 2 population was not gender-matched, we performed a subgroup analysis using female patients and controls. The association study of SNP rs3130342 in 150 female patients and 67 female control subjects revealed a P value of 0.00395 in a recessive model by Fisher’s exact test (Table 2). The G-allele frequency and the odds ratio were almost the same as when we used all case and control subjects. To validate further the association of rs3130342 with SLE susceptibility, we performed a replication study using independent cases (case 3, n = 203) and gender-matched controls (control 3, n = 294). As a result, the association of rs3130342 was replicated in a recessive-effect model (P = 0.044 at alpha = 0.05, Table 2).

Table 2 Association of SNP rs3130342 with SLE

Among the many genes in the human MHC region on chromosome 6p21, particular HLA alleles, as well as deletion of the C4 gene, have been suggested to be genetic susceptibility factors for SLE. The SNP rs3130342 is located close to the C4 gene, and a recent study of a Caucasian SLE population indicated an association of a lower C4 gene copy number with SLE susceptibility (Yang et al. 2007). Therefore, it is possible that the association between SNP rs3130342 and SLE simply reflects linkage disequilibrium between the C4 CNV and the SNP on TNXB. Hence, we investigated the number of copies of the C4 gene and each of its isoforms (C4A, C4B, C4L, and C4S) in the DNA of 178 SLE patients (cases 1 and 2) and 365 control subjects (control 2) by qPCR.

On the basis of the results shown in Table 3, we tested the null hypothesis that the distribution of the copy number in the SLE patients was the same as in the healthy controls, using Wilcoxon’s rank sum test. The null hypothesis was rejected at significance level alpha = 0.01 (determined by Bonferroni’s correction for five tests) in the analysis of copy numbers of total C4 and C4B (P values of 0.0003453 and 0.0008674, respectively). Copy numbers of total C4 and C4B were significantly higher in SLE patients (4.20 and 1.88 in cases, and 3.96 and 1.74 in controls, respectively). Similar results were also obtained when we used the female subgroup for the analysis (Supplementary Table 4). Thus, our analyses indicated that greater numbers of copies of the C4 and C4B genes might be risk factors for SLE, although these results contradict those reported by Yang et al. (2007), in which lower numbers of copies of the C4 gene were shown to be a susceptibility factor for SLE.

Table 3 Association of gene copy number of C4 and its isoforms with SLE

Since the association of both C4 CNV and SNP rs3130342 with SLE might simply reflect linkage disequilibrium between these two loci, we analyzed the relation between the number of copies of C4 and its isoforms, and the genotypes of this particular SNP (Fig. 2). A Kruskal–Wallis test of the results revealed that copy numbers of C4A (P = 4.213 × 10⁻⁷, alpha = 0.01 for all these five tests), C4B (P = 1.195 × 10⁻³¹), and C4S (P = 7.32 × 10⁻¹⁰) were significantly different among the three genotypes of the SNP, although the total C4 copy number was of borderline significance (P = 0.03734). The number of copies of C4A became smaller according to the increase in the number of the G alleles (risk allele), while the numbers of copies of C4B, total C4, and C4S increased when the number of G alleles was higher. These data suggested that SNP rs3130342 is in linkage disequilibrium with the CNV, especially with the copy number of C4B. The same tendency was also observed when we examined case 1 and case 2 individuals (data not shown).
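The significance threshold used for these five tests follows from Bonferroni's correction, as the short Python sketch below shows using the Kruskal–Wallis P values quoted in the text. The family-wise level of 0.05 is inferred from the stated per-test alpha of 0.01 for five tests rather than stated explicitly, and the P value for C4L is not given in the text, so it is omitted.

```python
def bonferroni_alpha(family_alpha, n_tests):
    """Per-test significance level under Bonferroni's correction."""
    return family_alpha / n_tests


# Kruskal-Wallis P values as reported in the text (C4L is not reported there).
reported_p_values = {
    "C4A": 4.213e-7,
    "C4B": 1.195e-31,
    "C4S": 7.32e-10,
    "total C4": 0.03734,
}

# Assumption: family-wise alpha of 0.05 spread over the five tests.
alpha_per_test = bonferroni_alpha(0.05, 5)
significant = {name: p < alpha_per_test
               for name, p in reported_p_values.items()}

for name, is_significant in significant.items():
    print(name, reported_p_values[name], is_significant)
```

Under this threshold the total C4 copy number (P = 0.03734) falls short of significance, which matches its description as borderline.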

We performed multivariate logistic regression analysis to evaluate the independence of these associated variants (Table 4). The SNP rs3130342 continued to show an association with SLE, regardless of the C4 gene copy number, and revealed the smallest P value and the highest OR (P = 0.0000954, OR 3.24, 95%CI 1.80–5.87), indicating that the TNXB gene, but not the C4 CNV, is the possible susceptibility locus for SLE.

Table 4 Results of multivariate logistic regression analysis

Particular HLA alleles have been suggested to be genetic susceptibility factors for SLE in the MHC region. Among them, HLA-DRB1*1501 has repeatedly been reported to be associated with SLE in the Japanese population (Hashimoto et al. 1994; Ohashi et al. 2001). To exclude the possibility that the association of SNP rs3130342 with SLE was due to its linkage disequilibrium with this HLA locus, we genotyped HLA-DRB1 in the 178 patients with SLE and the 365 control subjects (cases 1 and 2 and control 2). Our data indicated an association of HLA-DRB1*1501 with SLE (allele frequency was 15.4% and 7.0% in case and control samples, respectively, P = 0.0000245), similar to the previous reports. Stratified analysis revealed a strong association between SNP rs3130342 and SLE in subjects without the HLA-DRB1*1501 allele (P = 0.0000727 under the recessive model, Supplementary Table 5). Thus, we concluded that the GG genotype of SNP rs3130342 was associated with SLE, independent of the HLA-DRB1*1501 allele.

Discussion

In this study, we found a significant association of SNP rs3130342 in the TNXB gene with SLE susceptibility in the Japanese population. The TNXB gene is reported to encode a member of the tenascin family of extracellular matrix proteins that is highly expressed in skin, muscle, tendon sheath, peripheral nerve, and vasculature (Matsumoto et al. 1994; Geffrotin et al. 1995). The molecular pathogenesis of SLE is considered to involve an exaggerated B cell response caused by activated T cells or dendritic cells, or by soluble mediators such as cytokines (Kyttaris et al. 2005). Tissue deposition of antibodies or immune complexes induces inflammation and subsequent injury of multiple organs. Mutations in the TNXB gene have been found in patients with Ehlers–Danlos-like syndrome (Burch et al. 1997), but the role of TNXB in the immune system is not known.

C4 has an essential role in the classical complement pathway. C4A deficiency has repeatedly been reported to be associated with SLE in Caucasian populations. However, the association of C4 CNV with SLE was not examined until quite recently. In 2002, the genomic structure of the C4 CNV region was precisely determined (Chung et al. 2002), and the association of a lower copy number of the C4 or C4A gene with SLE in the Caucasian population was reported recently (Yang et al. 2007). However, our result, that a higher number of copies of the C4 gene was associated with susceptibility to SLE, was inconsistent with their results. In earlier studies, the frequency of an 8.5 kb-HindIII band (corresponding to C4A gene deletion) was reported to be higher in SLE patients (20.5–24.5%) than in control individuals (7.9–12.9%) in the Caucasian population (Kemp et al. 1987; Goldstein et al. 1988; Reveille et al. 1991; Hartung et al. 1992), while in an African-American population, its frequencies were 14.5% in SLE patients and 3.7% in controls (Olsen et al. 1989). On the other hand, this 8.5 kb-HindIII band itself was not observed in Japanese, Korean, or Chinese populations (Yamada et al. 1990; Doherty et al. 1992; Hong et al. 1994), indicating that this particular allele is extremely rare or absent in Asian populations. Two-basepair insertion mutations that would cause a premature stop codon in exon 30 of the C4A gene were reported in Caucasian SLE patients (Barba et al. 1993; Lokki et al. 1999), but we found no such mutation in any of our SLE patients (data not shown).

Considering the ethnic differences indicated above, the discrepancy between our result (association of a higher copy number of C4 with SLE susceptibility) and the result of Yang et al. (association of a lower copy number of C4 with SLE susceptibility) could be explained by historical recombination, whereby the true risk allele is linked to the high copy number of C4 in Japanese populations and to the low copy number in Caucasians. In general, the existence of linkage disequilibrium between SNP loci and CNV loci is a matter of debate (Locke et al. 2006; Redon et al. 2006), but, in this region in particular, we clearly showed that there was linkage disequilibrium between the C4 CNV locus and the SNP on TNXB. On multivariate logistic regression analysis, SNP rs3130342 remained associated with SLE, independent of C4 CNV, and this result suggests that the association between C4 CNV and SLE might only reflect the linkage disequilibrium between C4 CNV and the SNP on TNXB.

We also examined the effect of the HLA locus on SNP rs3130342. Since HLA-DRB1*1501 has been shown to be associated with SLE in Japanese populations, we examined the HLA-DRB1 gene and found that SNP rs3130342 was still associated with SLE in subjects without the HLA-DRB1*1501 allele. We also found a tendency toward association (P = 0.092, odds ratio 2.94, 95%CI 0.76–13.6) in individuals with the HLA-DRB1*1501 allele, although the association was not statistically significant, possibly due to the smaller sample size (n = 52 and 51 in case and control samples). The HLA-B*0801-HLA-DRB1*0301 haplotype has also frequently been reported to be associated with SLE in Caucasians. This haplotype is included in a large haplotype called the 8.1 ancestral haplotype or “COX haplotype” that consists of HLA-A1 Cw7 B8 C4AQ0 C4B1 DRB1*0301 DRB3*0101 DQB1*0201 DQA1*0501 (Stewart et al. 2004). Moreover, the risk allele (G allele) of SNP rs3130342 is included in the COX haplotypes in Caucasian populations according to the Sanger MHC haplotype project. However, the HLA-DRB1*0301 allele is absent or extremely rare in the Japanese population (Hashimoto et al. 1994), and none of our case and control individuals carried the HLA-DRB1*0301 allele. Taken together, the association of rs3130342 with SLE was not due to its linkage disequilibrium with the “COX haplotype” or HLA-DRB1*1501 in Japanese populations, and rs3130342 might have a causative role in SLE pathogenesis.

In summary, we performed a genome-wide association study of SLE and identified SNP rs3130342 in the 5′ flanking region of the TNXB gene as a possible susceptibility variant for the disease. The TNXB gene encodes an extracellular matrix protein, tenascin XB, that regulates collagen synthesis and deposition (Mao et al. 2002). Our findings indicate a possible role of collagen metabolism in the pathogenesis of SLE and suggest that the association of C4 CNV might not reflect a causal effect in the Japanese population. However, further investigations are required to clarify the physiological role of the TNXB polymorphism in SLE susceptibility.

Web resources

Pharma SNP Consortium, http://www.jpma.or.jp/psc/index.html. The R statistical software: http://www.R-project.org. The HapMap project: http://www.hapmap.org. The JSNP database: http://snp.ims.u-tokyo.ac.jp/index.html. NCBI BLAST: http://www.ncbi.nlm.nih.gov/BLAST. The Sanger MHC haplotype project: http://www.sanger.ac.uk/HGP/Chr6/MHC/.