Introduction

Prostate cancer is the most frequently observed malignancy among elderly men, especially in advanced nations. In order to better understand the etiology of this prevalent disease, epidemiological approaches have established certain factors for susceptibility that include age, ethnicity, country of origin, and family history. A hereditary component is indicated by considerable evidence (Lichtenstein et al. 2000; Ostrander and Stanford 2000). Five chromosomal regions were defined by familial mapping studies (Smith et al. 1996; Berthon et al. 1998; Xu et al. 1998; Gibbs et al. 1999), from which the ribonuclease L gene (RNASEL) (Carpten et al. 2002) and the elaC homologue 2 gene (ELAC2) (Tavtigian et al. 2001) were later identified as the genetic elements conferring susceptibility at hereditary prostate cancer loci 1 (HPC1) and 2 (HPC2). However, since aberrant alleles of those two genes may account only for 5–10% of prostate cancers (Carter et al. 1992), additional, low-penetrance susceptibility genes must be sought.

Case-control studies of sporadic cases have examined common variations in a number of candidate genes; examples are the androgen receptor (AR) gene and the kallikrein 3 gene (also known as prostate specific antigen). Several independent studies have shown that variations in ELAC2, AR, steroid-5-alpha-reductase, alpha polypeptide 2 (SRD5A2), and a cytochrome P450 gene (CYP17) show relatively good reproducibility in terms of conferring susceptibility to prostate cancer, although the significance of the results has not been fully validated. Additional candidates should be examined because having a reasonable number of reliable susceptibility genes may enable us to evaluate the combined contribution of multiple risk factors for this disease.

Among the important candidates are matrix proteases and their inhibitors, which are responsible for degradation of the extracellular matrix and thus may be related to tumor invasion. Serine proteinase inhibitors (SERPINs), a large superfamily of proteinase inhibitors (families A–I), are involved in multiple fundamental processes such as angiogenesis, inflammation, activation of complement, and fibrinolysis. Among the nine serpin families, many members of family B (SERPINBs; also called ovalbumin-like serpins) have been implicated in cancer etiology (Silverman et al. 2001). Genes encoding SERPINBs are clustered in two chromosomal regions, 18q21 and 6q25. Moreover, all of the SERPINB genes that appear to be involved in cancer etiology, i.e., plasminogen activator inhibitor-2 (PAI-2), squamous cell carcinoma antigens 1 and 2 (SCCA1 and SCCA2), and maspin (SERPINB5), localize together at 18q21. Therefore, intense examination of genetic variations in that region should be fruitful.

In the work reported here, we investigated nucleotide variations in SERPINB genes at chromosome 18q21. Focusing on several single-nucleotide polymorphisms (SNPs) that alter the encoded amino acid sequence (missense cSNPs) in SERPINB10 and PAI2, we analyzed potential associations between SNP genotypes and the risk of prostate cancer among 676 Japanese men. We also examined linkage disequilibrium (LD) in the region encompassing those variations.

Subjects and methods

Subjects

Prostate-cancer patients over the age of 45 (n=292) were recruited between 1995 and 2001 from seven university hospitals in Japan: Nippon Medical School Hospital (Tokyo), Yokohama City University Hospital, Akita University Hospital, Kyoto University Hospital, Kochi Medical School Hospital, Nagoya City University Hospital, and Chiba University Hospital. Diagnoses were confirmed by histopathological examination of material from trans-rectal needle biopsies or resected tumor tissues. The age at diagnosis, available for 249 of the patients, was 66.1±6.76 [0.43] (mean±SD [SE]), range 45–90 years; tumor-grading scores (Gleason scores) for 208 subjects ranged from 1 to 9 (mean±SD=6.4±1.9). Clinical staging of tumors from 244 patients according to the Jewett Staging System indicated that 7 were Stage A, 99 were Stage B, 73 were Stage C, and 65 were Stage D. Healthy control subjects (n=384) were recruited from health check programs held in three different areas in eastern Japan, broadly matching the areas from which patients were sampled. All participants gave written, informed consent prior to the study, which was approved by the Institutional Review Boards of the Research Consortium. We regarded these control subjects as representative of the general population, in which the expected prevalence of prostate cancer at age 60–70 years is negligible (about 0.1%). Therefore, although the mean age of the control subjects (58.4±8.58 [0.44]; range 32–69 years) was significantly different from that of prostate-cancer patients (P<0.0001, Mann-Whitney test), contamination of the control group by individuals who would develop prostate cancer by age 66 should be negligible. The possibility of an area-dependent difference in genotype frequency was tested by genotyping these subjects for 100 different, randomly selected SNPs. No significant difference was detected for any combination of SNP and area when Bonferroni's correction was applied; among the 100 SNPs, the genotype frequencies of five to seven SNPs per area differed at the level of P<0.05, but none did so at the adjusted level of P<0.0005 (all P>0.003).
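The adjusted significance level quoted above follows from dividing the nominal level by the number of SNPs tested. The short sketch below reproduces that arithmetic and also shows why a handful of nominally significant SNPs per area is expected by chance alone. The function names are illustrative and are not taken from any software used in the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


def expected_chance_hits(alpha, n_tests):
    """Expected number of tests passing `alpha` when no true effect exists."""
    return alpha * n_tests


# Values taken from the text: 100 random SNPs, nominal level 0.05.
threshold = bonferroni_threshold(0.05, 100)
chance_hits = expected_chance_hits(0.05, 100)

# The text states that all observed P values exceeded 0.003.
smallest_reported_p = 0.003
passes_correction = smallest_reported_p < threshold

print(threshold, chance_hits, passes_correction)
```

With 100 independent tests, about five are expected to fall below P=0.05 by chance, which is in line with the five to seven SNPs observed per area.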

Selection of SNPs and predictive analysis of amino acid changes

Within a cluster of eight SERPINB genes at 18q21, 25 missense nucleotide variations have been archived in the dbSNP database (Table 1). To select the most likely candidates for risk of prostate cancer from among these variations, we evaluated the likely effect of each amino acid alteration, using the “Sorting Intolerant From Tolerant” computer program (SIFT: http://blocks.fhcrc.org/sift/SIFT.html; Ng and Henikoff 2001). Amino acid sequences encoded by these eight SERPINB genes were obtained from the RefSeq database of NCBI, and each entire sequence was submitted in FASTA format as input to the SIFT program, which was run under its default settings. The SIFT algorithm estimates differences between original and altered sequences, based on the assumption that amino acid positions important for the correct biological function of a protein are conserved across the protein family and/or across evolutionary history. As output, a calculated score represents estimated tolerability ranging from 0.00 to 1.00; when the calculated score is less than 0.05, the impact of the amino acid change is considered deleterious. On the basis of this information, we decided to focus on one nonsense SNP and seven missense cSNPs whose scores were less than 0.2 (Table 1), in view of the specificity and sensitivity of the scoring system. We designated the selected cSNPs by joining their gene symbols and amino acid changes with an underscore; for example, a nucleotide substitution at position 358 in the coding sequence of the PAI2 gene (c.358A>G; no. 17 in Table 1) was designated PAI2_N120D (Tables 2 and 3).

Table 1 Summary of SIFT analysis of missense SNPs localized in SERPINB loci
Table 2 Summary of contingency-table analysis of the missense cSNPs
Table 3 LD analysis of analyzed polymorphism

For analysis of LD, we selected 11 SNPs from within the PAI2 and SERPINB10 genes whose heterozygosity was reported to exceed 10% in the database. Nucleotide variation, location, and referenced database ID for each SNP are summarized in Table 3.

Genotyping methods

Each subject was genotyped either by Invader assay (Ohnishi et al. 2001), the SD-PCR method (Iwasaki et al. 2003), or a TaqMan assay (Livak 1999). For PAI2_S413C, all subjects were genotyped by SD-PCR according to a protocol described previously (Iwasaki et al. 2003). For the other six of the seven missense cSNPs selected from the PAI2, SERPINB11, SERPINB8, and SERPINB5 genes, the Invader assay was applied according to a published protocol (Haga et al. 2002; Iida et al. 2002), using reagents and probes provided by the supplier (Third Wave Technologies, Madison, WI, USA). Genomic DNA flanking each SNP was amplified by PCR before the assay. For LD analysis of the selected SNPs we performed TaqMan Assays-on-Demand (Applied Biosystems, Foster City, CA) according to the manufacturer’s protocol. Reagents, probes, and primers were provided by the manufacturer.

Haplotype construction and LD analysis

To analyze LD in the PAI2 and SERPINB10 loci, we constructed haplotypes using 11 SNPs and estimated their frequencies among healthy subjects (n=384) by means of an EM algorithm (SNPAlyze v3.1; DYNA-COM, Chiba, Japan). Indices of LD, i.e., D, D′ and r² (Miller et al. 2000) were analyzed for all possible two-way combinations of the 11 SNPs. To evaluate a broader range of LD, we used archived genotyping data on 48 Japanese for 27 SNPs (from rs6094 to rs963075) in surrounding genes; data were obtained from a database provided through the “SNP browser” (Applied Biosystems). Haplotype frequencies, D′ and r² were calculated as described above. To analyze the association between specific haplotypes involving the PAI2 and SERPINB10 loci and prostate-cancer risk, we constructed haplotypes using only five cSNPs and estimated their frequencies separately in the patient and control groups.
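For readers unfamiliar with the three LD indices, the following is a minimal sketch of their standard textbook definitions for a pair of biallelic markers, given an estimated haplotype frequency and the two allele frequencies. It is not the SNPAlyze implementation; the function name, argument names, and the choice to report D′ as an absolute value are assumptions made for illustration.

```python
def ld_indices(p_hap, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_hap: frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2
    p_a:   frequency of allele A at locus 1
    p_b:   frequency of allele B at locus 2
    Both loci are assumed to be polymorphic (0 < p_a, p_b < 1).
    """
    d = p_hap - p_a * p_b
    # D' scales |D| by the largest value it could take for these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: allele B occurs only on haplotypes carrying A,
# so D' is 1 although the allele frequencies differ and r^2 stays below 1.
d, d_prime, r2 = ld_indices(p_hap=0.3, p_a=0.6, p_b=0.3)
print(round(d, 4), round(d_prime, 4), round(r2, 4))
```

The example illustrates why D′ and r² are usually reported together: D′ reaches 1 whenever one of the four possible haplotypes is absent, whereas r² reaches 1 only when the two markers also have matching allele frequencies.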

Statistical analyses

We analyzed the distribution of genotype frequencies between distinctive study groups (prostate-cancer patients and controls), using 2×3 tables and chi-square tests to reveal trends. When dominant or recessive effects were assumed, 2×2 tables were analyzed by chi-square tests. Statistical significance was set to less than 5% (P<0.05). Hardy-Weinberg equilibrium was evaluated by chi-square tests (with two degrees of freedom) in each group. The odds ratio (OR) and 95% confidence interval with respect to N120D genotypes, for example, were calculated based on the hypothesis that possession of the major A-allele (120-N) is an inherent genetic risk for prostate cancer.
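The tests described above can be sketched with the Python standard library alone. The code below computes a Pearson chi-square statistic for a contingency table (2×3 for genotype distributions, 2×2 for dominant or recessive models) and a goodness-of-fit statistic for Hardy-Weinberg proportions. It is an illustrative sketch, not the software used in the study; the genotype counts in the example are hypothetical. The degrees of freedom for the Hardy-Weinberg test are passed explicitly, so the value stated in the text (two) can be used; note that many texts use one degree of freedom for a biallelic marker because the allele frequency is estimated from the data.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_p_value(stat, df):
    """Upper-tail P value; closed forms exist for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def hwe_chi_square(n_major_hom, n_het, n_minor_hom):
    """Goodness-of-fit statistic against Hardy-Weinberg proportions.

    Assumes the marker is polymorphic (both alleles observed).
    """
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical genotype counts (major homozygote, heterozygote, minor homozygote).
cases = [50, 40, 10]
controls = [35, 45, 20]
stat, df = chi_square([cases, controls])
print(stat, df, chi2_p_value(stat, df))
print(chi2_p_value(hwe_chi_square(*controls), 2))
```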

Results

Selection of missense SNPs using a predictive analysis program, SIFT

To examine missense variations in the SERPINB-family genes clustered at 18q21, we extracted 25 SNPs from the dbSNP database and estimated the impact of amino acid changes using the SIFT algorithm (Table 1). The SIFT scores ranged from 0.02 to 0.96, the lowest scores being given to PAI2_S413C (score=0.02) and SERPINB8_K158N (0.02). In addition to those SNPs, SERPINB11_W188R (0.03) was considered likely to be deleterious (Table 1). Relatively low SIFT scores were given to SERPINB10_R246C (0.06) and PAI2_N120D (0.16); although those would normally be regarded as “tolerated” (scores>0.05), we decided to include these two borderline SNPs to avoid overlooking potentially causative variations whose effects might have been underestimated by the computer program. One nonsense cSNP and seven missense cSNPs were selected in this way, to be tested for association with prostate-cancer risk in our test population (Table 2).
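The selection rule described above amounts to applying two cut-offs to the SIFT scores. The sketch below applies the recommended threshold (0.05) and the relaxed threshold used in this study (0.2) to the five scores quoted in the text; the remaining scores in Table 1 are not reproduced here. The function name and the label "borderline" are illustrative choices, not SIFT terminology.

```python
# SIFT scores quoted in the text (subset of Table 1).
SIFT_SCORES = {
    "PAI2_S413C": 0.02,
    "SERPINB8_K158N": 0.02,
    "SERPINB11_W188R": 0.03,
    "SERPINB10_R246C": 0.06,
    "PAI2_N120D": 0.16,
}


def classify_sift(score, strict=0.05, relaxed=0.2):
    """Label a score using the recommended and the relaxed cut-off."""
    if score < strict:
        return "deleterious"
    if score < relaxed:
        return "borderline"  # kept for analysis under the relaxed cut-off
    return "tolerated"


labels = {snp: classify_sift(score) for snp, score in SIFT_SCORES.items()}
selected = sorted(snp for snp, label in labels.items() if label != "tolerated")

for snp in selected:
    print(snp, SIFT_SCORES[snp], labels[snp])
```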

Association of SERPINB SNP genotypes with risk of prostate cancer

To analyze potential associations between missense SNP genotypes and prostate-cancer risk, we determined the frequencies of individual genotypes among our cancer and control subjects (Table 2). Among the eight cSNPs selected for analysis, all except one (SERPINB8_K158N) were polymorphic in our test subjects, and the distribution was in Hardy-Weinberg equilibrium (P>0.05) in both the prostate-cancer patient group and the control group. In a comparison of distributions of genotype frequencies between the two groups, the most significant association was detected for PAI2_N120D; the frequencies of individuals homozygous for the 120-N allele (N/N), heterozygous 120-N/120-D (N/D), and homozygous for the 120-D allele (D/D) were 50, 42, and 8% respectively in cancer patients vs. 37, 46, and 17% in healthy controls. A significant trend toward possession of the major allele in the prostate-cancer group was indicated by chi-square analysis using 2×3 tables (P=5.0×10⁻⁵). A similar association was detected for three other SNPs from PAI2 and SERPINB10 (Table 2). Major alleles seemed to have a co-dominant effect, because their frequencies were shifted positively in the prostate-cancer group. Significant differences were detected when subjects were divided according to the presence or absence of the minor allele (PAI2_N120D; P=1.4×10⁻³, OR=1.68 by chi-square test), or by the presence or absence of the major N-allele (P=8.0×10⁻⁴, OR=2.43 by chi-square test; Table 4).
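As a rough consistency check, the two odds ratios can be recomputed from the rounded genotype percentages given above. The sketch below does this for the two groupings of PAI2_N120D genotypes. Because it starts from rounded percentages rather than the exact counts in Table 4, it yields approximately 1.70 and 2.36, close to but not identical to the reported values of 1.68 and 2.43. The function and variable names are illustrative.

```python
def odds_ratio(p_case, p_control):
    """Odds ratio from the proportion 'exposed' in cases and in controls."""
    return (p_case / (1.0 - p_case)) / (p_control / (1.0 - p_control))


# Rounded genotype frequencies for PAI2_N120D, taken from the text.
cases = {"N/N": 0.50, "N/D": 0.42, "D/D": 0.08}
controls = {"N/N": 0.37, "N/D": 0.46, "D/D": 0.17}

# Absence of the minor D allele (N/N) versus carriers of D (N/D or D/D).
or_without_minor = odds_ratio(cases["N/N"], controls["N/N"])

# Presence of the major N allele (N/N or N/D) versus D/D.
or_with_major = odds_ratio(
    cases["N/N"] + cases["N/D"],
    controls["N/N"] + controls["N/D"],
)

print(round(or_without_minor, 2), round(or_with_major, 2))
```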

Table 4 Contingency-table analysis of the PAI2_N120D variant

Haplotype construction and analysis of linkage disequilibrium

Since we detected multiple associated SNPs within two neighboring loci, PAI2 and SERPINB10, we first analyzed LD after estimating frequencies of haplotypes constructed on the basis of genotyping data for 27 SNPs selected from “SNP Browser” (Applied Biosystems) among 48 healthy Japanese individuals. As expected, significant LD was verified over the entire PAI2–SERPINB10 locus, indicated by high D′ and r² scores within the chromosomal region of about 50 kb between markers rs6094 and rs963075 (data not shown). The nucleotide variations in the other genes in the cluster were independent of the LD block; no individual cSNP examined in our study subjects showed significant LD with any of the variations in the PAI2 or SERPINB10 loci (D′<0.12, r²<0.004).

To test if a specific haplotype of markers in this chromosomal region would present a significant association with prostate-cancer risk, haplotype and LD analyses were conducted on all 384 control subjects. By genotyping 11 SNPs within the extended locus, significant LD was verified over the entire PAI2 and SERPINB10 genes (Fig. 1). In an effort to resolve a specific variation contributing to a haplotype for cancer risk, we estimated maximum-likelihood frequencies of haplotypes constructed using only five cSNPs. However, because LD was extremely strong between each pair of the missense SNPs and only two major haplotypes accounted for 96–99% of the chromosomes (Table 5), no specific variation responsible for the cancer-risk haplotype could be determined.
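Maximum-likelihood haplotype frequencies of the kind referred to above are commonly obtained with an expectation-maximization (EM) procedure. The following is a minimal two-marker sketch of that idea, in which only double heterozygotes have an ambiguous phase. It is not the SNPAlyze algorithm, it handles two loci rather than five, and the genotype counts in the example are hypothetical.

```python
def em_two_locus(counts, max_iter=200, tol=1e-10):
    """Estimate haplotype frequencies for two biallelic loci by EM.

    counts[i][j] is the number of subjects carrying i copies of allele A
    at locus 1 and j copies of allele B at locus 2 (i, j in 0..2).
    Returns a dict with frequencies of haplotypes AB, Ab, aB and ab.
    """
    n = counts
    total_haplotypes = 2 * sum(sum(row) for row in n)
    # Haplotypes that can be counted directly (phase is unambiguous).
    fixed = {
        "AB": 2 * n[2][2] + n[2][1] + n[1][2],
        "Ab": 2 * n[2][0] + n[2][1] + n[1][0],
        "aB": 2 * n[0][2] + n[1][2] + n[0][1],
        "ab": 2 * n[0][0] + n[1][0] + n[0][1],
    }
    double_het = n[1][1]  # phase is either AB/ab or Ab/aB
    freq = {h: 0.25 for h in fixed}
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is AB/ab.
        cis = freq["AB"] * freq["ab"]
        trans = freq["Ab"] * freq["aB"]
        w = cis / (cis + trans) if (cis + trans) > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        updated = {
            "AB": (fixed["AB"] + double_het * w) / total_haplotypes,
            "ab": (fixed["ab"] + double_het * w) / total_haplotypes,
            "Ab": (fixed["Ab"] + double_het * (1 - w)) / total_haplotypes,
            "aB": (fixed["aB"] + double_het * (1 - w)) / total_haplotypes,
        }
        change = max(abs(updated[h] - freq[h]) for h in freq)
        freq = updated
        if change < tol:
            break
    return freq


# Hypothetical data in which two markers are in complete LD:
# every subject is a/a b/b, A/a B/b, or A/A B/B.
example = [[10, 0, 0], [0, 20, 0], [0, 0, 10]]
print(em_two_locus(example))
```

In this example the estimate converges to two haplotypes (AB and ab) that account for essentially all chromosomes, which mirrors the situation described above: when LD between markers is that strong, the haplotypes carry no more information than any single marker.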

Fig. 1

a, b Analysis of haplotypes and linkage disequilibrium (LD) for 11 variations within the PAI2 and SERPINB10 genes. a Schematic diagram of the genomic structure containing PAI2 and SERPINB10. Vertical and horizontal lines indicate exons and introns, respectively. Locations of the 11 tested variations are indicated by upward arrows. Each variation is designated by a letter (A–K): A PAI2_IVS2+36T>G, B PAI2_N120D, C PAI2_IVS4−182A>T, D PAI2_IVS5+130T>C, E PAI2_IVS6+267A>C, F PAI2_N404K, G PAI2_S413C, H SERPINB10_−7513T>C, I SERPINB10_I41M, J SERPINB10_IVS5+113C>T, and K SERPINB10_R246C. b Indices of LD (D′ and r²) calculated for every possible pair among the 11 variations, shown in tabular form

Table 5 Haplotype analysis of the LD block in PAI2 and SERPINB10 genes

Discussion

In undertaking the study reported here, we hypothesized that variations in SERPINB genes might give rise to individual differences in susceptibility to promotion and/or progression of prostate tumors, because four SERPINB genes (PAI2, MASPIN, SCCA1, SCCA2) had been implicated in the etiology of various cancers. Using the SIFT program to evaluate potential functional consequences of various SNPs, we focused on eight cSNPs (one nonsense and seven missense) and found that four of them, all within the PAI2 and SERPINB10 genes, were associated with cancer risk among 676 Japanese subjects. To our knowledge, this is the first report describing susceptibility to prostate cancer conferred by SERPINB-family genes.

The SIFT program estimated potential changes in protein function that might be brought about by each variation we examined, and expressed the predictions in scores ranging from 0 to 1. A significantly low score (<0.05) was given to only two SNPs, PAI2_S413C and SERPINB8_K158N, although the latter was not polymorphic in our test population. Nevertheless, we considered that the relatively low scores given to SERPINB10_R246C (score=0.06) and PAI2_N120D (score=0.16) were comparable to those reported elsewhere for variations in important susceptibility genes such as ELAC2 (for S217L, score=0.56; for A540T, score=0.17) and SRD5A2 (for A49T, score=0.18; for V89L, score=0.12) (Tavtigian et al. 2001; Hsing et al. 2001). Since our aim was to identify susceptibility variations whose effects on cancer risk might be only slight, we assumed that the recommended significance level of the SIFT program might be too conservative for our purposes, because it had been designed with the stringency required for identifying causative mutations in monogenic diseases. Moreover, because SIFT scores are, after all, products of mathematical prediction, we extended the possible range for susceptibility variations to scores of less than 0.2. As a result, we detected multiple cSNPs, including PAI2_N120D and SERPINB10_R246C, that showed significant association with prostate cancer even though their SIFT scores lay above the recommended threshold.

Among them, the most significant association was detected for PAI2_N120D (P=5.0×10⁻⁵), suggesting that the minor D-allele of PAI2_N120D might suppress emergence or progression of prostate cancer. However, because analysis of haplotypes within the PAI2 and SERPINB10 loci indicated that any of five missense cSNPs could be responsible for the cancer risk, several possibilities must be considered. Firstly, the PAI2_N120D polymorphism alone might have a direct effect on PAI2 function and the other missense SNPs have none. More likely, however, is that several of the missense SNPs we examined would have synergistic or additive effects on the function of both proteins; moreover, a contribution by other, as yet unidentified, functional polymorphisms in LD with the tested SNPs cannot be ruled out at present. To our knowledge, no previous report has identified an effect of SERPINB10 on occurrence of cancer. Thus, the functional significance of variations should be evaluated for both PAI2 and SERPINB10, to clarify what the associations we observed actually mean.

Biochemical analysis should clarify whether the polymorphisms examined here do in fact affect protein function. In addition, the mechanism(s) that suppress prostate tumors could be clarified by means of biological assays. In vitro examination of potential cellular invasion and proliferation could be appropriately designed to exploit a known function of PAI2, inhibition of the urokinase-type plasminogen activator (uPA; Kruithof et al. 1995), and to examine additional specific aspects of that function including inhibition of apoptosis in cultured cell lines (Kumar and Baglioni 1991).

The etiology of cancer, as in many other polygenic diseases, involves participation and interaction of multiple environmental and genetic factors. However, in prostate cancers, the contribution of genetic factors is thought to be relatively high, and thus linkage analyses in familial cases have identified several reliable susceptibility genes including RNASEL and ELAC2 (Tavtigian et al. 2001; Carpten et al. 2002). Also, a smaller number of CAG repeats in the polymorphic microsatellite within the AR gene is reproducibly associated with sporadic prostate cancer (Irvine et al. 1995). In our study, we detected a strong association of missense variations in the PAI2 and SERPINB10 genes, although reproducibility could be a problem in view of the relatively small size of our test population. Prospective analyses of large cohorts are in progress as part of a clinical study being undertaken in a consortium of university hospitals in Japan, to investigate the real meaning of the correlations we have reported here. It would also be of value to investigate further the possibility of interactive effects of these polymorphisms. For example, the suggested protective effects conferred by minor variants of PAI2 SNPs should be examined in combination with the rare variations in RNASEL or ELAC2, or with the CAG-repeat polymorphism of the AR gene. Molecular genetic and epidemiological investigations that integrate known risk factors including age, androgen status, and possible environmental exposure to carcinogens would help to clarify the complex etiological basis of prostate cancer.

In summary, we have detected an association of multiple missense cSNPs in the PAI2 and SERPINB10 genes with prostate-cancer risk, by genotyping 292 Japanese cancer patients and 384 healthy controls. Although the functional significance of these variations is still obscure, and more confirmative replication studies are necessary, any novel candidates added to the catalogue of susceptibility-associated genetic variations will increase our understanding of prostate cancer. A robust genome-wide strategy, recently undertaken by several research institutes (Wang et al. 2005), is expected to identify many reliable candidates for multi-factorial common diseases, in spite of technical limitations on truly genome-wide approaches. Alternatively, large-scale association analyses focusing on intolerant missense variations could be a fruitful approach. When multiple susceptibility genes for prostate cancer have been identified with certainty, epidemiological investigations that incorporate multiple risk factors should lead to identification of prognosticators, preventive approaches, and novel therapies for prostate cancer.