Introduction

Translation is one of the major mechanisms of gene expression regulation. It consists of three major steps: initiation, elongation and termination. Among these, initiation is the rate-limiting step and involves at least 11 eukaryotic translation initiation factors (eIFs) [1]. eIF3 is the largest and most complex of these factors, consisting of at least 13 subunits named eIF3a to eIF3m [2,3]. eIF3a, the largest subunit, has a molecular weight of 170 kDa, and its gene comprises 24 exons spanning approximately 45.8 kb on chromosome 10q26 [4].

eIF3a plays critical roles in all steps of cap-dependent translation initiation, including binding to the 40S ribosomal subunit, scanning and start codon recognition [2]. It has also been reported to be involved in internal ribosome entry site (IRES)-mediated, cap-independent translation initiation [5,6]. Moreover, recent studies have shown that eIF3a is overexpressed in many cancers, including cancers of the breast, lung, cervix, stomach and esophagus [2]. Our previous study also showed that eIF3a regulates drug response by altering the expression of several DNA repair proteins [7]. Together, these studies implicate eIF3a as a potential oncogene.

Single nucleotide polymorphisms (SNPs) are the most abundant type of human genetic variation. Our previous studies demonstrated that two eIF3a SNPs were correlated with drug response and toxicity [8,9]. We therefore hypothesized that additional eIF3a SNPs may correlate with the response to and toxicity of platinum-based chemotherapy in non-small cell lung cancer (NSCLC) patients. Screening for SNPs with evidence of positive selection has been reported to be a valid strategy for identifying functional SNPs [10,11,12]. This approach rests on the theory that phenotypic differences between individuals are functional consequences of genetic variants that facilitated the survival of ancestral human populations under different environmental stressors [13]. SNPs under positive selection are therefore likely to be functionally important for their linked genes. This strategy has been used successfully to identify functional SNPs in many genes, including glucose-6-phosphate dehydrogenase (G6PD) [13], multidrug resistance 1 (MDR1) [10], cytochrome P450, family 3, subfamily A (CYP3A) [11] and proprotein convertase subtilisin/kexin type 9 (PCSK9) [12].

In the present study, we employed the long-range haplotype (LRH) test to identify positively selected SNPs of eIF3a and explored whether they were correlated with response and toxicity of platinum-based chemotherapy in Chinese NSCLC patients.

Materials and methods

Study population, acquisition of clinical information and genotyping

The study protocol was approved by the Ethics Committee of Xiangya School of Medicine, Central South University. A total of 325 lung cancer patients were enrolled. Before the study was initiated, all participants provided written informed consent in compliance with the code of ethics of the World Medical Association (Declaration of Helsinki). Eligible patients were treated at the Affiliated Cancer Hospital or Xiangya Hospital of Central South University (Changsha, Hunan, China) between November 2011 and May 2013. Inclusion criteria were (1) histologic or cytologic confirmation of lung cancer and (2) presence of a primary tumor. Exclusion criteria were (1) pregnancy or lactation, (2) active infection, (3) symptomatic brain or leptomeningeal metastases and (4) previous or concomitant other malignancies. All demographic and clinical information was obtained from the two hospitals. The clinical characteristics of all patients are summarized in Table 1. The study was registered with the Chinese Clinical Trial Register (registration number: ChiCTR-RCC-12002830).

Table 1 Baseline characteristics of NSCLC patients treated with platinum-based chemotherapy.

Chemotherapy response was evaluated according to the Response Evaluation Criteria in Solid Tumors (RECIST) guidelines. Complete response (CR) and partial response (PR) were defined as drug response, whereas stable disease (SD) and progressive disease (PD) were defined as drug resistance. Hematologic toxicity (anemia, neutropenia and thrombocytopenia), gastrointestinal toxicity (emesis) and hepatotoxicity were evaluated. Toxicity was graded according to the National Cancer Institute Common Toxicity Criteria 3.0 (NCI-CTC 3.0) at each chemotherapy cycle, and the most severe grade recorded for each patient over the entire study interval was used for analysis. For every side effect, grade 0 was considered non-toxicity and grade ≥1 was considered toxicity.

Genomic DNA was isolated from peripheral blood samples using a FlexiGene DNA Kit (Qiagen, Hilden, Germany) and stored at 4 °C until use. Genotyping was performed with Sequenom's MassARRAY system (Sequenom, San Diego, California, USA).
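The response and toxicity definitions above reduce each patient to two binary endpoints. A minimal Python sketch of that coding, assuming per-cycle toxicity grades are available as integer lists; the function names are hypothetical and not part of the study's software:

```python
from typing import Sequence

# RECIST best-response categories, grouped as in the text.
RESPONSE = {"CR", "PR"}    # complete / partial response -> drug response
RESISTANCE = {"SD", "PD"}  # stable / progressive disease -> drug resistance


def code_response(recist_category: str) -> int:
    """Return 1 for drug response (CR/PR) and 0 for drug resistance (SD/PD)."""
    category = recist_category.strip().upper()
    if category in RESPONSE:
        return 1
    if category in RESISTANCE:
        return 0
    raise ValueError(f"unknown RECIST category: {recist_category!r}")


def code_toxicity(grades_per_cycle: Sequence[int]) -> int:
    """Collapse per-cycle NCI-CTC grades into one binary outcome.

    The worst grade over all cycles is kept; grade 0 -> 0 (non-toxicity),
    grade >= 1 -> 1 (toxicity).
    """
    if not grades_per_cycle:
        raise ValueError("at least one graded cycle is required")
    if min(grades_per_cycle) < 0:
        raise ValueError("grades must be non-negative")
    return int(max(grades_per_cycle) >= 1)


# Hypothetical patient: partial response, neutropenia grades 0, 2, 1 over three cycles.
print(code_response("PR"), code_toxicity([0, 2, 1]))  # 1 1
```

Taking the maximum grade before dichotomizing mirrors the "most severe grade during the entire study interval" rule; a single grade 1 event in any cycle is enough to place the patient in the toxicity group.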

Polymorphism data and SNP marker selection

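Genotype data were retrieved from the International HapMap Project database (Phase III, Release #28, www.hapmap.org). Six HapMap populations were included: CEU (Utah residents with Northern and Western European ancestry), CHB (Han Chinese in Beijing, China), JPT (Japanese in Tokyo, Japan), YRI (Yoruba in Ibadan, Nigeria), ASW (African ancestry in Southwest USA) and CHD (Chinese in Metropolitan Denver, Colorado). The ancestral allele of each SNP was defined by aligning the sequence surrounding the SNP to the chimpanzee (Pan troglodytes) sequence from the UCSC database (genome.ucsc.edu).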

SNPs spanning the approximately 45.8 kb eIF3a locus were selected. To obtain a denser and more evenly distributed SNP map, we extended the selection region to 200 kb upstream and downstream of the locus. SNPs were excluded if they (1) had a minor allele frequency <0.05, (2) lacked genotype data in any of the six populations or (3) departed from Hardy-Weinberg equilibrium. In total, 127 SNPs were included; a flow chart of SNP selection and information on the included SNPs are provided in Figure S1 and Table S1, respectively.
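The three exclusion criteria can be expressed as a filter over per-population genotype counts. This is a sketch under stated assumptions: the MAF and HWE checks are applied within each population, the HWE check is a 1-df chi-square goodness-of-fit test with a hypothetical α = 0.05 (Haploview uses an exact test, and the threshold is not stated above), and the function names are illustrative only.

```python
from math import erfc, sqrt
from typing import Optional, Sequence, Tuple

Counts = Tuple[int, int, int]  # genotype counts (AA, AB, BB) for one population


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF of a biallelic SNP from genotype counts."""
    freq_a = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_aa, n_ab, n_bb), expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2.0))


def passes_qc(per_population: Sequence[Optional[Counts]],
              maf_min: float = 0.05, hwe_alpha: float = 0.05) -> bool:
    """Apply the three exclusion criteria; None marks a population without data."""
    for counts in per_population:
        if counts is None or sum(counts) == 0:
            return False                      # (2) not genotyped in this population
        if minor_allele_frequency(*counts) < maf_min:
            return False                      # (1) MAF < 0.05
        if hwe_chi2_pvalue(*counts) < hwe_alpha:
            return False                      # (3) departs from HWE
    return True
```

A SNP is kept only if every population passes all three checks, which matches the requirement that retained SNPs be genotyped and informative in all six populations.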

Data analysis

Data analysis was performed using R and Bioconductor. Haploview was used for genetic analyses, including allele frequency description and selection, LD analysis and haplotype block analysis [14]. Lewontin's coefficient |D'| and the squared Pearson correlation coefficient r2 were used to estimate LD between SNP pairs, and the statistical significance of LD between SNP pairs was assessed with Arlequin [16]. The overall LD in each population was summarized by an LD decay trend line calculated from the classical LD decay equation $D_t = D_0(1-\theta)^t$ [17], where $D_0$ is the initial LD, $\theta$ is the recombination fraction between the two loci and $t$ is the number of generations. Half-LD (LD50) was defined as the distance at which |D'| = 0.5. Haplotypes were reconstructed with PHASE [18].

An unbiased estimator of FST (F-statistic, or fixation index) was calculated for each pairwise population comparison, and its statistical significance was tested by randomization (1000 permutations) using FSTAT [19].

Positive selection was assessed with a modified LRH test in which a single SNP, rather than a haplotype, served as the "core" [13]. Haplotype homozygosity was calculated stepwise with increasing distance from the core to obtain the extended haplotype homozygosity (EHH) [13]. Relative EHH (REHH) was defined as the EHH of the tested core allele divided by the EHH of all other core alleles combined [13]. Haplotype bifurcation diagrams (HBDs) were used to visually evaluate LD decay on the ancestral and derived alleles of each core SNP. To assess the statistical significance of the LRH test, an empirical distribution of core haplotype frequency versus REHH was obtained by scanning the HapMap data for the entire chromosome 10 in the ASN (combined CHB and JPT), CEU and YRI populations. HBD and LRH analyses were performed with Sweep.
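The two LD measures can be sketched in Python for a pair of biallelic SNPs, assuming phased haplotypes are already available. Haploview estimates haplotype frequencies from genotypes, and Arlequin applies its own significance test; the n·r² chi-square approximation used here is a swapped-in simplification, and `pairwise_ld` is a hypothetical helper name.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple

Haplotype = Tuple[int, int]  # (allele at SNP 1, allele at SNP 2), coded 0/1


def pairwise_ld(haplotypes: Sequence[Haplotype]) -> Tuple[float, float, float]:
    """Return (|D'|, r^2, P value) for two biallelic SNPs from phased haplotypes."""
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n                 # allele 1 freq, SNP 1
    p_b = sum(h[1] for h in haplotypes) / n                 # allele 1 freq, SNP 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b                                    # D = p_AB - p_A p_B
    # Lewontin's normalization: D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    var_prod = p_a * (1 - p_a) * p_b * (1 - p_b)
    if d_max == 0 or var_prod == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    d_prime = abs(d) / d_max
    r2 = d * d / var_prod                                   # squared correlation
    # n * r^2 is approximately chi-square with 1 df under no association.
    p_value = erfc(sqrt(n * r2 / 2.0))
    return d_prime, r2, p_value
```

The two measures can disagree: 50 copies of haplotype 00, 25 of 01 and 25 of 11 give |D'| = 1 (no 10 haplotype is observed) but r² = 1/3, because the allele frequencies at the two SNPs differ. This is why both |D'| and r² are reported.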

Statistical analysis of association study

Continuous variables are presented as means±SD with 95% confidence intervals (95% CI), and normally distributed data were compared using the two-sample t test. Categorical variables, genotype distributions and allele frequencies were compared between groups using the χ2 test. Unconditional logistic regression was performed to estimate the association of eIF3a polymorphisms with drug response and toxicity, expressed as odds ratios (ORs) with 95% CIs. All P values were two-sided, and P<0.05 was considered statistically significant. Bonferroni correction was applied to correct for multiple testing. These analyses were performed using PLINK [20] and SPSS 18.0 (SPSS Inc, Chicago, Illinois, USA).
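For a binary genotype grouping (for example, carriers versus non-carriers under a dominant model), an unadjusted odds ratio with a Wald 95% CI can be computed directly from a 2×2 table, as sketched below. This does not reproduce the covariate-adjusted logistic regression run in PLINK or SPSS; it only illustrates how an OR, its CI, its P value and a Bonferroni threshold relate. The function names are hypothetical.

```python
from math import erfc, exp, log, sqrt
from typing import Tuple

Z_975 = 1.959963984540054  # standard normal 97.5th percentile


def odds_ratio_wald(a: int, b: int, c: int, d: int) -> Tuple[float, float, float, float]:
    """Unadjusted OR, Wald 95% CI and two-sided P value from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: use a continuity correction or an exact test")
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)          # SE of log(OR)
    p_value = erfc(abs(log_or / se) / sqrt(2.0))      # two-sided normal P
    return exp(log_or), exp(log_or - Z_975 * se), exp(log_or + Z_975 * se), p_value


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests
```

For example, `odds_ratio_wald(20, 10, 10, 20)` gives an OR of 4 with a two-sided P of about 0.011, which is nominally significant but would not pass a Bonferroni threshold of 0.05/10 = 0.005 for a hypothetical set of ten tests.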

Results

Haplotype and LD profile

We first characterized the haplotype and LD profiles of the included SNPs, calculating pairwise |D'| and r2 to determine the eIF3a LD pattern. As shown in Figure 1, the LD structure across the entire locus differed markedly between African and non-African populations. Eighteen, 16, 17, 14, 29 and 29 haplotype blocks were observed in the CHD, CHB, JPT, CEU, ASW and YRI populations, respectively. The longest blocks spanned 91 kb in CHD, CHB and JPT, 94 kb in CEU, 48 kb in ASW and 26 kb in YRI.

Figure 1

LD pattern of eIF3a polymorphisms in the CHD, CHB, JPT, CEU, ASW, and YRI populations. Pairwise LD between polymorphisms was measured by |D'|. The haplotype blocks identified in each population are numbered above the plot. Analyses were performed with Haploview (version 4.2).

We then plotted the |D'| and r2 values of SNP pairs against their physical distance (Figure 2). LD generally decreased with increasing physical distance. A decay curve was fitted to determine the half-LD distance (LD50) in each population. As expected, the genetically older populations showed lower LD: the LD50 values of the African populations YRI (44.5 kb) and ASW (61.5 kb) were shorter than those of the non-African populations, and the longest LD50 values were observed in the Chinese populations CHD (78.3 kb) and CHB (75.4 kb).
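The curve fitting can be sketched as follows. This is not necessarily the exact procedure used for Figure 2; it assumes θ is proportional to physical distance through a hypothetical constant of 1e-8 per bp (about 1 cM/Mb). Because (1 − θ)^t ≈ exp(−θt) for small θ, that constant mainly rescales the fitted t and has little effect on the LD50 in base pairs. For a fixed t the model is linear in D0, so a grid search over t with a closed-form D0 avoids any dependency on an optimization library. Function names are illustrative.

```python
from typing import Iterable, Sequence, Tuple

# Hypothetical conversion: ~1 cM per Mb, i.e. theta of about 1e-8 per base pair.
RECOMB_PER_BP = 1e-8


def fit_ld_decay(dist_bp: Sequence[float], ld: Sequence[float],
                 t_grid: Iterable[int] = range(1, 5001)) -> Tuple[float, int]:
    """Least-squares fit of LD = D0 * (1 - theta)^t, theta = RECOMB_PER_BP * distance.

    For a fixed t the model is linear in D0, so D0 has a closed form and only
    t needs a grid search. Returns (D0, t).
    """
    best = None
    for t in t_grid:
        g = [(1.0 - RECOMB_PER_BP * x) ** t for x in dist_bp]
        d0 = sum(y * gi for y, gi in zip(ld, g)) / sum(gi * gi for gi in g)
        sse = sum((y - d0 * gi) ** 2 for y, gi in zip(ld, g))
        if best is None or sse < best[0]:
            best = (sse, d0, t)
    return best[1], best[2]


def half_ld_distance(d0: float, t: int) -> float:
    """Distance in bp at which the fitted curve falls to |D'| = 0.5 (LD50)."""
    if d0 <= 0.5:
        raise ValueError("fitted D0 must exceed 0.5 for LD50 to exist")
    theta_half = 1.0 - (0.5 / d0) ** (1.0 / t)
    return theta_half / RECOMB_PER_BP
```

On noiseless synthetic data generated with D0 = 0.9 and t = 500, the fit recovers those values and an LD50 of roughly 117 kb. Real |D'| scatter is noisy, so the fitted curve summarizes the trend rather than passing through individual SNP pairs.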

Figure 2A

LD decay profile of eIF3a polymorphisms in the CHD, CHB, JPT, CEU, ASW, and YRI populations. (A) |D'| values between eIF3a SNP pairs were calculated and plotted against physical distance in the CHD, CHB, JPT, CEU, ASW, and YRI populations. The LD decay trend line, calculated with the classical LD decay equation, is shown in red, and the LD50 is indicated. Details of the calculation are described in the Materials and methods section.

We also calculated the proportion of SNP pairs with significant LD in each population and observed clear differences. As shown in Figure S2A, the African populations had fewer SNP pairs in significant LD: 23.66% in YRI and 19.03% in ASW. In contrast, 39.07%, 38.72%, 36.71% and 35.88% of SNP pairs showed significant LD in the CHD, CEU, CHB and JPT populations, respectively. We further plotted the proportion of SNP pairs with non-significant LD against physical distance for the ASN (East Asian; combined CHB and JPT), CEU and YRI populations (Figure S2B). Consistent with the observations above, the African YRI population showed lower LD, whereas the ASN population showed stronger allelic association.

Together, the haplotype and LD patterns were highly diverse among the six populations analyzed, especially between the African (ASW and YRI) and non-African (CHD, CHB, CEU and JPT) populations.

Population differentiation

To further test for differences in eIF3a allele frequencies among populations, we calculated FST values. Because the CHB and JPT populations are geographically close and genetically similar, they were combined into a single ASN population for further analyses. In addition, the subsequent analyses were restricted to the ASN, CEU and YRI populations, which capture a substantial proportion of worldwide human genetic variation [21].

Based on pairwise FST values, eIF3a showed high population differentiation. Significantly high FST values were observed for 21.71% of SNPs in the CEU versus ASN comparison, 43.41% in CEU versus YRI and 33.33% in ASN versus YRI. We next plotted the FST values against physical distance (Figure S3). Seven SNPs in the CEU versus YRI comparison and 5 SNPs in the ASN versus YRI comparison had FST values above 0.45, a value used as a threshold indicating selective pressure [22,23]. In contrast, no SNP in the CEU versus ASN comparison exceeded 0.45, indicating low differentiation of eIF3a between these two populations.
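The pairwise FST comparison and its randomization test can be sketched for a single SNP as below. For brevity, this swaps in Hudson's FST estimator rather than the Weir and Cockerham estimator implemented in FSTAT, and it permutes individual chromosomes (not genotypes) between the two populations; the function names are hypothetical. The 0.45 cut-off from the text would then be applied to the returned estimate.

```python
import random
from typing import Sequence, Tuple


def hudson_fst(pop1: Sequence[int], pop2: Sequence[int]) -> float:
    """Hudson's FST estimator for one biallelic SNP.

    pop1, pop2: allele calls (0/1) for the sampled chromosomes of each population.
    Small-sample corrections in the numerator can make the estimate slightly negative.
    """
    n1, n2 = len(pop1), len(pop2)
    p1, p2 = sum(pop1) / n1, sum(pop2) / n2
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return 0.0 if den == 0 else num / den


def fst_permutation_test(pop1: Sequence[int], pop2: Sequence[int],
                         n_perm: int = 1000, seed: int = 1) -> Tuple[float, float]:
    """Randomization test: shuffle population labels and recompute FST.

    Returns (observed FST, one-sided permutation P value).
    """
    rng = random.Random(seed)
    observed = hudson_fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    n1 = len(pop1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hudson_fst(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

For example, allele frequencies of 0.8 and 0.2 in 100 chromosomes per population give an FST of about 0.52, above the 0.45 cut-off, and no shuffled data set reaches that value, so the permutation P value is at its floor of 1/1001.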

These results indicate that eIF3a is differentiated to different extents among the ASN, CEU and YRI populations and that some of its SNPs may be under selective pressure.

Signature of positive selection of eIF3a

To test for evidence of positive selection at eIF3a SNPs, we performed an LRH test, first plotting the REHH of each core SNP against its allele frequency. Based on the empirical distribution, 41, 33 and 13 SNPs showed significant departure from evolutionary neutrality in the YRI, ASN and CEU populations, respectively. The five SNPs with the lowest P values were selected for further study. As shown in Figure 3, the A allele of rs1409314 (P=0.0012), the C allele of rs4752219 (P=0.0025), the A allele of rs4752220 (P=0.0025), the T allele of rs7091672 (P=0.0043) and the A allele of rs10510050 (P=0.0046) deviated significantly from evolutionary neutrality. To examine this result further, the EHH decay of these 5 SNPs was plotted against physical distance. In general, the EHH of the rs1409314A, rs4752219C, rs4752220A, rs7091672T and rs10510050A alleles decayed more slowly than that of the corresponding alternative (ancestral or derived) alleles in all three populations (Figure S4). To confirm this result, we plotted haplotype bifurcation diagrams (HBDs) for these 5 SNPs (Figure S5). Consistent with the previous results, the rs1409314A, rs4752219C, rs4752220A, rs7091672T and rs10510050A alleles showed obvious long-range LD in all populations, especially in YRI, as indicated by a single, distinctly predominant thick branch. Taken together, these results indicate that rs1409314, rs4752219, rs4752220, rs7091672 and rs10510050 carry a signature of recent positive selection.
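The EHH and REHH definitions used by the LRH test with a single-SNP core can be written out directly. The sketch below assumes phased haplotypes stored as equal-length allele strings; it does not reproduce Sweep's implementation, and the function names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Iterable, Sequence


def _homozygosity(haplotypes: Sequence[str], core: int, target: int,
                  alleles: Iterable[str]) -> float:
    """Pooled haplotype homozygosity over [core, target] for chromosomes whose
    core SNP carries any of `alleles`: sum of C(n_h, 2) / sum of C(n_allele, 2)."""
    lo, hi = min(core, target), max(core, target)
    same_pairs = total_pairs = 0
    for allele in alleles:
        segments = [h[lo:hi + 1] for h in haplotypes if h[core] == allele]
        same_pairs += sum(comb(k, 2) for k in Counter(segments).values())
        total_pairs += comb(len(segments), 2)
    if total_pairs == 0:
        raise ValueError("need at least two chromosomes in the core group")
    return same_pairs / total_pairs


def ehh(haplotypes: Sequence[str], core: int, target: int, core_allele: str) -> float:
    """EHH: probability that two random chromosomes carrying core_allele are
    identical from the core SNP out to the target SNP."""
    return _homozygosity(haplotypes, core, target, [core_allele])


def rehh(haplotypes: Sequence[str], core: int, target: int, core_allele: str) -> float:
    """Relative EHH: EHH of the tested allele divided by the pooled EHH of all
    other alleles observed at the core SNP."""
    others = sorted({h[core] for h in haplotypes} - {core_allele})
    ehh_other = _homozygosity(haplotypes, core, target, others)
    if ehh_other == 0:
        return float("inf")
    return ehh(haplotypes, core, target, core_allele) / ehh_other
```

With the toy haplotypes `["AGC"] * 4 + ["TGC", "TGC", "TAC", "TAT"]` and the core at position 0, EHH on the A allele stays at 1 out to the last SNP while EHH on the T allele decays to 1/6, giving REHH = 6: the slow decay on one allele relative to the other is the signal that the LRH test compares against the chromosome-wide empirical distribution.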

Figure 3

LRH test for the 5 positively selected SNPs. The REHH values of the 5 core alleles were plotted against their frequencies in the ASN, CEU and YRI populations. The empirical distribution of REHH versus allele frequency was calculated from phased chromosome 10 data downloaded from the HapMap database (individual data points are not shown). The 50th, 75th, 90th, 95th, 99th, and 99.9th percentile lines were obtained by binning the empirical data by allele frequency into 20 bins with equal intervals of 0.05. All analyses were performed with Sweep.

As controls, we randomly selected 5 SNPs (rs10886342, rs11198804, rs2275112, rs10787899 and rs4752269) that showed no evidence of positive selection. These SNPs were used as representative non-positively selected eIF3a SNPs in the subsequent association analyses.

Association of selected SNPs and platinum-based chemotherapy response

Our previous studies suggested that eIF3a may act as an oncogene and may be correlated with chemotherapy response and toxicity in lung cancer [4,6,7]. We therefore tested whether the positively selected SNPs were associated with the response to platinum-based chemotherapy in Chinese NSCLC patients. A total of 325 lung cancer patients were genotyped for the 5 positively selected and 5 non-positively selected SNPs. Responders and non-responders differed significantly in age and chemotherapy regimen, indicating that these factors needed to be adjusted for in the statistical analysis (Table 1). All tested SNPs were in Hardy-Weinberg equilibrium in both responders and non-responders.

Table 2 summarizes the association results for the 10 SNPs under additive, dominant and recessive models. Neither the 5 positively selected SNPs (rs1409314, rs4752219, rs4752220, rs7091672 and rs10510050) nor the 5 non-positively selected SNPs (rs10886342, rs11198804, rs2275112, rs10787899 and rs4752269) were significantly associated with the response of NSCLC patients to platinum-based chemotherapy.
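The three genetic models differ only in how each genotype is coded before it enters the logistic regression. A minimal sketch, with a hypothetical function name and an arbitrary choice of coded allele (which allele was coded in the study is not stated here):

```python
def genotype_code(genotype: str, coded_allele: str, model: str) -> int:
    """Encode a biallelic genotype such as "AG" for logistic regression.

    additive:  number of copies of the coded allele (0, 1, 2)
    dominant:  1 if at least one copy of the coded allele, else 0
    recessive: 1 only if homozygous for the coded allele, else 0
    """
    if len(genotype) != 2:
        raise ValueError("expected a two-character genotype such as 'AG'")
    dose = genotype.upper().count(coded_allele.upper())
    if model == "additive":
        return dose
    if model == "dominant":
        return int(dose >= 1)
    if model == "recessive":
        return int(dose == 2)
    raise ValueError(f"unknown genetic model: {model!r}")
```

Under the additive model a single OR describes the change in odds per copy of the coded allele, whereas the dominant and recessive models each compare two genotype groups.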

Table 2 Association of eIF3a SNPs with platinum-based chemotherapy response in Chinese patients.

Association of selected SNPs and platinum-based chemotherapy toxicity

We next tested whether the positively selected SNPs were correlated with chemotherapy toxicity in these patients. Five toxicities were investigated: neutropenia, anemia, thrombocytopenia, emesis and hepatotoxicity. As indicated in Table 3, patients were divided into two groups for each toxicity: non-toxicity (grade 0) and toxicity (grade ≥1). Overall, 64%–68% of patients experienced toxicity, and only 32%–36% did not.

Table 3 Groups of graded platinum-based chemotherapy toxicities.

Notably, 4 of the 5 positively selected SNPs (rs1409314, rs4752219, rs4752220 and rs7091672) were significantly correlated with all tested toxicities (Table 4). In addition, rs10510050 was significantly associated with thrombocytopenia (OR=4.34, 95% CI=1.47–12.81, P<0.01), emesis (OR=0.39, 95% CI=0.18–0.83, P=0.01) and hepatotoxicity (OR=0.58, 95% CI=0.34–0.99, P=0.04). In contrast, none of the 5 non-positively selected SNPs (rs10886342, rs11198804, rs2275112, rs10787899 and rs4752269) was significantly associated with any of the 5 tested toxicities. Together, these results suggest that the 5 positively selected eIF3a SNPs are significantly correlated with platinum-based chemotherapy toxicities in Chinese NSCLC patients.

Table 4 Association of eIF3a SNPs with platinum-based chemotherapy toxicities in Chinese patients. (b) P<0.05.

Discussion

In the present study, we first characterized the haplotype and LD profiles of eIF3a using HapMap data and then analyzed population differentiation among three populations (ASN, CEU and YRI). Using the LRH test, we identified 5 eIF3a SNPs (rs1409314, rs4752219, rs4752220, rs7091672 and rs10510050) that have undergone recent positive selection. Finally, we found that these SNPs were correlated with platinum-based chemotherapy toxicities in Chinese NSCLC patients.

With the explosion of newly identified SNPs, investigating every SNP individually for correlation with a disease or drug response is becoming impractical. Although new DNA sequencing and genotyping methods have been developed, they remain time consuming and costly. Bioinformatics methods have therefore been developed to identify potentially functional SNPs. Based on the theory of natural selection and the "Out-of-Africa" hypothesis, searching for signatures of recent positive selection can be used to identify potentially functional SNPs [24]. As humans dispersed from Africa to other regions of the world, different populations faced various selective pressures from climate, pathogens and food sources. Functional SNPs that facilitated survival in ancestral human populations may thus underlie phenotypic differences among modern individuals or groups [25]. Such SNPs may be retained under positive selection; for example, the classic G6PD-202A variant, which confers malaria resistance, shows strong evidence of recent positive selection [13]. Other groups have successfully used this approach to detect functional SNPs in PCSK9 [12] and ABC drug transporter genes [26]. In the current study, we used this method to identify 5 positively selected eIF3a SNPs and found that they were significantly correlated with platinum-based chemotherapy toxicities in Chinese NSCLC patients. Importantly, 5 non-positively selected control SNPs showed no evidence of association. Our previous studies identified a number of SNPs associated with platinum-based chemotherapy response and toxicity [27,28,29]; the approach used here may provide a time- and cost-saving tool for prioritizing SNPs associated with drug response in future studies.

Although eIF3a is the largest subunit of eIF3, itself the largest and most complex initiation factor, it has been shown to be dispensable for eIF3-mediated translation initiation [2,30]. Nevertheless, previous studies by our group and others have suggested that eIF3a is a potential oncogene. It is therefore important to understand how eIF3a polymorphisms contribute to its association with cancer. Our previous studies investigated the correlation of eIF3a polymorphisms with resistance to and severe toxicity of platinum-based chemotherapy in the Chinese population [8,9]. Olson et al. found that eIF3a rs10787899 was significantly correlated with breast cancer risk in Caucasian women [31]. In the current study, we identified 5 additional polymorphisms that correlated with platinum-based chemotherapy toxicities in Chinese lung cancer patients. Although the underlying mechanisms remain unknown, we speculate that these SNPs may be useful genetic markers of chemotherapy toxicity.

In the present investigation, we extended the SNP selection region to 200 kb upstream and downstream of the eIF3a locus to obtain a more even SNP distribution. As a result, some of the selected regions overlapped with other genes, including family with sequence similarity 45, member A (FAM45A) in the eIF3a 5′ untranslated region (5′ UTR) and nanos homolog 1 (Drosophila) (NANOS1) in the eIF3a 3′ UTR region. Indeed, the 5 positively selected SNPs (rs1409314, rs4752219, rs4752220, rs7091672 and rs10510050) are located in the region where NANOS1 and eIF3a overlap, so they may be considered genetic markers for both genes. This point is especially important for future in vitro functional studies, which should consider both genes.

Several limitations should be considered when interpreting the results of this study. First, although we observed clinical associations for these positively selected SNPs, we did not perform in vitro functional studies; therefore, how these SNPs affect eIF3a function remains unknown. Second, the sample size was relatively small, and the findings were not replicated in an independent validation population. Third, the associations were examined only in the Chinese population. These results therefore need to be confirmed in larger samples and in other ethnic populations.

In summary, we identified 5 positively selected eIF3a SNPs using the LRH test and found that they were significantly correlated with platinum-based chemotherapy toxicities in Chinese NSCLC patients. Our results suggest that this SNP selection strategy may be an inexpensive and effective way to discover SNPs associated with drug response and toxicity.

Author contributions

Ji-ye YIN, Hong-hao ZHOU, and Zhao-qian LIU conceived and designed the experiments. Ji-ye YIN, Xiang-guang MENG, Juan CHEN, and Yi ZHENG performed the experiments. Ji-ye YIN, Xiang-guang MENG, and Rong LIU analyzed the data. Chen-yue QIAN and Xiang-ping LI contributed reagents, materials and analysis tools. Ji-ye YIN and Zhao-qian LIU wrote the paper.

Figure 2B

LD decay profile of eIF3a polymorphisms in the CHD, CHB, JPT, CEU, ASW, and YRI populations. (B) r2 values between eIF3a SNP pairs were calculated and plotted against physical distance in the CHD, CHB, JPT, CEU, ASW, and YRI populations. Details of the calculation are described in the Materials and methods section.
