Introduction

Preferred breast cancer (neo)adjuvant chemotherapy regimens are generally anthracycline based, given the improved outcomes compared with cyclophosphamide/methotrexate/fluorouracil.1 However, across all anthracycline-treated patients, only a small percentage actually receives benefit while these agents are associated with significant toxicities. Breast cancer is well recognized as a heterogeneous disease and therefore treating all breast cancers with the same chemotherapeutic agents could be considered illogical. Of considerable use would be a predictive marker of response to distinguish patients likely to receive benefit from those who are not, sparing predicted ‘poor responders’ from associated toxicities. Unfortunately, suitable predictive biomarkers for chemotherapeutic agents have remained elusive to date.

The topoisomerase IIα gene (TOP2A) is a putative marker of anthracycline sensitivity, as its gene product is the direct target of anthracyclines. TOP2A amplification has been shown to predict increased sensitivity to anthracyclines in several studies,2–7 although this finding has not been entirely consistent.8,9 Indeed, a single biomarker may not be sufficient to predict anthracycline response, and a multifactorial approach using gene signatures might be required.7,10 In the TOP trial7 of 149 patients with estrogen receptor (ER)-negative early or locally advanced breast cancer treated with single-agent epirubicin, TOP2A amplification was significantly associated with pathological complete response (pCR). In addition, a multifactorial anthracycline sensitivity score (the A-score), comprising three gene signatures, was evaluated and demonstrated a very high negative predictive value (NPV) but a much lower positive predictive value (PPV).7

Gene signatures are often defined through retrospective analyses of tumor tissue gene expression patterns correlated with patient outcomes. Although concordance in outcome prediction between signatures has been demonstrated,11 signatures derived in this manner often contain few genes in common as well as numerous genes of unknown function, making their clinical relevance less certain. It might be possible to improve clinical relevance of a signature by identifying the molecular processes required for a specific cellular function, such as anthracycline-induced cytotoxicity, and construct a signature containing measures of each of these functions. On the basis of this hypothesis, we aimed to construct and evaluate a multifactorial Consensus Signature for predicting anthracycline sensitivity in triple-negative breast cancer (TNBC). The term Consensus Signature was chosen to reflect the concept that, by each selected component acting as a surrogate marker of the different steps required for anthracycline cytotoxicity, included components (the genes and gene signatures) would work synergistically to provide an overall measure of effective anthracycline function.

We focused specifically on TNBC for the following reasons. First, there has been substantial work performed already evaluating the predictive role of TOP2A in HER2+ breast cancer, due to the known relationship between TOP2A amplification and HER2 amplification. Anthracyclines are commonly used in TNBC and appear to have activity. We wanted to assess this without the confounding factor of HER2 overexpression. Second, with treatment options in TNBC limited to chemotherapy, more effective use of chemotherapy would be of considerable benefit. Finally, with growing understanding of breast cancer biological diversity, evaluation of a predictive biomarker within a specific subtype might be preferable, as positive results could otherwise be masked if evaluated across a heterogeneous combined cohort.

Materials and methods

Data set

The construction and evaluation of consensus signatures (ConSigs) were carried out using a retrospective cohort study design, with in silico analyses of previously collected genetic data, clinical characteristics, and responses. The data set comprised gene expression profiles of patients who had received neoadjuvant anthracycline-based chemotherapy without a taxane. As the ConSigs were designed to be specific for anthracyclines, taxane use was considered a confounding factor.

Data derived from Affymetrix (Santa Clara, CA, USA) gene expression arrays based on build 133 of the UniGene database (HG-U133) were combined and evaluated as a single group, designated the ‘breast compendium’ (details on the construction and composition of the breast compendium are reported in Supplementary Table 1). The subset of samples treated with anthracycline-based neoadjuvant chemotherapy was used as the training set to derive the ConSigs, whereas the cohort of patients treated with anthracycline plus taxane-based neoadjuvant chemotherapy served as a control group to assess ConSig specificity for anthracyclines. In addition, two cohorts of patients treated with anthracycline-based neoadjuvant chemotherapy, with gene expression data derived from different microarray platforms, served as validation sets. The European Organisation for Research and Treatment of Cancer (EORTC)/BIG00-01 data set included gene expression data obtained with the Affymetrix X3P array from patients with locally advanced, inflammatory, or large operable breast cancers treated with either fluorouracil/epirubicin/cyclophosphamide or docetaxel followed by docetaxel/epirubicin under the auspices of the EORTC 10994 trial.12 The data set was available in the Gene Expression Omnibus repository under accession number GSE6861. The Netherlands Cancer Institute (NKI) data set (accessible under accession number GSE34138) used the Illumina (San Diego, CA, USA) HumanWG 6 v3.0 expression beadchip for gene expression profiling and included patients with intermediate or high-risk ER-negative breast cancer treated with neoadjuvant dose-dense doxorubicin/cyclophosphamide.13

For this study, only TNBC patients in the training and control subsets of the compendium and in the validation sets were considered. Further details on the construction of the breast compendium and on the definition of TNBCs are reported in the Supplementary Methods.

Design of the consensus signature

In order for anthracyclines to be effective, we postulated that the following steps must occur: (1) penetration of the drug into the tumor bed, (2) location of the target (topoIIα protein) within the nucleus, (3) increased topoIIα messenger RNA (mRNA) expression above that related to proliferation alone, (4) induction of apoptosis, and (5) active immune/stromal function.

For each step, a representative gene or gene signature was selected, with gene signatures chosen over single genes where possible. In some cases, more than one candidate gene or gene signature was evaluated for a step, to compare their relative utility. Details of the marker(s) evaluated, their association with anthracycline response, and the rationale for their selection are listed in Table 1.

Table 1 Consensus Signature components based on putative steps required for effective anthracycline-induced cytotoxicity

Quantification of genes and gene signatures

SHARP1 signature,14 HIF1α hypoxia signature (HIF),14 and the Minimal Gene signature15 were quantified as previously described. Briefly, each signature was calculated by summarizing the standardized expression levels of the genes in the signature into a combined score with zero mean. AURKA, STAT1, and PLAU signatures were computed as previously described16 using genefu R package. Briefly, for each sample, the signature was quantified as $s = \sum_i \omega_i \xi_i / \sum_i |\omega_i|$, where $\xi_i$ is the expression of a gene $i$ included in the set of genes of interest and $\omega_i$ is either +1 or −1 depending on the sign of the association under study. Gene expression lists for each signature are included in Supplementary Table 2.
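As an illustration of the signed signature score described above (a weighted sum of gene expression values divided by the sum of the absolute weights), the following minimal Python sketch computes the score for one sample. It is not the genefu implementation; the function name, the gene symbols, the expression values, and the choice to skip genes absent from the sample are all assumptions made for the example.

```python
def signature_score(expression, weights):
    """Signed weighted average of gene expression for a single sample.

    expression: dict mapping gene symbol -> expression value in this sample
    weights:    dict mapping gene symbol -> +1 or -1 (sign of the association)
    Genes absent from the sample are skipped (an assumption of this sketch).
    """
    present = [gene for gene in weights if gene in expression]
    if not present:
        raise ValueError("none of the signature genes were found in the sample")
    numerator = sum(weights[gene] * expression[gene] for gene in present)
    denominator = sum(abs(weights[gene]) for gene in present)
    return numerator / denominator


# Hypothetical three-gene signature and one made-up sample.
toy_weights = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1}
toy_sample = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5}
toy_score = signature_score(toy_sample, toy_weights)  # (2.0 + 1.0 - 0.5) / 3
```

Dividing by the sum of absolute weights keeps scores comparable between signatures that contain different numbers of genes.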

LAPTM4B, AURKA, YWHAZ, and topoIIα mRNA expression levels were calculated using the corresponding probe set, or the median expression across probe sets if multiple probe sets were available for a gene. For the NKI data set, probes were filtered on the basis of their quality, keeping only probes classified as ‘perfect’ and ‘good’ in the Bioconductor illuminaHumanv3.db annotation package.
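The probe-to-gene summarization described above can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: the probe identifiers, the quality labels, and the function and parameter names are hypothetical, and the optional quality filter only mimics the idea of keeping probes annotated as ‘perfect’ or ‘good’.

```python
from statistics import median


def gene_level_expression(probe_values, probe_to_gene, probe_quality=None,
                          keep=("perfect", "good")):
    """Collapse probe-level values to one value per gene using the median.

    probe_values:  dict probe id -> expression value (one sample)
    probe_to_gene: dict probe id -> gene symbol
    probe_quality: optional dict probe id -> quality label; when given,
                   only probes whose label is in `keep` are used.
    """
    by_gene = {}
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe without a gene annotation
        if probe_quality is not None:
            if probe_quality.get(probe, "").lower() not in keep:
                continue  # drop probes that fail the quality filter
        by_gene.setdefault(gene, []).append(value)
    return {gene: median(values) for gene, values in by_gene.items()}


# Hypothetical probes: three map to TOP2A, one to YWHAZ.
toy_values = {"p1": 5.0, "p2": 7.0, "p3": 9.0, "p4": 3.0}
toy_map = {"p1": "TOP2A", "p2": "TOP2A", "p3": "TOP2A", "p4": "YWHAZ"}
toy_quality = {"p1": "Perfect", "p2": "Good", "p3": "Bad", "p4": "Good"}

unfiltered = gene_level_expression(toy_values, toy_map)
filtered = gene_level_expression(toy_values, toy_map, toy_quality)
```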

Quantification of the Consensus Signature

The Consensus Signature score was calculated in a continuous form, consisting of a linear combination of the various components/signatures. Prior to combination, each component/signature was scaled to have the interquartile range equal to 1 and the median equal to 0. On the basis of the association with pCR (Table 1), we hypothesized that a high Consensus Signature score would predict for increased pCR rate, whereas a low score should predict anthracycline resistance.
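A minimal sketch of this scaling and combination step is given below. The text specifies only a linear combination of components scaled to a median of 0 and an interquartile range of 1; the equal weights, the optional signs argument, and the quartile definition (the "inclusive" method of Python's statistics module) are assumptions of the example, as are all names and values.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Shift to median 0 and rescale to an interquartile range of 1."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    centre = median(values)
    return [(v - centre) / iqr for v in values]


def consensus_score(components, signs=None):
    """Sum the robustly scaled components, sample by sample.

    components: dict component name -> list of values (one per sample)
    signs:      optional dict component name -> +1 or -1 (assumed weights)
    """
    names = list(components)
    if signs is None:
        signs = {name: 1 for name in names}
    scaled = {name: robust_scale(components[name]) for name in names}
    lengths = {len(v) for v in scaled.values()}
    if len(lengths) != 1:
        raise ValueError("every component needs one value per sample")
    n_samples = lengths.pop()
    return [sum(signs[name] * scaled[name][i] for name in names)
            for i in range(n_samples)]


# Two hypothetical components measured on five samples.
toy_components = {
    "COMPONENT_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "COMPONENT_B": [10.0, 20.0, 30.0, 40.0, 50.0],
}
toy_consensus = consensus_score(toy_components)
```

Scaling by median and interquartile range rather than mean and standard deviation makes each component less sensitive to outlying samples before the components are added together.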

Statistical analysis

All statistical analyses were performed in R (version 2.15.1).17 Odds ratios (ORs) were used to compare pCR rates between groups defined by different clinical and molecular characteristics (stats R package). The area under the curve (AUC) was used to assess the prediction performance of any signature score (ROCR R package). AUC was estimated through the concordance index (survcomp R package) under the alternative hypothesis that AUC was greater than 0.5, as each signature score was designed so that higher values would indicate a higher likelihood of pCR. Its significance and confidence interval were estimated assuming asymptotic normality. Because of the differences in array design and technology between the training set and the two validation sets, the threshold ConSig score for each cohort was calculated as the score value corresponding to the 75th percentile of the score distribution for that cohort. P values of <0.05 were considered significant.
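For a binary outcome such as pCR, the concordance index equals the proportion of responder/non-responder pairs in which the responder has the higher score, with ties counted as one half. The sketch below illustrates that calculation and a per-cohort 75th percentile threshold. It is not the survcomp or ROCR code: it omits the one-sided test and confidence interval, and the function names, the quartile method, and the toy data are assumptions.

```python
from statistics import quantiles


def auc_concordance(scores, responded):
    """AUC as the fraction of concordant responder/non-responder pairs."""
    pos = [s for s, r in zip(scores, responded) if r]
    neg = [s for s, r in zip(scores, responded) if not r]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5  # ties count as half
    return concordant / (len(pos) * len(neg))


def upper_quartile_threshold(scores):
    """75th percentile of the score distribution within one cohort."""
    return quantiles(scores, n=4, method="inclusive")[2]


# Made-up scores and pCR status for four patients.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_pcr = [False, False, True, True]
toy_auc = auc_concordance(toy_scores, toy_pcr)
toy_threshold = upper_quartile_threshold(toy_scores)
```

Deriving the threshold separately within each cohort avoids having to transfer an absolute cut-off between microarray platforms whose score scales differ.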

Results

Data set

The training set was derived from a breast cancer data set originally consisting of 4,600 samples collected in 27 different studies. After exclusion of duplicate samples (n=939) and adjusting for batch effect using ComBat,17 1,069 samples (29%) were classified as TNBC by the SCMOD2 subtype clustering classifier (subtype clustering model)18 contained in genefu R package.19 Among these samples, 491 had information about neoadjuvant chemotherapy. Eleven samples were from patients treated with taxane but not anthracycline and were excluded, whereas 147 and 333 samples were from patients treated with anthracycline-based therapy without taxane and anthracycline-based therapy with taxane, respectively (Figure 1).

Figure 1 CONSORT diagram for selection of samples in the training set. A, anthracycline-based chemotherapy; BC, breast cancer; NAC, neoadjuvant chemotherapy; pCR, pathological complete response; T, taxane; TNBC, triple-negative breast cancer.

Clinical and tumor characteristics for the patients in the training set (n=147) treated with anthracycline-based chemotherapy are listed in Table 2. The samples were originally contained in four different data sets,7,20–22 and included 29 (19.7%) pCRs and 118 (80.3%) samples with residual disease. pCR was defined as ypT0/is, ypN0 in all included studies.

Table 2 Clinical and tumor characteristics of TNBC patients in the training set treated with anthracycline-based neoadjuvant chemotherapy without taxane (n=147)

Clinical characteristics

All clinical variables were tested for their ability to predict pCR, with no significant association between any clinical characteristics and pCR found (data not shown).

Predictive power of single gene or gene signatures

Using receiver-operating characteristic curves, the ability of any single component (gene or gene signature) to discriminate patients with pCR from patients with residual disease in the training set was assessed. STAT1 was significantly associated with pCR status (Supplementary Table 3; Supplementary Figure 1a). All other components, when considered individually, were not significantly correlated with pCR. TopoIIα mRNA corrected for proliferation with either AURKA mRNA or AURKA signature showed no increased correlation with pCR compared with topoIIα mRNA alone (data not shown).

Predictive power of ConSigs

ConSigs were constructed using various combinations of components, with the starting point being components shown to have significant or near-significant predictive capability when used alone, that is, STAT1, topoIIα, HIF, and LAPTM4B. Using a continuous score to quantify ConSig expression level, all combinations of core components demonstrated a significant correlation with pCR in patients in the training set treated with anthracycline-based chemotherapy without taxane. The two most predictive combinations were designated ConSig1: (STAT1+topoIIα+LAPTM4B) with AUC 0.70 (Supplementary Figure 1b), P=3.9×10⁻⁵, and ConSig2: (STAT1+topoIIα+HIF) with AUC 0.71, P=4.2×10⁻⁶. High correlation with pCR was maintained with the addition of further component genes/gene signatures to either ConSig1 or ConSig2, but overall predictive power was not better than with three components (Table 3). The combination of STAT1+PLAU, the components for TNBC of another multifactorial scoring signature, the A-score,7 was correlated with pCR, although less strongly than other combinations. Substituting topoIIα mRNA with topoIIα mRNA corrected for proliferation did not improve the performance of any of the ConSigs.

Table 3 Correlation with pCR for various combinations of ConSig components in the training set

To assess the specificity of ConSigs for anthracycline response compared with other chemotherapy regimens, we analyzed their performance in a control group of patients who received taxane in addition to anthracycline (n=333), of whom 299 had information about response, including 101 with pCR. Neither of the ConSigs with the highest predictive power in the training set, i.e., ConSig1 and ConSig2, was correlated with pCR in this control group (Supplementary Figure 1c and d). Although (STAT1+PLAU) had predicted response to anthracycline-based chemotherapy, it did not appear to be anthracycline specific, performing similarly in anthracycline+taxane-treated patients (Table 3).

Classification performance of ConSigs

A threshold score that could be used to classify a patient in the training set as a putative responder or as resistant was determined for each ConSig by selecting the score value corresponding to the 75th percentile of the score distribution. PPV, NPV, sensitivity, specificity, and OR were then calculated. For ConSig1, NPV was high (85%), as was the OR for lack of pCR (OR=3.18, P=0.008) (Table 4). PPV, however, was modest (35%). ConSig2 performed similarly, with high NPV and OR but modest PPV. In the control group of 299 patients treated with anthracycline plus taxane, NPVs for ConSig1 and ConSig2 were lower, at 66 and 67%, respectively, and ORs for lack of pCR were no longer statistically significant for either ConSig (Table 4), further supporting the specificity of these ConSigs for anthracycline response.
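The classification measures reported here all follow from a 2×2 table of predicted class against observed pCR. The sketch below shows the arithmetic on made-up data; it does not reproduce the study's counts. Treating a score at or above the threshold as a predicted responder, and all names and values, are assumptions of the example.

```python
def classification_summary(scores, responded, threshold):
    """PPV, NPV, sensitivity, specificity, and odds ratio from a 2x2 table.

    A score >= threshold is treated as a predicted responder (an assumption).
    """
    tp = fp = fn = tn = 0
    for score, has_pcr in zip(scores, responded):
        predicted_responder = score >= threshold
        if predicted_responder and has_pcr:
            tp += 1
        elif predicted_responder and not has_pcr:
            fp += 1
        elif not predicted_responder and has_pcr:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        # Odds of lack of pCR in the low-score group relative to the
        # high-score group; numerically the same cross-product ratio as
        # the odds of pCR in the high-score group relative to the low.
        "odds_ratio": ratio(tp * tn, fp * fn),
    }


# Made-up scores and pCR status for eight patients.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.4, 0.6]
toy_pcr = [True, True, False, False, False, True, False, False]
toy_summary = classification_summary(toy_scores, toy_pcr, threshold=0.65)
```

In this toy table, two of the three high-score patients respond (PPV 2/3) and four of the five low-score patients do not (NPV 0.8), which gives a cross-product odds ratio of 8.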

Table 4 Performance of ConSig1 and ConSig2 in the training set for predicting pathological complete response

Evaluation of ConSig1 and ConSig2 in two independent ‘validation’ data sets

Two data sets, NKI and EORTC/BIG00-01, were selected as validation sets. The EORTC/BIG00-01 data set12 comprised 161 samples, 85 of which were TNBC, with 46 patients treated with anthracycline without taxane (18 pCRs) and 39 treated with anthracycline plus taxane (18 pCRs). The NKI data set13 included 178 ER-negative breast cancer samples, 52 of which were TNBC. Of these 52 patients, 24 had pCR. All patients received anthracycline-based neoadjuvant chemotherapy without taxane.

In the NKI data set, both ConSig1 and ConSig2 were significantly correlated with pCR for patients receiving anthracycline without taxane (Table 5). In the EORTC/BIG00-01 data set, ConSig1 remained specific for anthracyclines, whereas ConSig2 was correlated with pCR for both anthracycline-based and anthracycline+taxane-based neoadjuvant chemotherapy.

Table 5 Performance of ConSig1 and ConSig2 in the validation sets for predicting pathological complete response

Discussion

Overall, the best performing combination of components was ConSig1 (STAT1+topoIIα+LAPTM4B). Our results suggest that ConSig1 has excellent ability to predict anthracycline resistance within a cohort of anthracycline-treated TNBC patients. This is clinically relevant because, if further validated, ConSig1 could be used to identify TNBC patients for whom the addition of anthracycline is likely to add toxicity without benefit, and thus for whom an alternate chemotherapy regimen might be selected. When evaluated in TNBC patients who received anthracycline and taxane, the predictive ability of ConSig1 was lost. Although not conclusive, a lack of predictive utility in patients who also received taxanes supports the anthracycline specificity of ConSig1. ConSig2 was most strongly correlated with pCR in the training set; however, in the EORTC/BIG00-01 validation set, ConSig2 did not discriminate between patients treated with anthracycline-based and anthracycline plus taxane-based neoadjuvant chemotherapy.

Although we considered that five main processes should take place for anthracyclines to cause cell death, markers for two of these processes, that is, induction of apoptosis and hypoxia/drug penetration, did not appear to contribute significantly to the predictive power of ConSig1. Given that immune function (STAT1) was a powerful contributor to ConSig1 and ConSig2, we postulated that in the setting of a highly active immune response, intact apoptotic pathways as measured by Minimal Gene signature15 or YWHAZ23 might be less important. However, predictive power was not improved using either Minimal Gene signature or YWHAZ in the absence of the immune/stromal components (STAT1 and PLAU) (data not shown). Although the inclusion of a component marker of hypoxia had the highest correlation with pCR in the training set (ConSig2), this combination did not show consistent anthracycline specificity, with similar performance in anthracycline- and anthracycline plus taxane-treated patients in the EORTC validation set. This might be because hypoxia is a critical factor in the function of other chemotherapy agents, not only anthracyclines. Interestingly, although topoIIα protein expression is known to relate to proliferation,24 correcting topoIIα mRNA expression level for proliferation made no difference to the predictive utility of the ConSigs. The lack of discriminating ability of the proliferative markers, AURKA or AURKA signature, might relate to the fact that the majority of tumors in our data set (76%) were grade 3 with most of the rest being grade 2, and thus were all moderately to highly proliferating. With no comparator group of low-proliferating tumors, the influence of a proliferation marker cannot easily be evaluated. Furthermore, when considering a cohort of TNBC, proliferation is characteristically high, making the ability to measure variability of topoIIα protein expression due to low versus high proliferation arguably less critical. Although the influence of proliferation itself on the results is uncertain, the specificity of ConSig1 for anthracycline response over and above that of proliferation is supported by the fact that it is not predictive in patients treated with anthracyclines plus taxanes, a situation where proliferation is still important.

Unlike the A-score, which was evaluated in ER−/HER2+ and ER−/HER2− tumors, we focused specifically on TNBC, a group where treatment is limited to chemotherapy, and thus where optimization of the chemotherapy regimen would be of considerable utility. Interestingly, the predictive ability of ConSig1 for TNBC patients treated with anthracycline-based chemotherapy without taxane appeared to be better than that of the combination of STAT1+PLAU, the component gene signatures in the A-score for TNBC.

We hypothesized that a biologically relevant multifactorial ConSig would predict anthracycline response as well as resistance; however, the PPVs of all ConSigs in the training set were lower than anticipated, at only around 35% for ConSig1. Conversely, ConSig1 does predict the likelihood of lack of response (high NPV), which is still a finding of considerable clinical utility if further validated. In general, high PPV for biomarkers has previously been shown to be hard to achieve. Considering ER and HER2, two well-established biomarkers and the only two routinely used in breast cancer management, positivity predicts treatment response in only around 50% of patients for endocrine therapy25 and less than 40% for trastuzumab as a single agent.26 However, it is worth noting that the small number of pCRs in the training set (29/147; 19.7%) might have contributed to less than stable results for PPV. Indeed, in the validation sets, where pCR rates are 39% (EORTC/BIG00-01) and 46% (NKI), PPV is higher than 75% (Table 5).

Finally, our sample size was relatively limited. The training set used contained all publicly available data on the same platform for TNBC treated with anthracycline-based chemotherapy. However, this incorporated only 147 TNBC patients. Thus, further validation of ConSig1 in independent TNBC cohorts is necessary.

In conclusion, this project demonstrated the feasibility of defining a multifactorial Consensus Signature for predicting anthracycline response and, when applied in a TNBC patient cohort, ConSig1 was highly predictive for anthracycline resistance. With further validation, this signature may provide clinicians with a useful tool for improved selection of TNBC patients for anthracyclines, potentially leading to better treatment tolerance and more effective therapy.