Introduction

Breast cancer is a heterogeneous disease and the different therapeutic modalities used for breast cancer patients reflect this heterogeneity. As highlighted in the 13th St. Gallen International Breast Cancer Conference [1], endocrine therapy (often used alone) is recommended for Luminal A-like breast cancers, anti-HER2 therapy is the most crucial intervention in HER2-positive disease, whereas the only available approach for triple-negative tumors is the use of cytotoxic chemotherapy. Luminal B/HER2− carcinomas represent an intermediate entity from a clinico-therapeutic point of view as endocrine therapy is recommended for all patients and cytotoxic therapy for most of them [1]. Significant efforts are currently being made to identify the patients who will benefit from chemotherapy. Multi-gene assays appear to help recognize patients with Luminal disease for whom chemotherapy is not effective, i.e., patients with a low Recurrence Score (RS) by Oncotype DX [2, 3] and those with a “good prognosis signature” with the 70-gene signature assay [4]. Similar results were obtained in the neoadjuvant setting: no or very few pathological complete responses (pCR) to neoadjuvant chemotherapy (NAC) were observed among patients with low risk of recurrence (ROR) based on PAM50 [5]. Additionally, both the 70-gene good prognosis signature and a low 21-gene RS predict a low probability of pCR [6, 7]. The proliferative activity of tumors as assessed with immunohistochemical detection of the cell-cycle-specific antigen Ki67 has been extensively studied [8–12] and is presently one of the parameters used to guide systemic adjuvant therapy for patients with Luminal breast cancers [13]. A recent study has shown a beneficial effect of the addition of chemotherapy to hormone therapy in Luminal B/HER2− carcinomas with a high proliferative index (Ki67 > 32 %) [14].
A significant impact on breast cancer prognosis has also been suggested for the mitotic count [15, 16], but few studies have considered its predictive value for the response to chemotherapy [17, 18]. Cytotoxic drugs, such as taxanes, which target microtubules, are believed to be more effective on cells that proliferate rapidly. However, it is known that the mitotic count is low in many chemosensitive human cancers [19]. Despite this observation, no studies have addressed this specific issue in breast cancer.

The neoadjuvant setting offers an invaluable opportunity to test the response to therapy, considering that the selection of regimens for NAC generally follows guidelines similar to those applied in the conventional adjuvant setting [20]. However, the majority of studies have focused on pCR as the primary endpoint for response to chemotherapy because of its prognostic value [21, 22]. Thus, statistical analyses performed on neoadjuvant studies tend to group together partial response and non-response to treatment. To our knowledge, no studies focusing on the non-responsive breast cancer category have been reported.

Taking all these data together, the main goal of the present study was to identify markers of non-response to NAC that could be used in the adjuvant setting, in particular markers for the Luminal B/HER2− category. The analyses were performed by comparing the group of patients showing some response (from partial to complete) to NAC with the group of patients lacking any response. Furthermore, we investigated for the first time the potential value of the mitotic index in offering additional information about the likelihood of non-response to treatment with taxane-based regimens.

Patients and methods

Study design

Sixteen pathologists of the European Working Group for Breast Screening Pathology from different European Institutions (Città della Salute e della Scienza di Torino, Turin, Italy; 2nd Department of Pathology, Semmelweis University, Budapest, Hungary; Centro Regional De Oncologia De Coimbra, Coimbra, Portugal; University Hospital Zurich, Switzerland; State Pathology Center, Riga, Latvia; Bács-Kiskun County Teaching Hospital, Kecskemét, Hungary; Hospital S. Joao, Porto, Portugal; Clinical Sciences Institute, Galway, Ireland; Donauspital am SMZO, Vienna, Austria; Complejo Hospitalario de Navarra, Pamplona, Navarra, Spain; AOU Careggi, Florence, Italy; Dietrich Bonhoeffer Medical Centre, Neubrandenburg, Germany; University Medical Center Utrecht, Utrecht, The Netherlands; Skane University Hospital, Lund, Sweden; The Netherlands Cancer Institute, Amsterdam, The Netherlands; The Princess Alexandra Hospital, Harlow, Essex, United Kingdom; University College Hospital, London, United Kingdom) were asked to participate in the study. These pathologists reviewed the core biopsy histology slides pertaining to patients with breast cancer treated with NAC at their Institutions. The number of core biopsies available for each patient ranged from 2 to 4, depending on the Institution protocol. Each Institution recorded a list of clinico-pathological features in a dedicated database, as specified below.

Data collection and definitions

The data recorded were as follows: (i) in the pre-treatment biopsy: the histological type and grade [23], mitotic count, Ki67 proliferation index, presence of inflammation, presence of necrosis, Estrogen Receptor (ER), Progesterone Receptor (PgR), and HER2 status (based on both the immunostaining score and in situ hybridization analysis for score 2+); (ii) in the histological examination of the post-treatment surgical specimens: the degree of response to therapy, categorized following Pinder et al. [24] (Supplementary Table 1) as pathological complete response (pCR) if no residual invasive tumor was found (in situ carcinoma may be present), pathological partial response (pPR) if residual disease or minor signs of response were present on the surgical specimens compared to the tumor cellularity of the pre-treatment core biopsies, or pathological non-response (pNR) if no evidence of response to therapy was detected (the presence of lymph node metastasis was not taken into account); and (iii) from the clinical records: the diameter of the tumor before NAC and the type of NAC. The response in lymph nodes was evaluated as detailed in Supplementary Table 1.

With regard to the definition of molecular subtypes, we referred to the St. Gallen recommendations from 2013 [1] that include five categories (Luminal A, Luminal B/HER2−, Luminal B/HER2+, HER2+ and triple negative). In particular, the Luminal B/HER2− category included ER positive carcinomas with >14 % of Ki67 [13] and/or PgR <20 % [25].
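The Luminal B/HER2− rule quoted above can be written as a small predicate. This is only a sketch: the function name and its parameters are hypothetical, and it assumes that ER positivity and HER2 status have already been determined, since the thresholds for those calls are not stated here.

```python
def is_luminal_b_her2_negative(er_positive, her2_positive, ki67_percent, pgr_percent):
    """Return True if a tumor meets the Luminal B/HER2- rule quoted in the text.

    Rule: ER-positive, HER2-negative, and Ki67 > 14 % and/or PgR < 20 %.
    How ER positivity and HER2 status are called is an assumption left to the caller.
    """
    if not er_positive or her2_positive:
        return False
    return ki67_percent > 14 or pgr_percent < 20


# Hypothetical example: ER+, HER2-, Ki67 10 %, PgR 5 % qualifies through the PgR criterion.
example = is_luminal_b_her2_negative(True, False, ki67_percent=10, pgr_percent=5)
```

The "and/or" in the definition maps onto a plain logical `or`, so a tumor with low Ki67 still falls into this category when PgR expression is below 20 %.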

To define the methods to assess mitoses and Ki67, the participants had a preliminary meeting. The mitotic figures were counted in 10 high power fields of each core biopsy if possible, and the mitotic count was reported as the mean value. The results were then normalized as the number of mitoses per 2 mm², i.e., an area equal to 10 fields at high magnification (40×) with a 0.51-mm field diameter. Ki67 scoring (all centers used the MIB-1 antibody clone) was performed by counting a range of 100–500 cells (depending on the cellularity of the specimen), including also hot spot areas.
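The arithmetic behind the 2 mm² reference area can be checked with a short sketch. It takes the 0.51 mm figure quoted above as the diameter of one circular high-power field and assumes that counts are scaled linearly by area; the exact normalization procedure used by the participating centers is not described, so the function names and the scaling rule are illustrative assumptions.

```python
import math

HPF_DIAMETER_MM = 0.51      # field diameter quoted in the text
REFERENCE_AREA_MM2 = 2.0    # reference area quoted in the text


def hpf_area_mm2(diameter_mm=HPF_DIAMETER_MM):
    """Area of one circular high-power field in mm^2."""
    return math.pi * (diameter_mm / 2) ** 2


def mitoses_per_2mm2(total_mitoses, fields_counted, diameter_mm=HPF_DIAMETER_MM):
    """Scale a raw mitotic count to the 2 mm^2 reference area (assumed linear scaling)."""
    if fields_counted <= 0:
        raise ValueError("fields_counted must be positive")
    counted_area = fields_counted * hpf_area_mm2(diameter_mm)
    return total_mitoses * REFERENCE_AREA_MM2 / counted_area


# Ten fields of 0.51 mm diameter cover roughly 2.04 mm^2, i.e. close to the 2 mm^2 reference.
ten_field_area = 10 * hpf_area_mm2()
```

Scaling by area rather than by field number keeps counts comparable when fewer than 10 fields can be assessed on a small core biopsy.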

Inflammation was defined in two ways: (i) presence/absence of any clearly detectable tumor-infiltrating lymphocytes (TILs) within tumor cells (intra-tumoral) and/or stroma (stromal) at H&E; (ii) percentage of stromal TILs (st-TILs%) as recently recommended [26] (i.e., area occupied by mononuclear inflammatory cells over total intra-tumoral stromal area).

The final cohort comprised 506 cases, 490 of which had information available about their response to NAC.

Statistical analysis

For the purpose of defining the tumors that will not benefit from chemotherapy, univariate analyses were conducted by dividing the cohort of patients into two sets: pNR and pCR + pPR.

Differences in the distribution of the characteristics between the specimens of the patients with pCR or pPR and the patients with pNR were evaluated using Pearson’s Chi-Square Test and Fisher’s Exact Test for nominal variables and the nonparametric independent-samples Mann–Whitney U-Test for continuous variables.

The cut-offs for Ki67 %, for the number of mitoses, and for st-TILs% that are able to discriminate the response to treatment (pNR vs pCR + pPR) were assessed for the entire cohort of breast cancer patients by receiver operating characteristic (ROC) curve analysis, in which the sensitivity (SE) is plotted as a function of 1-specificity (1-SP). The Youden Index (J), one of the main summary statistics of the ROC curve, defines the maximum potential effectiveness of a biomarker. J can be formally defined as J = max_c {SE(c) + SP(c) − 1}, where c ranges over the candidate cut-off values. The cut-off value that achieves this maximum is referred to as the optimal cut-off point that optimizes the biomarker’s discriminating power when the sensitivity and specificity bear equal weight [27, 28].
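The Youden Index search described above can be sketched in a few lines. The example assumes that a low marker value predicts non-response, so a case is called test-positive when its value is at or below the cut-off; the function name and the small data set are hypothetical and serve only to illustrate the calculation, not the analysis run in SPSS or STATA.

```python
def youden_cutoff(values, is_non_responder):
    """Return (J, cutoff, sensitivity, specificity) maximizing J = SE + SP - 1.

    A case is test-positive when value <= cutoff (assumption: low values predict pNR).
    """
    n_pos = sum(1 for flag in is_non_responder if flag)
    n_neg = len(is_non_responder) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both response categories must be present")
    best = None
    for cutoff in sorted(set(values)):
        true_pos = sum(1 for v, f in zip(values, is_non_responder) if f and v <= cutoff)
        true_neg = sum(1 for v, f in zip(values, is_non_responder) if not f and v > cutoff)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best


# Hypothetical mitotic counts and response labels (True = pNR), for illustration only.
counts = [2, 4, 5, 8, 9, 12, 15, 20, 25, 30]
labels = [True, True, True, False, True, False, False, False, False, False]
j, cutoff, se, sp = youden_cutoff(counts, labels)
```

Because sensitivity and specificity enter J with equal weight, the selected cut-off does not account for the relative clinical cost of the two kinds of misclassification.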

The interactions between the response to treatment (pNR vs pCR + pPR) and the variables statistically significant at univariate analysis were tested in the entire cohort and in the Luminal B/HER2− cohort in a multivariable binary regression model.

All reported p values (p) were two-sided and p < 0.05 was considered to be statistically significant. All analyses were carried out using SPSS version 20 (SPSS Inc, Chicago, IL, USA) and STATA version 12.0 (Stata Corporation, College Station, Texas) statistical software. This article was written in accordance with the guidelines of the Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK criteria) [29].

Results

Entire cohort analysis

The clinical and histopathological features of the entire population are summarized in Table 1. The analyses of response to treatment were performed either considering primary tumor and lymph node response (454 cases) or tumor response only (490 cases). No statistically significant differences in terms of final distribution in response categories were observed between these separate analyses (Supplementary Tables 2, 3, and 4); therefore, we took into account the response on primary tumors only, which allowed us to perform the analysis in a larger number of cases (Supplementary Table 4).

Table 1 Histopathological characteristics of pre-treatment core biopsies and therapy-related data on the entire cohort of 506 patients with breast cancer treated with neoadjuvant chemotherapy

To evaluate the impact of the heterogeneity of standard of care in the different Institutions, we first analyzed by Chi-Square Test the difference of response taking into account the different treatment protocols. Twenty-one different treatment schemes were reported. We decided to cluster them in 3 main groups: 259 cases with taxanes, 135 without taxanes, and 87 with Herceptin. The two response categories (pNR vs pCR + pPR) were not significantly different in terms of treatment protocols as reported in Table 2. In particular, after excluding HER2+ carcinomas treated with Herceptin, we found that taxanes were used in association with other cytotoxic drugs in 217 cases (66 %) out of 328 patients with a pCR + pPR. Of 66 patients with a pNR, taxanes were used in the chemotherapy protocols in 42 cases (64 %).

Table 2 Treatment protocols and distribution of histopathological findings obtained from pre-treatment core biopsies in patients with no response (pNR) and with complete and partial response (pCR + pPR) to neoadjuvant chemotherapy in 490 cases

With regard to tumor size before NAC, no differences were observed between pNR and pCR + pPR (mean values 35.3 mm (SD: 16.3 mm) and 37.6 mm (SD: 22.2 mm), respectively; p = 0.881). Similarly, tumor necrosis was found at a similar rate in both response groups.

The tumors in the patients with pNR were less frequently of high histological grade (G3) (25 vs 45 %) and more frequently ER+ (81 vs 63 %), PgR+ (72 vs 53 %) and HER2− (89 vs 77 %) than were the tumors in the pCR + pPR category (Table 2). A lobular histotype was also more common in the pNR than in the pCR + pPR category (19 vs 6 %). In contrast, the presence of TILs was significantly less common in the pNR (51 %) than in the pCR + pPR (77 %) category (Table 2). Similarly, the st-TILs% was differently distributed within the pNR and pCR + pPR categories (mean values: 7 vs 12 %; SDs: 12.42 vs 17.51 %) (p = 0.002) (Supplementary Fig. 1a). Based on ROC curve analysis, the area under the curve was 0.639 (95 % CI 0.544–0.734) and the cut-off value of 1 % of st-TILs (J: 0.297) significantly discriminated the pNR from the pCR + pPR group. In particular, to discriminate pNR tumors, an st-TILs% ≤1 had a sensitivity of 64 %, a specificity of 64 %, a positive predictive value (PPV) of 21 %, and a negative predictive value (NPV) of 92 % (Supplementary Fig. 1c).

Both the mitotic numbers and Ki67 percentages were differently distributed in the pNR and pCR + pPR categories (mean values: 8.5 vs 11.3 for mitoses and 27.9 vs 38.4 % for Ki67; SDs: 10.2 vs 12.1 for mitoses and 14.9 vs 25.7 % for Ki67) (p = 0.046 and 0.001, respectively) (Fig. 1). To determine the cut-offs for proliferation that best discriminate between the pNR and pCR + pPR categories, we performed ROC curve analyses (Fig. 2a, b). The area under the curve was 0.575 (95 % CI 0.506–0.644) for mitoses and 0.635 (95 % CI 0.555–0.715) for Ki67. The cut-off values of 9 mitoses/2 mm² (J: 0.169) and 18 % of Ki67 positivity (J: 0.253) significantly discriminated the pNR from the pCR + pPR groups (Fisher’s exact test p = 0.018 and <0.001, respectively). In particular, to discriminate pNR tumors, the cut-off of 9 mitoses had a sensitivity of 71 %, a specificity of 44 %, a PPV of 18 %, and an NPV of 90 %, whereas the Ki67 cut-off of 18 % showed a sensitivity of 47 %, a specificity of 77 %, a PPV of 26 %, and an NPV of 90 % (Fig. 2c, d). No statistically significant differences in the mitotic index were identified between the pNR and pCR + pPR groups, either in patients treated with taxanes or in those treated without taxanes. In contrast, the Ki67 percentages were differently distributed in the two response categories, both in patients treated with taxanes (p = 0.047) and in those treated without taxanes (p = 0.017). For these analyses, patients treated with Herceptin were excluded.
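The relationship between sensitivity, specificity, and the predictive values quoted above can be illustrated with Bayes' rule. In this sketch the sensitivity and specificity are those reported for the mitotic cut-off, whereas the pNR prevalence of 15 % is an assumption chosen for illustration and is not a figure reported in the text; the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity of the mitotic cut-off as quoted in the text;
# the 15 % prevalence of pNR is an illustrative assumption.
ppv, npv = predictive_values(sensitivity=0.71, specificity=0.44, prevalence=0.15)
```

Under this assumed prevalence the sketch gives a PPV of about 18 % and an NPV of about 90 %, which shows why a marker with modest specificity can still have a high NPV when non-response is uncommon.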

Fig. 1

Box plots showing the correlations between the pathological response (pNR vs pCR + pPR) and distribution of Ki67% (a) and mitotic counts (b) in the entire cohort. Box plot explanation: upper and lower horizontal bars of the box = standard error; horizontal bar within the box = mean; upper and lower horizontal bars outside the box = standard deviation. Outliers were not included

Fig. 2

Receiver operating characteristic (ROC) curves for the distribution of mitotic count (a) and Ki67% (b) as a plot of the true positive rate (sensitivity) against the false positive rate (1-specificity) for different possible cut-points. Area under the curve (AUC): measure of test accuracy in discriminating cases with no evidence of pathological response (pNR) from cases with a partial (pPR) or a complete response (pCR). Histograms showing the distribution of response rates for the mitotic cut-off (c) and the Ki67 cut-off (d)

Molecular subtype analysis

According to the St. Gallen recommendations [1], 70 tumors were Luminal A-like, 188 were Luminal B/HER2−, 46 were Luminal B/HER2+, 57 were HER2+, and 111 were triple-negative (Table 1). To evaluate the reliability of our data, we first assessed the distribution of patients with pCR in the different molecular subtypes. As expected, the lowest rate of pCR was observed in Luminal A (3 %) followed by Luminal B/HER2− (16 %). However, the response rate was significantly different between the two subtypes (p = 0.01), and both of them showed a significant difference of response compared with the other subtypes (all p < 0.01). The highest rates of pCR were achieved in the HER2+ (58 %) and triple-negative (43 %) subtypes. On the other hand, the Luminal A cancers showed the highest percentage of pNR (31.3 %) (Table 3).

Table 3 Response distribution (a) and statistical correlations (b) in different molecular subtypes

Histological grade, radiological tumor diameter before chemotherapy, necrosis, and mitotic count did not show a significant difference between different response categories (all p > 0.05). The frequency of the lobular histotype varied among the different molecular subtypes (15 % in Luminal A; 8.5 % in Luminal B/HER2−; 2.2 % in Luminal B/HER2+; 5.3 % in HER2+, and 1.8 % in triple negative), with the highest frequency in the Luminal subtypes. However, the frequency of the lobular histotype (26 vs 5 %) significantly differed between the pNR and pCR + pPR tumors (p < 0.01) only in the Luminal B/HER2− cancers (sensitivity: 26 %; specificity: 95 %; PPV: 48 %; NPV: 88 %) (Fig. 3a), although this histotype was more frequent in Luminal A. The absence of TILs was also distributed differently between the pNR and pCR + pPR categories (69 vs 23 %) in the Luminal B/HER2− breast cancers (p < 0.001) (sensitivity: 69 %; specificity: 77 %; PPV: 28 %; NPV: 95 %) (Fig. 3b), and the st-TILs% was also differently distributed in the pNR and pCR + pPR categories (mean values: 3 vs 10 %; SDs: 5.73 vs 17.23 %) (p = 0.002) in this molecular subtype (Supplementary Fig. 1b). An st-TILs% ≤1 was significantly associated with pNR (p = 0.002) (sensitivity: 81 %; specificity: 61 %; PPV: 22 %; NPV: 96 %) (Supplementary Fig. 1d).

Fig. 3

Histograms showing the distribution of response rates for the histotype (a) and the inflammation (b). TILs tumor infiltrating lymphocytes

The distribution and cut-off of the percentage of Ki67 were not analyzed in the Luminal subtypes as proliferation is one of the parameters used to define the Luminal A and B categories.

In HER2+ and in triple-negative carcinomas, neither the distribution of Ki67 nor the 18 % cut-off value was able to discriminate between the different response categories (p > 0.05).

Multivariable analyses

We performed a binary logistic regression on both the entire cohort and the Luminal B/HER2− carcinoma subgroup. When two different variables of the same parameter (e.g., PgR: score or cut-off; TILs: presence/absence or st-TILs score) were available, the most significant at univariate analysis was used for the multivariable test.

In the entire cohort, the cut-off of 9 mitoses was statistically related to the response category (p = 0.036); in particular, the odds of pNR were 3.3 times higher for patients whose carcinomas had ≤9 mitoses than for patients whose tumors had >9 mitoses (Odds Ratio, OR: 3.3). A trend toward statistical significance was also observed for the histological type (lobular vs non-lobular) (p = 0.071), with an OR of 3.6 (Table 4a).

Table 4 Results of multivariable analyses of the entire cohort (a) and the Luminal B/HER2− (b) carcinoma subtype

Within the Luminal B/HER2− subgroup, the analysis was performed excluding the ER score, the PgR cut-off, and the HER2 score because these variables are used to define this category [1]. In Luminal B/HER2− tumors, the lobular histotype and the absence of inflammation were independent predictors of pNR (p = 0.024 and 0.020, respectively). The odds of pNR were 12.5 times higher for carcinomas of lobular histological type than for non-lobular cancers and 6.2 times higher for carcinomas with no evidence of TILs than for cancers with any clearly detectable intra-tumoral or stromal lymphocyte infiltrates. The mitotic cut-off showed an OR of 2.7 in the Luminal B/HER2− subgroup; however, in the multivariable model, it did not reach statistical significance (p = 0.247) (Table 4b).

Discussion

With the goal of identifying patients for whom chemotherapy is unlikely to yield a beneficial effect, we analyzed the differences in the distribution of common histopathological features between non-responder patients and patients who show some response (either partial or complete) to NAC in a neoadjuvant cohort. As in previous reports, which focused on achieving a pCR as a primary endpoint, our univariate analysis demonstrated that ER and PgR positivity are associated with a lack of response, as are the lobular histological type and the absence of inflammation. However, in the multivariable analyses neither ER nor PgR expression was an independent variable for discriminating the pNR category from the pCR + pPR category. As recently suggested by Delpech et al. [27] for pCR, our result supports the idea that pNR to NAC is more related to intrinsic tumor characteristics than to ER expression. Many studies have shown that the response to NAC is lower in terms of pCR in locally advanced lobular carcinomas than in invasive ductal carcinomas [30–34]. The presence of inflammation has instead been proposed as a predictive factor of response to chemotherapy in breast cancer in general [1, 35–38]. A recent study for harmonization of the evaluation of TILs recommends considering only the stromal inflammation [26]; however, the authors specify that “this recommendation is based on the methodology used in published phase III studies, implying that there is room for future refinement as evidence accumulates to show the validity of alternative parameters and/or methodologies that improve upon this practice.” In the present study, the correlation between pNR and the absence of TILs (both intra-tumoral and/or stromal) was slightly stronger (p < 0.001) than between pNR and st-TILs% ≤1 (p = 0.001). However, in the pNR category the number of cases with the absence of TILs (22 cases, 48.9 %) was lower than that with st-TILs% ≤1 (28 cases, 63.6 %). This may influence the multivariable analysis results, though it may also indicate a better selection of pNR cases.

By considering specific intrinsic subtypes, Loi et al. observed a significant association between the presence of TILs and a good prognosis in triple-negative and HER2+ breast cancers [39]. Recent evidence indicates that in triple-negative breast cancer, both stromal and intra-tumoral TILs are predictive of pathological response to platinum-based NAC [40]. In our study, we showed that the lobular histotype and the absence of TILs were associated with pNR in the entire cohort using univariate analysis. However, using multivariable analyses, these two parameters correlated with pNR only in the Luminal B/HER2− breast cancers. In other subtypes, it is possible that factors other than histological type and inflammation contribute to pNR, in particular in Luminal A breast carcinomas, where the pNR rate was high and the lobular histotype was more frequent than in Luminal B/HER2−.

Luminal B is the most challenging molecular subtype of breast cancer in terms of treatment. Because they are more aggressive than Luminal A cancers, Luminal B/HER2− breast cancers are generally treated with both endocrine therapy and chemotherapy, but this approach is not always effective [41]. Being able to recognize a priori which Luminal B/HER2− carcinomas will not respond to chemotherapy would help avoid the toxic adverse effects of the treatment and plan a more effective anti-hormone treatment.

When evaluating markers of proliferation, our data showed significant differences for both the distribution of the mitotic numbers and the percentage of Ki67 within different response categories using univariate analyses on the entire cohort of patients. The main limitation of our study is related to the analysis of core biopsies, which may not completely represent the heterogeneous expression of proliferation markers within a given tumor. Nevertheless, the measurement of Ki67 is poorly standardized across laboratories even on whole tissue sections [42–44]. On the other hand, Lehr et al. [45] have recently shown that while the percentages of Ki67 (evaluated by MIB-1 immunostaining) were comparable in biopsies and resection specimens irrespective of the method of quantification, the mitotic count was significantly overestimated in resection specimens of invasive breast carcinomas, most likely as a result of a delay in tissue fixation. Another critical issue may be the high standard deviation values for both mitoses and Ki67; however, with ROC curve analysis, we identified two cut-off values (9 for the mitotic count and 18 % for Ki67 expression) that performed well in differentiating the pNR and pPR + pCR categories using univariate analyses. In addition, the mitotic count was the only independent variable in the entire cohort of patients but not in any of the molecular subtypes. This finding could stem from the low number of cases available for each category, and larger studies are warranted to investigate further the utility of mitotic counts within distinct molecular subgroups. The hypothesis that a low mitotic count would characterize non-responsive patients treated with taxanes was not supported by our results. This finding is consistent with studies that suggest that drugs damaging DNA or microtubules are active through non-mitotic mechanisms [19].

In conclusion, although the pNR category has been neglected in the literature, the non-responsive patients could merit further investigation by using genetic signatures, particularly in breast cancers that do have the option of chemotherapy, such as Luminal B/HER2− breast carcinomas.