Introduction

There is an urgent need for breast cancer early detection biomarkers given that none are currently available and given the considerable public health importance of breast cancer. At present, the best available tool for the early detection of breast cancer is mammography. Randomized trials have shown that annual or biennial mammography reduces mortality by 30% among women 50–69 years of age [1]. However, mammography is limited by its 75–90% sensitivity and 90–95% specificity, with optimal performance only among 50- to 69-year-olds. For example, its positive predictive value (the proportion of those with a positive test who have the disease) is 60–80% among 50- to 69-year-olds, but is only 20% among those <50 years of age. Despite improvements in technology and the widespread use of mammography, breast cancer remains the second leading cause of cancer mortality in women in the USA [2] and the leading cause of cancer mortality in women worldwide [3]. There are several reasons for this, including the limited sensitivity and specificity of mammography, the lack of uniform access to screening services, and the variable utility of mammography for detecting different types of tumors. It is well established that mammography is better able to detect certain types of breast cancer (such as ductal carcinomas) than other types (such as poor-prognosis estrogen receptor (ER)-negative tumors) [4–9]. Considering ER status, interval-detected tumors are 1.8- to 2.6-fold more likely to be ER-negative than screen-detected tumors [7, 9].

Continued improvements in our ability to detect breast cancer early offer the promise of further reducing the burden of this disease as breast cancer detected at an earlier stage is much more curable than is metastatic disease. The current 5-year survival rate for localized breast cancer in the USA is 98%, but is only 27% for metastatic disease [2]. Thus, there is a great public health need for improvements in our abilities to detect breast cancers of all types earlier, and proteomics is a means for identifying early detection biomarkers in plasma.

We recently completed a unique large-scale study aimed at discovering breast cancer early detection biomarkers. Unlike the vast majority of biomarker discovery studies, this case–control study nested in the Women’s Health Initiative Observational Study (WHI OS) [10, 11] used preclinical samples: all plasma specimens assessed from breast cancer cases were obtained 0–17 months prior to diagnosis. The proteomic discovery platform involved applying mass spectrometry to extensively fractionated case and control plasma pools using an approach that has been previously described [12]. Though low-throughput, a major advantage of this approach is that it allows the plasma proteome to be interrogated in considerable depth, which is important given the enormous dynamic range of plasma protein concentrations. In addition, protein concentrations can be quantitated and proteins can be identified by searching mass spectra against the human International Protein Index database (version 3.13). Given the heterogeneity of breast cancer and the likelihood that breast cancer early detection biomarkers may be specific to certain breast cancer types, we evaluated markers specific to breast cancer overall and to several breast cancer subtypes, including estrogen receptor-positive (ER+), triple negative (TN, ER−/PR−/HER2−), and HER2-overexpressing (H2E, ER−/HER2+) breast cancer.

We have completed and published an initial set of validation experiments on a small subset of promising candidates specific to ER+ disease that have an available ELISA assay [13]. Thus far, we have only attempted to validate markers specific for ER+ disease because of the rarity of TN and H2E breast cancers and the lack of a sufficiently sized validation set for either tumor subtype. Very few of our ER+ candidates had an available ELISA, including none of our top-ranked candidates. Nevertheless, we proceeded with the validation of the seven candidates specific for ER+ breast cancer with an available ELISA assay since a completely independent validation set of ER+ cases and matched controls was available to us. The results of this study have been published and are summarized below [13].

Results

The workflow of this study is summarized in Fig. 1 and included a discovery phase on pooled samples and two rounds of validation on independent sets of individual samples. The discovery phase involved 14 quantitative proteomic experiments. In each experiment, a pooled set of plasma from 35 breast cancer cases was compared to plasma pooled from 35 matched cancer-free controls. Case pools were made uniform with respect to ER status, PR status, and histology. Twelve of these experiments were limited to ER+ breast cancer and two were limited to ER− disease. Cases and controls were ascertained from the WHI OS, and the characteristics of cases and controls in our discovery and independent validation sets are shown in Table 1. Controls were matched to cases on age, race/ethnicity, study blood draw date (± 1 year), and clinical center of enrollment. Briefly, compared to controls, cases were somewhat more likely to be current users of combined estrogen plus progestin (E+P) menopausal hormone therapy and to be overweight/obese (body mass index ≥ 25.0 kg/m²), and less likely to have had a hysterectomy.

Fig. 1 Study design

Table 1 Characteristics of breast cancer cases and controls used for biomarker discovery and validation

We quantified a total of 503 proteins in our discovery experiments on ER+ breast cancer, and 57 met our predefined statistical criteria: a fold change ≥1.15, a p value <0.10, and quantification in at least 2 of our 12 experiments. More stringent statistical criteria were not applied to our discovery data in order to make our candidate lists more inclusive. We planned subsequent rigorous validation experiments to identify false positives, and we did not want to discard potential true positive candidates that more stringent criteria would have excluded. Of these 57 candidates, seven had a commercially available ELISA assay and thus could be validated in a straightforward manner. These seven candidates were epidermal growth factor receptor (EGFR), fibronectin 1 (FN1), insulin-like growth factor binding protein 1 (IGFBP1), lactotransferrin (LTF), protein NOV homolog (NOV), trefoil factor 3 (TFF3), and von Willebrand factor (VWF). While discovery experiments were conducted on pooled samples, validation ELISA assays were performed on individual specimens.
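The candidate selection rule described above can be sketched as a simple filter. This is an illustrative sketch only, not the pipeline used in the study: the record fields (`fold_change`, `p_value`, `n_experiments`), the protein names, and all example values are hypothetical. The thresholds are applied literally as stated in the text; how proteins that were decreased in cases were handled is not specified there and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class ProteinResult:
    name: str            # hypothetical protein identifier
    fold_change: float   # case/control ratio from the pooled experiments
    p_value: float
    n_experiments: int   # number of the 12 ER+ experiments with a quantified value


# Thresholds as stated in the text.
MIN_FOLD_CHANGE = 1.15
MAX_P_VALUE = 0.10
MIN_EXPERIMENTS = 2


def is_candidate(result: ProteinResult) -> bool:
    """Return True when a protein meets all three discovery criteria."""
    return (
        result.fold_change >= MIN_FOLD_CHANGE
        and result.p_value < MAX_P_VALUE
        and result.n_experiments >= MIN_EXPERIMENTS
    )


def select_candidates(results):
    """Return the names of proteins that pass the filter, in input order."""
    return [r.name for r in results if is_candidate(r)]


# Hypothetical example records (not study data).
example_results = [
    ProteinResult("PROT_A", 1.30, 0.04, 5),
    ProteinResult("PROT_B", 1.10, 0.01, 8),   # fold change below threshold
    ProteinResult("PROT_C", 1.50, 0.20, 3),   # p value above threshold
    ProteinResult("PROT_D", 1.20, 0.09, 1),   # quantified in only one experiment
    ProteinResult("PROT_E", 1.15, 0.099, 2),  # exactly meets every criterion
]

candidates = select_candidates(example_results)
print(candidates)
```

Deliberately lenient thresholds of this kind trade a larger false-positive burden at the discovery stage for a lower chance of discarding true positives, which is why independent validation on individual specimens follows.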

Results from our discovery experiments and from our first round of validation experiments, conducted on 105 cases and 105 controls not used in discovery, are shown in Table 2. EGFR levels differentiated cases from controls [odds ratio (OR) = 1.68, p = 0.0017], but none of the other six proteins differed between cases and controls. However, for the most part, the magnitude and direction of the risk estimates were similar in the discovery and validation experiments for each marker. To further confirm the EGFR finding, we conducted a second round of validation on another independent set of 93 cases and 93 controls from the WHI OS and again found that EGFR was elevated in cases compared to controls (Table 3). Combining the two validation sets, the OR for EGFR was 1.44, with a highly statistically significant p value of 0.0008.

Table 2 Results from the first round of ELISA-based validations on 105 case/control pairs
Table 3 First and second round validation results for EGFR

It has previously been shown that the use of menopausal hormone therapy impacts a significant portion of the serum proteome [12]. When we stratified our validation results according to the use of menopausal hormone therapy (never/former users, current unopposed estrogen users, and current E+P users), there was evidence that EGFR only differentiated cases from controls among women who were current E+P users (Table 3). Specifically, the OR among current E+P users was 2.41 (p = 0.0001), but EGFR was not associated with risk among never/former users of hormone therapy (OR = 1.05, p = 0.78) or among current unopposed estrogen users (OR = 1.40, p = 0.13). The p value comparing the odds ratios among current E+P users to never/former users was 0.0019, and the p value comparing current E+P users to current unopposed estrogen (E) users was 0.12.

We also assessed risk according to EGFR quartile. Across all cases and controls, women in the highest EGFR quartile had a 2.90-fold (p = 0.005) increased risk of developing breast cancer compared to those in the lowest quartile. This risk was substantially higher among current E+P users, where those in the highest EGFR quartile had a 9.04-fold (p = 0.0004) increased risk (Table 4). The receiver operating characteristic (ROC) curve for EGFR among current E+P users has an area under the curve of 0.7. At 80% specificity, EGFR’s sensitivity as a single marker is 56%, and at 90% specificity, its sensitivity is 31% (Fig. 2).
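For readers less familiar with these performance measures, the following sketch shows one common way to compute sensitivity at a fixed specificity and the area under the ROC curve from individual marker values. It is an illustration under stated assumptions, not the analysis code used in the study: the function names and the marker values are hypothetical, higher values are assumed to indicate disease, and the area under the curve is computed with the rank-based (Mann–Whitney) formulation.

```python
import math


def sensitivity_at_specificity(case_values, control_values, specificity):
    """Sensitivity at the lowest cutoff that calls at least `specificity` of controls negative.

    A subject is called positive when the marker value is strictly above the cutoff.
    """
    controls = sorted(control_values)
    # Number of controls that must fall at or below the cutoff.
    k = math.ceil(specificity * len(controls) - 1e-9)
    cutoff = controls[k - 1] if k > 0 else float("-inf")
    positives = sum(1 for value in case_values if value > cutoff)
    return positives / len(case_values)


def area_under_curve(case_values, control_values):
    """Probability that a random case exceeds a random control (ties count one half)."""
    score = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                score += 1.0
            elif case == control:
                score += 0.5
    return score / (len(case_values) * len(control_values))


# Hypothetical marker values (arbitrary units), not study data.
hypothetical_controls = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hypothetical_cases = [3, 5, 7, 8.5, 9.5, 10.5, 11, 12, 13, 14]

sens_80 = sensitivity_at_specificity(hypothetical_cases, hypothetical_controls, 0.80)
sens_90 = sensitivity_at_specificity(hypothetical_cases, hypothetical_controls, 0.90)
auc = area_under_curve(hypothetical_cases, hypothetical_controls)
print(sens_80, sens_90, auc)
```

Raising the required specificity moves the cutoff upward, so sensitivity can only stay the same or fall, which is the trade-off reflected in the 56% and 31% figures reported above.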

Table 4 Quartile distributions of EGFR validation results among all case/control sets and among current estrogen and progestin users
Fig. 2 EGFR receiver operating characteristic curve for cases versus controls among all current E+P users. This figure was previously published in Pitteri et al. [13]

Discussion

We demonstrated in two separate validation sets completely independent from our discovery set that EGFR levels are elevated in preclinical plasma of E+P users who went on to be diagnosed with breast cancer compared to controls. In this setting, at 80% specificity, EGFR’s sensitivity was 56%, and at 90% specificity, its sensitivity was 31%. In comparison, PSA, which is clinically used to screen men for prostate cancer, has 40.5% sensitivity at 81.1% specificity and 20.5% sensitivity at 93.8% specificity [14]. So while comparable in performance to PSA, EGFR cannot be viewed as a clinically useful breast cancer early detection biomarker on its own given that it only appears to be predictive among E+P users. It is also important to note that although this finding was statistically significant, the 95% confidence intervals for our risk estimates were somewhat wide because of the relatively small numbers of cases and controls who were E+P users.
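The dependence of confidence interval width on sample size can be illustrated with a generic, unadjusted calculation. The sketch below computes an odds ratio and a Wald 95% confidence interval from a 2×2 table. It is not the method used in this study, whose matched design would ordinarily call for a matched analysis; the function name and all counts are hypothetical and chosen only so that both tables have the same odds ratio.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: cases with a high marker level      b: cases with a low marker level
    c: controls with a high marker level   d: controls with a low marker level
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: same odds ratio, tenfold difference in sample size.
small_study = odds_ratio_wald_ci(20, 10, 10, 20)
large_study = odds_ratio_wald_ci(200, 100, 100, 200)

for label, (estimate, lower, upper) in (("small", small_study), ("large", large_study)):
    print(f"{label}: OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Because the standard error of the log odds ratio shrinks as cell counts grow, the smaller table yields a much wider interval around the same point estimate.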

Nevertheless, our finding is still important in two respects. First, no prior studies have validated even a single breast cancer early detection biomarker in preclinical specimens to the degree we have here, validating EGFR in two completely independent validation sets. This suggests that detectable changes in the plasma proteome may indeed be present preclinically for diseases such as ER+ breast cancer, in which tumors are relatively small compared to other cancers, such as ovarian cancer, where biomarker discovery and validation have thus far been much more successful. Second, consideration of other exposures in biomarker discovery studies is likely critical, given that EGFR did not distinguish cases from controls among never/former hormone therapy users, whereas among E+P users the association was highly statistically significant. So consideration of factors like use of hormone therapy, which has been shown to have a major impact on the plasma proteome [12], is critical in this type of work. Future work aimed at discovering and validating breast cancer early detection biomarkers is needed, and our EGFR finding warrants further replication. The primary limitation of this study was the lack of readily available means to validate the numerous other candidates we discovered, most of which were much more compelling candidates based on their discovery odds ratios and p values. It is not surprising that the other six markers did not validate given their high false discovery rates (FDRs, ranging from 0.66 to 1.00) compared to EGFR and to several of our other much higher ranked candidates.

With respect to its biology, EGFR is a cell surface tyrosine kinase receptor and a member of the ERBB proto-oncogene family, which also includes HER2. EGFR is a transmembrane protein, and the peptides we identified by mass spectrometry in our discovery IPAS experiments were all located in the extracellular region, suggesting shedding of the extracellular domain by cells. Binding of EGFR by various ligands can result in increased uncontrolled proliferation of cancer cells, and EGFR is overexpressed in 20–81% of breast tumors [15–17]. Several studies have measured blood levels of EGFR in relation to breast cancer, though overall, the results are quite inconsistent. It is difficult to directly compare the results of these studies to ours since none involved measurements of EGFR in the preclinical period prior to a breast cancer patient’s diagnosis. EGFR levels have been reported to be higher in normal individuals than in patients with primary breast cancer [18] and metastatic breast cancer [19, 20], whereas another study found no differences in EGFR levels between patients with metastatic breast cancer and healthy women [21]. Again, none of these reports used preclinical specimens from breast cancer patients, so the differences observed could be the result of factors related to the diagnosis and treatment of breast cancer. With respect to survival, some studies suggest that lower levels of EGFR in patients with metastatic breast cancer are associated with shorter overall survival [20], particularly in patients with ER+ tumors [19], whereas others have found no association between EGFR serum levels and overall [22, 23] or progression-free survival [22]. Among women with hormone receptor-positive disease, serum EGFR levels decreased significantly after 1 and 3 months of letrozole therapy compared to pretreatment levels [21].

Given the inconsistency across studies and several critical differences in their respective designs, the role of serum/plasma levels of EGFR with respect to breast cancer is quite unclear. While EGFR is involved in hormonal pathways relevant to breast cancer, there is no clear explanation at this point for why EGFR may be useful for the early detection of breast cancer only among E+P users, but not among either E users or never/former users. Thus, further investigations of EGFR as a potential breast cancer early detection biomarker are warranted.

Conclusions

Our study demonstrates that there may indeed be changes in the plasma proteome prior to the clinical diagnosis of breast cancer that are detectable and of potential clinical utility. Ongoing efforts to follow up on promising candidates and formally validate them are warranted, as confirmed breast cancer biomarkers could have several important clinical uses. These include use as a companion to regular mammographic screening, either to inform decision making regarding the timing of subsequent screening (early recall in 6 months, next mammogram in 1 year, or next mammogram in 2 years) or to aid in the detection of cancers missed by mammography.