Introduction

Treatment for metastatic breast cancer aims to prolong survival and palliate symptoms [1]. Considering that metastatic breast cancer remains incurable, maintaining quality of life (QOL) is an important therapeutic goal [2]. Instruments to measure QOL in patients with metastatic breast cancer include the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire core 30 (EORTC QLQ-C30). Despite improvements in validation of the QLQ-C30, interpreting the numerical scores remains challenging.

A large clinical trial can show statistically significant differences for QOL scores, but the clinical relevance of these differences remains controversial. The concept of minimal important difference (MID) provides a measure of the smallest difference in QOL scores that patients can perceive as improvement or deterioration [3]. King has comprehensively reviewed the literature on this topic [4]. While both anchor- and distribution-based methods are available for estimating MIDs, the anchor-based method has the appeal of incorporating external instruments (criterion or anchor) that are relevant to patients [5].

Recommendations [6] or guidelines across cancer sites using meta-analysis [7, 8] have been provided to interpret QLQ-C30. These are useful for designing trials and interpretation, but MIDs can vary according to factors such as patient populations, cancer site, and disease stage. Research into MIDs for metastatic breast cancer remains limited. Musoro et al. recently used two published trials to estimate MIDs relying on clinical anchors from clinician examinations [9]. In contrast, the most widely used anchor is the rating of change in health status as obtained directly from patients [10].

The purpose of this study was to determine the MIDs of the QLQ-C30 in a population of metastatic breast cancer patients using data from the SELECT BC-CONFIRM trial. In that trial, transition items reported directly from patients and other clinical anchors were collected at multiple timepoints from the initiation of treatment; these trial data were thus considered desirable for the estimation of MID.

Methods

Patient population

The dataset for this study comes from the SELECT BC-CONFIRM randomized clinical trial, which enrolled and randomized 230 patients from multiple centers in Japan [11]. The trial aimed to confirm that S-1, an oral fluorouracil antitumor drug, is non-inferior to anthracycline-containing regimens as first-line chemotherapy for HER2-negative metastatic breast cancer. Patients were randomized to receive oral S-1 or intravenous anthracycline (doxorubicin/cyclophosphamide or epirubicin/cyclophosphamide) at a standard dose. The main inclusion criteria for the trial were as follows: presence of HER2-negative metastatic breast cancer; endocrine therapy-resistant status; and no previous administration of chemotherapy for advanced disease. The primary endpoint in that trial was overall survival, and QOL was included as a secondary endpoint. The research ethics boards of all participating institutions approved this study, and all participants provided written informed consent prior to enrollment. The trial was registered with the University Hospital Medical Information Network, Japan (UMIN000005449).

The EORTC QLQ-C30

The EORTC QLQ-C30 is a general QOL instrument for cancer patients [12]. This questionnaire comprises 30 items, 24 of which are aggregated into five functional scales (physical, role, emotional, cognitive, and social), three symptom scales (fatigue, pain, and nausea/vomiting), and one global health status. The remaining six items assess additional symptoms (dyspnea, appetite loss, insomnia, constipation, and diarrhea) and financial impact. The SELECT BC-CONFIRM trial used the Japanese version of the QLQ-C30 questionnaire (version 3) [13], with each scale/item converted into a score ranging from 0 to 100. A high score in the functional scale represents increased functional ability, whereas a high score in the symptom scale represents increased distress. Researchers administered the QLQ-C30 before starting the protocol treatment and every 2 months thereafter until 1 year after starting treatment.

Anchors

We selected two anchors: the patient’s global rating of transition; and change in Eastern Cooperative Oncology Group performance status (PS) [14]. The transition items comprised three questions: “Compared with when you started the chemotherapy in this trial: (a) How has the time you spend in bed or a chair during the day changed? (a question of function); (b) How have your concerns changed? (a question of concerns); and (c) How has your global health status changed? (a question of global health)”. We created the transition items for this study by referring to the item representativeness of each functional scale of the QLQ-C30 using factor analysis [15] of a previous clinical trial in Japan [16]. For each question, seven response options were provided, such as “much shorter”, “shorter”, “a little shorter”, “unchanged”, “a little longer”, “longer”, or “much longer” for the functional question. We devised the 7-grade responses based on the previous study [6]. In the trial, researchers administered the transition questionnaire with the QLQ-C30 every 2 months until 6 months after starting the protocol treatment. Researchers scored PS between 0 (fully active) to 4 (completely disabled) at every course of the treatment.

Three status groups were defined for transition items. At each timepoint, patients who made a response adjacent to “unchanged” were categorized as deterioration or improvement. Patients who responded two categories away from “unchanged” were excluded from analysis for estimation of MIDs, as these responses were considered beyond a “minimal” change. For example, patients who answered “a little longer”, “unchanged”, or “a little shorter” for the functional question were categorized as showing “deterioration”, “stable (no change)”, and “improvement”, respectively, and all other responses were excluded from the analysis of that timepoint. Similarly, at each timepoint, three change status groups were defined for the anchor of PS, calculated as the change in score from baseline: deterioration (worsened by one category of the anchor); stable (no change); or improvement (improved by one category of the anchor). Changes by two or more scores of PS were excluded from analysis, under the same rationale described above.

Statistical analysis

We summarized patient characteristics and demographics of the analyzed cohort. The variables included age, weight, number of lymph node metastases, estrogen receptor status, progesterone receptor status, history of surgery, history of chemotherapy use, and history of oral fluorouracil use.

We compared changes in QLQ-C30 scores with anchors measured at 2, 4, and 6 months from baseline. We calculated correlation coefficient r, and considered r >|0.3| as necessary to show a moderate association between change scores and anchors [17]. We calculated Pearson’s correlation, Spearman’s rank correlation, polyserial correlation, and polychoric correlation. We compared the 0.3-threshold and the maximum of the four correlation coefficients to explore candidate scales of QLQ-C30.

We used the mean change method to estimate MIDs for within-group change. The within-group MID can be calculated in several ways; some scholars calculate mean change in a (slightly) improved/deteriorated group [18], while other scholars adjust the results by subtracting the mean change that occurs in the stable group [5, 9, 19, 20]. As the latter method has been more frequently applied in cancer research, we calculated the mean difference in change scores between deterioration and stable groups to indicate within-group MIDs for deterioration. Similarly, the mean difference of change scores between improvement and stable groups indicates MIDs for improvement. We calculated effect sizes (ESs) as the mean difference in change scores divided by the standard deviation (SD) of the QOL scale at baseline. ESs determine the responsiveness of changes in QOL to changes in anchor categories. We considered that ESs ≥ 0.2 and < 0.8 were appropriate to include as MIDs because an ES < 0.2 is small, and an ES ≥ 0.8 is large [21]. We calculated within-group MIDs separately for each pair of the anchor and timepoint (i.e., at 2, 4, and 6 months). To reduce the possibility of false-positive results, we show the results of QOL with at least an estimated two MIDs across timepoints or anchors.

We used a generalized estimating equation (GEE) to estimate MIDs for between-group change; we fitted the GEE for a given QOL and anchor pair, the changes in QOL over multiple timepoints as outcome variables, and anchor category as dummy explanatory variables. In contrast to the mean change method, the GEE method stacks changes in QOL scores over the timepoints and treats the scores as a vector of outcome. In comparative clinical trials in which patients respond to the QOL questionnaire longitudinally, the GEE with changes in QOL over multiple timepoints as outcome variables and treatment variable as explanatory variables is often fitted, rather than the simple linear regression which only uses a change in QOL at a specific timepoint. The GEE method thus provides a useful MID for between-group change [9]. We also applied the threshold for ESs (≥ 0.2 and < 0.8).

We obtained final estimates of MIDs provided by anchor-based methods using the correlation-based weighted average to reflect the degree of closeness of each pair of the anchor and change in QOL.

We also conducted a distribution-based method for MIDs. We calculated the 0.2 SD, 0.3 SD, 0.5 SD, and standard error of measurement based on a test–retest reliability of [22] for each QOL at all timepoints.

We conducted all statistical analyses using Base SAS and SAS/STAT version 9.4 software of the SAS System for Windows (SAS Institute, Cary, NC, USA).

Results

In total, 230 patients were randomized in the trial. To calculate the change in QOL from baseline, we included only the 154 patients who had completed a baseline QLQ-C30. Table 1 summarizes the demographic and treatment characteristics of the analyzed cohort. Median age was about 60 years. The majority of patients had a history of surgery (76.6%), and about two-thirds of patients had no history of chemotherapy.

Table 1 Patient characteristics

Numbers of patients by anchor categories and corresponding mean change in QOL scores are shown in Supplementary Table 1. When using the transition item of global health, the proportions of “improvement”, “stable (no change)”, and “deterioration” were 17%, 38%, and 23%, respectively, with the remaining 22% responding two or more categories away from “stable (no change)”. When using the change in PS anchor, the proportions of “improvement”, “stable (no change)”, and “deterioration” were 6%, 78%, and 13%, respectively. Table 2 shows correlation coefficients between change in QLQ-C30 and anchors. For each anchor, correlated QOL scales (i.e., maximum correlation coefficient > 0.3) at one or more timepoints are listed. The performance status showed a correlation with change in QLQ-C30, mostly at 6 months. Supplementary Table 2 shows correlation coefficients for absolute or baseline scores with anchors. In most pairs of anchors and scales, correlations of transition items with change scores were greater than correlations of transition items with absolute scores. In most pairs of positively (negatively) correlated transition items and scales, transition items were negatively (positively) correlated with the score at baseline.

Table 2 Correlations between changes in quality of life scales and anchors at 2, 4, 6 months from randomization

Figure 1 shows the range of MIDs from the mean change method for each QOL scale across multiple anchors. The mean change method identified MIDs for improvement and deterioration in 8 of 15 QLQ-C30 scales, and the GEE identified MIDs for at least one direction in 7 of the 15 QLQ-C30 scales. Estimated MIDs differed between scales of QOL, direction (improvement and deterioration), and methods of estimation. Estimates from GEE tended to be smaller than those from the mean change method. No MIDs were estimated from GEE on some scales. Supplementary Table 3 shows details of results from Fig. 1. Most MIDs estimated with GEE were p < 0.05, but this was not the case for MIDs estimated under the mean change method.

Fig. 1
figure 1

Minimal important differences obtained using the mean change method and generalized estimating equations. Symptom scales were reversed to align with functional scales; therefore, positive values represent decreases in symptom scales. Absence of the line (e.g., GEE estimates for deterioration in PF) indicates that no minimal important difference estimate that met our criteria was obtained. GEE generalized estimating equation; QL global quality of life; PF physical function; RF role function; EF emotional function; SF social function; FA fatigue; PA pain; AP appetite loss

Table 3 summarizes MIDs from anchor-based methods, alongside results from the distribution-based method for comparison. Distribution-based estimates at 2 months are presented in Table 3 because results at other timepoints were similar, with at most a difference of 0.8 units for 0.2 SD. Most MIDs for interpretation of within-group differences were between 0.3 and 0.5 SD.

Table 3 Summary of minimal important differences from anchor- and distribution-based methods

Discussion

To the best of our knowledge, this study is the first to estimate MIDs of QLQ-C30 in a population of Japanese patients with metastatic breast cancer. The study identified MIDs for interpreting within-group differences in 8 scales and for interpreting between-group differences in 7 scales.

As far as we know, only one study has examined the MIDs of QLQ-C30 in patients with advanced breast cancer [9]. Musoro et al. used two published trials to identify MIDs [9], and most of the patients were from European countries. They identified at least one suitable MID in eight scales, and six of those eight scales (global quality of life, physical function, role function, social function, fatigue, and appetite loss) were in common with the present findings. In these scales, MIDs for interpreting between-group were similar. Our results thus support their conclusions, at least for these scales.

Conversely, MIDs for interpreting within-group deterioration tended to be larger in our study than in the previous study [9], such as for global quality of life (− 10 vs. − 8), role function (− 17 vs. − 6), social function (− 12 vs. − 7), and fatigue (− 10 vs. − 8). Feng et al. [23] investigated systematic differences between Japanese and European general populations in self-reported QOL questionnaires. They found that the percentage of Japanese respondents reporting no problems in morbidity, usual activities, pain, or anxiety was higher than that of European respondents [23]. Our study of Japanese patients revealed that minimally deteriorated patients described larger changes from baseline than the European patients reported by Musoro et al. [9]. These findings may indicate inter-country differences in the self-evaluation of health at a single point as well as changes between two timepoints, especially in terms of deterioration.

Consistent with recent findings on MIDs for other cancer sites [5, 19, 24, 25], the magnitudes of MIDs for deterioration and improvement differed, along with the scales of QLQ-C30. We observed no systematic differences between previous results and our findings. These estimates generally do not conflict with the previously proposed guidelines [7, 8], although we consider that MIDs for interpreting deteriorations and improvements should be distinguished. In addition, MIDs for interpreting within- and between-group changes differed, as shown in Table 3. These results strongly indicate that MIDs fitted for purpose are necessary.

The choice of an anchor is one of the critical components in the estimation of anchor-based MIDs [10]. Musoro et al. [9] estimated MIDs relying on clinician-examined anchors such as performance status and common terminology criteria for adverse events. However, patient-reported anchors appear preferable to clinician-reported anchors [10]. The present study used both types of anchor: transition items and performance status. We found that transition items were more sensitive to early changes in QOL than performance status, as shown in Table 2. This finding is consistent with findings from previous research that self-reports from patients are more sensitive than clinicians’ reports to underlying changes in health status, and tend to identify clinically meaningful symptoms earlier [26, 27]. MID estimates by mean change methods using performance status as the anchor lay within the range of MID estimates using transition items as the anchor (Supplementary Table 3). This finding may indicate that clinician-reported anchors provide similar results to patient-reported anchors for estimating MIDs. However, we recommend patient-reported anchors as more susceptible to changes in the underlying health status of patients than clinician-reported anchors. Note that one needs to be cautious about the use of transition items because they are often vulnerable to recall bias.

Distinguishing between unreliable and credible MIDs is essential for MIDs to help patients and clinicians in making decisions. One such method is referring to a recent guideline [10] and reporting the credibility level of the anchors used to estimate MID. We created Supplementary Table 2 for this purpose. We note here two points. First, the correlations of transition items with change score were greater than those with absolute score. Otherwise, the transition item seems to capture current health status more than the change in health status. Second, transition items showed some correlation not only to the change in QOL, but also to QOL at baseline. This correlation reflects the fact that patients appropriately respond to transition items while considering the health status at baseline. These points support the credibility of the transition items used in this study.

Several limitations need to be considered when interpreting the present results. First, due to the small or medium sample size for estimating MID, each estimate was not precise. We aggregated estimates into a single MID using correlation-weighted averages, so we consider our final estimates as reliable to some extent. Second, no change group or improved or deteriorated group showed statistically significant results in some scales. We do not believe, however, that this seriously influenced our conclusions, because all estimates of MIDs in this study exceeded effect size thresholds.

In conclusion, we provided the MIDs of QLQ-C30 using both patient- and clinician-reported anchors measured in a randomized trial of Japanese patients with metastatic breast cancer. Our results can aid patients and clinicians, as well as researchers, with interpreting QLQ-C30. Further studies should replicate this study in other populations or other anchors to validate the current MID estimates.