Introduction

Definitive radiotherapy with or without concurrent systemic treatment is a standard organ-preserving treatment for locally advanced mucosal head and neck squamous cell carcinoma. Following conventional treatment, a proportion of patients still experience locoregional tumour recurrence causing significant morbidity or death. Further, in those who are cured, the treatment is associated with significant acute and late toxicities. Commonly used tumour stage (TNM), smoking history, and HPV status provide prognostic information but have failed to be useful as predictive markers in risk-adaptation studies [1,2,3]. A reliable quantitative biomarker that can predict outcomes is therefore needed to guide adapted intensification or de-intensification of treatment based on response.

Functional imaging can characterise tumour biology using quantitative measures that can act as an early surrogate marker for treatment response. Diffusion-weighted imaging MRI (DWI) and 18F-fluorodeoxyglucose-positron emission tomography (FDG-PET/CT) have the advantage of non-invasively monitoring the tumour changes via serial imaging during radiotherapy. Correlational data suggest that DWI and FDG-PET/CT provide complementary biological information [4, 5]. To date, most imaging biomarker studies in head and neck cancer have analysed pre-treatment images. Changes in functional imaging performed during treatment have been shown to be a better biomarker of treatment response compared to pre-treatment imaging [6,7,8,9,10]. Studies analysing mid-treatment images have been limited by utilising single imaging modality, small sample size, and non-uniform methodologies. Timing of mid-treatment imaging also significantly impacts response assessment but the optimal time point for assessing response also remains unanswered.

Therefore, there remains a need for a prospective multimodality imaging study to assess the utility of quantifying mid-treatment tumour changes that predict response to radiotherapy in head and neck cancer.

The aim of this study was to evaluate if tumour changes measured using a combination of DWI and FDG-PET/CT performed serially during radiotherapy in head and neck cancer can be used to predict for treatment response. We also sought to find the optimal time point for assessing DWI in future imaging biomarker studies.

Materials and methods

Study design

Patients with newly diagnosed, biopsy-proven, non-metastatic mucosal head and neck squamous cell carcinoma treated with definitive radiotherapy with or without concurrent systemic therapy from two independent prospective quantitative imaging biomarker studies were evaluated. Patients in study 1 were recruited from June 2014 to May 2015. Patients in the subsequent study 2 were recruited from May 2016 to October 2019 [11]. Local research ethics committee approval was provided for both studies and all patients provided written informed consent.

All patients were evaluated and reviewed by a multidisciplinary head and neck team. Patients were treated using a simultaneous integrated boost intensity-modulated radiation therapy technique over 35 fractions. Radiotherapy treatment volumes were defined using consensus international guidelines and underwent a stringent peer review process [12].

Patients in both studies underwent serial multiparametric MRI (DWI, T2) imaging before, during (weeks 2, 3, 5, 6), and following completion (1 month and 3 months) of radiotherapy. Patients also underwent 18F-FDG-PET/CT before and during (week 3) radiotherapy. The imaging timeline utilised in both studies is shown in Fig. 1. Treatment response was evaluated with post-treatment 18F-FDG-PET/CT and clinical examination including nasoendoscopy. Recurrences were confirmed histologically or via imaging following discussion at a multidisciplinary head and neck meeting.

Fig. 1
figure 1

Imaging timeline and trial schema

The imaging results were compared with the 1-year local (primary tumour) recurrence measured from time of diagnosis. Secondary outcomes included 1-year regional (nodal) recurrence, distant (non-regional nodes and visceral metastasis) recurrence, and death.

Image acquisition

MRI was performed on a dedicated 3.0-T scanner (MAGNETOM® Skyra; Siemens Healthcare). Sequences obtained were DWI, using a readout-segmented EPI technique (RESOLVE; Siemens Healthineers), using b = 50 and 800 s mm−2 with signal averages set to 1 and 3, respectively. Morphological axial T2-weighted and T1-weighted images were also acquired. Details of the imaging technique have been described previously [11].

PET studies were acquired in radiotherapy treatment position on a GE Discovery™-710 time-of-flight positron emission tomography (PET)-CT (GE Healthcare, Waukesha, MI). FDG-PET/CT imaging was performed in accordance with EANM clinical guidelines [13]. Patients received 4.1 MBq kg−1 of [18F]FDG after at least 4 h of fasting. The average blood sugar level was 5.7 ± 1.2 mmol/L (range 3.3–9.6 mmol/L). The staging and all sequential scans were performed on the same scanner with the same acquisition and reconstruction protocols. Details of the imaging technique have been described previously [6].

Image analysis

The primary region of interest (ROI) captured was the primary tumour. To aid accurate delineation of tumour, clinical examination and nasoendoscopic findings were provided in conjunction with patient images. All PET and MRI images were viewed and had ROIs delineated using commercial image visualisation and delineation software (MIM Software Inc.). DICOM images containing ROIs were subsequently analysed using open-source PyRadiomics software (v2.2.0) [14].

The tumour volumes were delineated by a radiation oncologist (Y.T.) in consensus with a consultant radiologist (D.M.) and nuclear medicine physician (P.L.) for each imaging modality at every time point. Due to spatially non-linear distortions associated with DWI, ROIs were manually defined on ADCmaps while primarily referencing to the co-registered low b-value diffusion-weighted images (b50) as per consensus recommendations and methodology performed previously [10, 15]. Co-registered T2-weighted images were also viewed at the same time to aid delineation. The mean values of apparent diffusion coefficient (ADCmean) were calculated from all ROIs. FDG-PET-derived metabolic tumour volume (MTV) was calculated using the PETedge tool of MIM software, a semi-automated gradient method. Other PET parameters including SUVmax, SUVmean, and total lesion glycolysis (TLG = SUVmean × MTV) were also calculated for all ROIs. We have previously found anatomical tumour volumes on T2 MRI and CT images and their relative change were not correlated to clinical outcomes; hence, they were not included in this analysis.

The percentage change (∆) in ADC and FDG-PET/CT imaging parameters from baseline was calculated for each time point, defined as ∆ = (value − baseline) / baseline × 100%.

Statistical analysis

The changes in PET and DWI parameters from baseline were compared using the Wilcoxon signed-rank test.

The absolute value and change (D) in parameters were compared to treatment outcomes using the Mann-Whitney U test. For parameters with predictive value, receiver operating characteristic (ROC) analysis was performed using the area under the curve (AUC) as an index of accuracy to differentiate between multiple predictive parameters. Optimal cut-off values for analysis were derived from the ROC curves aiming for best sensitivity and specificity by applying the Youden index [16]. Local recurrence-free survival (LRFS), regional recurrence-free survival (RFS), distant metastatic recurrence-free survival (DMRFS), and overall survival (OS) curves were estimated using Kaplan-Meier analysis and compared using the log-rank (Mantel-Cox) test.

Combined modality response

To determine the utility of combining multiple imaging modalities, the most accurate FDG-PET/CT parameter and DWI parameter that correlated to local recurrence (highest ROC value in AUC analysis) were chosen for subsequent analysis. Optimum cut-off values from the two parameters were used to determine patient’s response in each imaging modality. Patients were subsequently divided into three groups based on those that had favourable response in both FDG-PET/CT and DWI imaging; mixed response (good PET and poor DWI, or poor PET and good DWI); or poor response in both imaging modalities (for a representative example of three groups, see Fig. 4). The three groups were compared to local recurrence status using the Pearson chi-squared test, using Cramer’s V test to check the strength of association to determine the utility of combining multiple parameters in improving predictive ability.

A pre-specified split-sample internal validation using the two independent enrolling studies was undertaken to explore the stability of results in combining multimodality parameters (described in detail in Supplementary Material).

The data were analysed using SPSS statistical software (version 24.0; IBM Corp). Statistical significance was considered p < 0.05.

Results

Patient characteristics

Fifty-five patients were available for analysis: 30 patients in study 1 and 25 patients in study 2. Patient, tumour, and treatment details are summarised in Table 1. The median follow-up was 34.0 months (range 4–68), with a minimum follow-up of 12 months for patients who were alive.

Table 1 Patient characteristics

The 1-year local, regional, and distant recurrence rates were 18.2% (10/55), 7.3% (4/55), and 12.7% (7/55), respectively. At the time of analysis, 18 patients had died, 13 from recurrent cancer and 5 from other causes.

Primary tumour analysis

The number of DWI tumour ROIs analysed at individual time points were 55(baseline), 47 (week 2), 42 (week 3), 37 (week 5), 34 (week 6), 36 (11 weeks), and 38 (19 weeks). The number of FDG-PET/CT tumour ROIs analysed at individual time points were 55 (baseline) and 53 (week 3). The mean of primary tumour ADCmean increased during radiotherapy and plateaued at week 5 (see Fig. 2A). For the entire population, the mean FDG-PET/CT parameters reduced at week 3 compared to baseline (see Fig. 3A). There was a statistically significant difference in the absolute ADCmean and FDG-PET/CT parameter values at each time point relative to baseline.

Fig. 2
figure 2

Primary tumour ADCmean values before, during, and following completion of radiotherapy. Absolute values (A). Percentage change relative to baseline stratified by local recurrence status (B)

Fig. 3
figure 3

Primary tumour FDG-PET-derived parameter values before and week 3 during radiotherapy. Absolute values (A). Percentage change in values relative to baseline stratified by local recurrence status (B).

Correlation of primary tumour parameters to local recurrence

Patients with local recurrence had a higher ADCmean value at baseline compared to those without. However, this did not reach statistical significance (p = 0.135). Absolute values of ADCmean at all time points did not correlate to local recurrence. Patients with local recurrence had a lower rise in ADCmean value over time (Supplementary Figure 1). The relative change in ADCmean (∆ADCmean) at all time points except week 5 correlated to local recurrence (see Fig. 2B). The differences in the mean of absolute ADCmean values and the ∆ADCmean based on local recurrence status for the entire group are provided in Table 2. Relative change in ADCmean at week 3 (∆Week 3 ADCmean) was the best predictor of local recurrence (AUC 0.825, p = 0.003) (Supplementary Figure 2). Optimal cut-off values of ∆Week 3 ADCmean for predicting local recurrence was < 24.4% rise in ADCmean, resulting in 89% sensitivity, 88% specificity, PPV 97%, and NPV 67%.

Table 2 Comparison of primary tumour DWI and FDG-PET parameters between patients with local recurrence vs nil local recurrence at 1 year

No absolute values of SUVmax, SUVmean, MTV, and TLG at baseline correlated to local recurrence. ∆MTV (p = 0.001) and closely related ∆TLG (p = 0.005) correlated to local recurrence (see Fig. 3B). Difference in absolute values and relative change in FDG-PET/CT parameters based on local recurrence status are provided in Table 2. Relative change in MTV at week 3 (∆MTV) was the best predictor of local recurrence (AUC 0.833, p = 0.001). Optimal cut-off of ∆MTV for predicting local recurrence was < 50.4% drop in MTV; resulting in 91% sensitivity, 64% specificity, PPV 64%, and NPV 96%.

To test the primary hypothesis of the added value of combining DWI and FDG-PET/CT parameters, ∆ADCmean and ∆MTV at week 3 were chosen to define response in DWI and FDG-PET/CT imaging. Using optimal cut-off value of ∆Week 3 ADCmean (> 24.4%) and ∆MTV (< 50.4%) to define response in each imaging modality, 20 patients were defined as favourable responders (> 24.4% ∆Week 3 ADCmean, and > 50.4% ∆MTV), 9 patients were defined as unfavourable responders (< 24.4% ∆Week 3 ADCmean, and <50.4% ∆MTV), and 12 patients were defined as mixed responders. In favourable responders, no patients (0/20) had local recurrence. In mixed responders, 17% (2/12) had local recurrence. In unfavourable responders, 78% (7/9) had local recurrence (for a representative example of three groups, see Fig. 4). The Kaplan-Meier analysis of local recurrence stratified by the above three response groups (subgroups) is shown in Fig. 5. Using a combination of ∆ADCmean and ∆MTV improved the strength of correlation to local recurrence (Cramer’s V 0.736, p ≤ 0.001, Pearson chi-squared test) compared to using individual parameter alone (Cramer’s V 0.693, 0.586).

Fig. 4
figure 4

Example of patients divided into three groups (favourable, mixed, and unfavourable) based on their primary tumour response in week 3 DWI and FDG-PET imaging–derived parameters (ADCmean; metabolic tumour volume, MTV)

Fig. 5
figure 5

Kaplan-Meier curve for local control stratified by response in primary tumour ADCmean (> 24.4%) and FDG-PET-derived MTV (> 50.4%) at week 3 during radiotherapy

As a part of pre-specified split sample internal validation, ∆Week 3 ADCmean and ∆MTV were selected for analysis in patients in study 1 as a training dataset (n = 30). Successful internal validation was performed on the testing dataset of patients in study 2 (n = 25) confirming use of combined primary tumour DWI and FDG-PET/CT parameters in predicting 1-year local recurrence (see Supplementary Material for details).

Correlation of primary tumour parameters to regional recurrence, distant recurrence, and death

Only baseline ADCmean value correlated to death (AUC 0.685, p = 0.027) (Supplementary Table 1). Using an optimal cut-off value of 1301 × 10−6mm2s−1 for baseline primary tumour ADCmean resulted in 2-year OS of 82% vs 65% (log rank p = 0.004) (see Supplementary Figure 3).

Baseline, week 3, and relative change in MTV and TLG at week 3 (∆MTV, ∆TLG) correlated to death, which occurred as a result of local tumour recurrence in majority of cases (Supplementary Table 1). Absolute value of TLG at week 3 was the best predictor for death (AUC 0.742, p = 0.005). Using an optimal cut-off value of 16.0 g resulted in 2-year OS of 96% vs 45%, p ≤ 0.001) (see Supplementary Figure 4).

Discussion

Our study represents the largest published prospective mid-treatment imaging biomarker series utilising DWI and FDG-PET/CT scans performed during radiotherapy in head and neck squamous cell carcinoma (HNSCC). We identified functional changes in tumour during definitive radiotherapy in HNSCC using FDG-PET/CT and DWI which can be used to predict treatment response. Using a novel method, our data suggest that combination of multiple functional imaging parameters improves correlation to local tumour control and could potentially be used to identify sub-groups of patients for future risk-adapted radiotherapy clinical trials. We also identified week 3 mid-treatment as an optimal time point to perform DWI imaging to assess primary tumour response in future imaging studies.

We showed that primary tumour ADCmean increased and plateaued around week 5 during radiotherapy. Change in week 3 primary tumour ADCmean had the strongest correlation to local control. Based on our data, greater percentage change in ADCmean (∆ > 24.4%) was highly predictive of good treatment response leading to local tumour control following radiotherapy. Studies by Motaba et al and Ghany et al also found ∆ADCmean at week 3 of 24% and 20% respectively correlated to tumour control [18, 19]. The higher rise in ADCmean during radiotherapy in good responders is explained by greater tumour cell loss. However, as radiotherapy progresses, increasing treatment-related oedema would also alter the ADC signal [17]. Week 3 likely represents the optimal time point for assessing treatment response due to higher ratio of signal from tumour prior to onset of noise from radiotherapy-related oedema [18]. Previous studies looking to identify the optimal time point for DWI response assessment were limited to 2 to 3 mid-treatment imaging time points [17,18,19,20,21]. The size of our study and extensive serial time points for DWI treatment response assessment adds considerably to the current published literature.

FDG-PET/CT imaging measures functional aspects of the tumour that are known to predict for poor radiotherapy response such as high tumour cellularity, metabolism, and/or hypoxia [19,20,21]. We have shown that change in primary tumour MTV at week 3 had the strongest correlation to local tumour control. Previous studies also show that FDG-PET/CT-based functional tumour volume (MTV) outperforms anatomical data for treatment response prediction [22, 23]. Based on our data, a greater percentage decrease in MTV (∆ > 50.4%) in primary tumour was highly predictive of tumour control. Our results confirm findings from previous studies that changes in FDG-PET/CT metabolic parameters are better treatment response markers compared to absolute values [22, 24]. Studies by Myo et al and Pollom et al have shown that ∆TLG from mid-treatment FDG-PET/CT can predict clinical outcomes [6, 28]. The studies utilised differing methodologies to delineate the MTV and hence limits direct comparison. Aside from non-standardised imaging protocol, performance of FDG-PET/CT-based parameters can also be limited by confounding influence of radiotherapy-induced inflammation/necrosis [24]. Hence, a rationale exists for utilisation of concomitant functional MRI imaging to provide additional discriminatory information to improve the prediction of treatment response.

We showed that utilisation of multiple functional imaging modalities improves incremental predictive performance for assessing tumour response following radiotherapy compared to individual imaging modality. FDG-PET/CT parameter provided good sensitivity but only moderate specificity due to treatment-related changes. The addition of DWI parameter improved the specificity and hence the predictive performance. The majority of published imaging biomarker studies have focused on a single imaging modality, predominantly CT, fMRI, and to a lesser extent FDG-PET and 18F-MISO PET. Multimodality functional imaging offers the advantage of measuring different tumour characteristics and improving the performance of individual imaging modalities in predicting treatment response [4, 5, 25, 26]. Wong et al have published the only other study containing 35 patients correlating changes in MRI and FDG-PET/CT imaging parameters during radiotherapy to clinical outcomes in HNSCC [10]. They found individual fMRI and FDG-PET/CT features that successfully differentiated responders from non-responders based on a surrogate endpoint of 3-month post-treatment imaging response.

Using optimal cut-off values of the highest performing FDG-PET/CT (week 3 ∆MTV) and DWI (week 3 ∆ADCmean) parameter, we were able to separate HNSCC patients into three risk sub-groups based on their early response during radiotherapy. This is potentially more flexible and accurate than traditional staging with AJCC as it accounts for tumour biology and treatment sensitivity. This stratification could be utilised in future mid-treatment risk-adaptation clinical trials. Patients with good FDG-PET/CT and DWI response had low rate of local failure (0%) and could be considered for de-escalation, e.g. radiotherapy dose or volume reduction, to reduce treatment-related toxicities. Patients with poor FDG-PET/CT and DWI response had high rates of local failure (78%) and could be considered for treatment intensification, e.g. radiotherapy dose boost, acceleration, use of radiosensitisers, or bail-out surgery. Our approach of risk stratifying patients is novel and hypothesis generating requiring external validation prior to clinical application. There are currently no published HNSCC studies that have utilised mid-treatment week 3 DWI and FDG-PET/CT imaging to allow external validation. Appreciating the limitation of the data available, a pre-specified split sample internal validation using patients from the two temporally separate studies was successfully undertaken to explore the stability of our results.

Strengths of our study include the large number of patients and standardised imaging time points for response assessment. Patients in our study were prospectively recruited, underwent standardised imaging protocol, and had reproducible consensus-based ROI delineation using best available guidelines [6, 10, 11, 14, 15, 27, 28]. We also correlated imaging parameters with long-term clinical outcomes rather than surrogate endpoints such as early post-treatment imaging. There are a few limitations of our study. Ours was a single-institutional study with all serial imaging undertaken on the same dedicated MRI and PET machine, and hence extrapolation of our results without additional standardisation quality assurance activities or utilisation of our methodology should be done with caution [29, 30]. We also noted attrition of patients undergoing MRI imaging as radiotherapy progressed due to poor patient tolerability; however, ∆Week 3 ADCmean remained significant for assessing treatment response when analysing only patients who underwent serial DWI imaging at every time point. Patients included in our series represent the commonly seen heterogeneous sub-sites of mucosal head and neck cancer treated with radiotherapy. Therefore, our results should be replicated in a larger multi-institutional setting with a more homogeneous population prior to clinical application. In a multi-institutional setting, additional quality assurance to harmonise image quality is also required. However, we believe that parameters measuring relative change from baseline if done on the same scanner provides greater reproducibility due to auto-normalisation of measurements compared to using a single static value.

Conclusion

Our study highlights the importance of changes in mid-treatment DWI and FDG-PET/CT imaging in predicting treatment response. We identified a combination of change in week 3 DWI (> 24.4% ∆ADCmean) and FDG-PET/CT (> 50.4% ∆MTV) imaging parameters performed during radiotherapy that could be utilised in the design of future risk-adapted clinical trials in HNSCC.