Introduction

There has been a significant increase in the incidence of HPV-associated oropharyngeal squamous cell carcinoma (OPSCC) over the past several decades1. Based on data from 2015 to 2019, about 47,199 new HPV-associated cancers occurred in the United States each year, accounting for about 70% of cancers of the oropharynx1. It has been shown that patients with HPV-associated OPSCC demonstrate improved treatment response to chemoradiation and prognosis compared to patients with HPV independent OPSCC, whose tumors are much more often associated with alcohol and tobacco use2. A distinct tumor, node and metastasis (TNM) staging system for HPV-associated OPSCC patients was established in the 8th edition of the American Joint Commission on Cancer (AJCC) staging system to account for the unique clinical characteristics3. However, recent studies report that the differentiation of outcome between the AJCC 8th stage groups remains less than satisfactory4,5. New biomarkers to improve AJCC 8th staging in risk stratification within the HPV-associated OPSCC population is sorely needed.

Previous studies have shown that using concurrent chemoradiation as definitive treatment confers lower local and regional failure rates compared with radiotherapy alone for high risk OPSCC. However, the addition of chemotherapy can cause increased treatment-related toxicity, a particular concern for patients with HPV-associated OPSCC who are at low risk for disease recurrence6,7,8. Skillington et al. compared the outcomes of 195 p16-positive, surgically managed OPSCC patients and found that receiving chemotherapy did not provide additional disease-free survival benefit and was associated with worse overall survival, potentially due to the lack of benefit from the additional chemotherapy in low-risk patients6. Furthermore, toxicity from the chemotherapy was also observed9,10,11. Significant deteriorations in swallowing outcome occurred in those who had chemotherapy in addition to radiotherapy9. To reduce treatment toxicity without compromising survival outcomes for HPV-associated OPSCC patients, recent clinical studies have focused on identifying patients suitable for treatment de-escalation with lower dose radiotherapy and also for potentially avoiding the use of chemotherapy without compromising survival outcome12,13. This highlights the need for developing a predictive biomarker to distinguish which low-risk OPSCC patients may not benefit from chemotherapy and thus for whom such toxic treatments could be avoided, without compromising patient outcomes.

With the advent of computerized image analysis and an increased attention paid to machine learning, there is an opportunity for deep mining of computer-extracted image features to inform patients’ outcome and to guide treatment intensities14,15,16,17. Radiomics, which seeks to identify subtle image-based attributes related to tumor phenotypes18,19 and prognosis20,21 is useful in not only identifying the presence of disease on radiographic scans but also in helping identify features that related to disease outcome and treatment response. Although a few studies have shown that machine learning based prognostic classifiers can predict molecular subtype18,19 and survival20,21 in head and neck cancer, there has been no methodology available to evaluate the predictive utilities of these approaches for chemotherapy benefit among HPV-associated OPSCC patients. Those patients with low risk of disease recurrence (AJCC 8th edition stage I and II patients) who would not derive additional benefit from chemotherapy could therefore potentially avoid the toxicity induced by chemotherapy.

In this work, we interrogate a prognostic and predictive radiomic image signature (pRiS) that employs quantitative texture features derived from within and around the primary oropharyngeal tumor on treatment planning CT scans to predict survival and chemotherapy benefit. Using a total of 491 patients from 4 sites, treated either with radiotherapy + chemotherapy or radiotherapy alone, our goal is to investigate if pRiS is (a) prognostic of survival and (b) associated with chemotherapy benefit in AJCC stage I and stage II HPV-associated OPSCC patients.

Results

Clinical characteristics

Table 1 lists the detailed demographic and clinical characteristics of the patients in cohorts D1 (n = 60), D2 (n = 162), and D3 (n = 269). Of the total 491 patients included in the study, 409 (83.3%) were men, and the median (interquartile range) age was 59 (53.8–64.2) years. 223 patients (45.4%) had AJCC 8th edition stage I OPSCC, of which 128 (57.4%) patients received chemotherapy. 268 patients (54.6%) had AJCC 8th edition stage II OPSCC, of which 201 (75%) patients received chemotherapy. More details regarding treatment are provided in Supplementary Table 2. Demographic and clinical characteristics according to treatment groups are provide in Supplementary Table 3. The median values for OS in D1, D2 and D3 were 7.48, 7.76, and 5.16 years and the median values for DFS were 7.06, 7.63, and 5.13 years.

Table 1 Demographic characteristics of the patients.

pRiS for prognosis prediction

Top selected features from LASSO model along with their LASSO coefficients are provided in Supplementary Table 4. The median pRiS value from D1 was −1.04 and was used for dividing patients into high- and low-pRiS groups. Clinical characteristics within the high- and low-pRiS groups are summarized in Supplementary Table 5. In D1, there were significant differences in OS (Fig. 1a) comparing patients with high- versus low-pRiS. The prognostic performance of pRiS was confirmed on D2 and D3 for OS (D2, HR = 2.14, 95% CI, 1.1–4.16, p = 0.025, C-index = 0.6, Fig. 1b; D3, HR = 2.74, 95% CI, 1.34–5.62, p = 0.006, C-index = 0.64, Fig. 1c). For DFS, the difference between high- and low-pRiS groups were significant on D1 (Fig. 1d) but not significant on D2 and D3, although we could observe a clear separation of the KM curves (D2, HR = 1.81, 95% CI, 0.97–3.37, p = 0.06, C-index = 0.62, Fig. 1e; D3, HR = 1.69, 95% CI, 0.93–3.05, p = 0.08, C-index = 0.61, Fig. 1f). On D2, the 5-year OS and DFS rates were 80.0% and 78.8% for the high-pRiS patients, and 90.2% and 84.1% for the low-pRiS patients. On D3, the 5-year OS and DFS rates were 88.9% and 85.0% for the high-pRiS patients, and 91.9% and 91.4% for the low-pRiS patients. Intratumoral and peritumoral feature expression heatmaps for one example high-pRiS and one low-pRiS patient are provided in Fig. 2. Two features that contribute to construction of pRiS (Intratumoral Laws texture and peritumoral CoLlAGe inertia) were spatially mapped on top of the tumor itself and annular ring area around the tumor, with blue and red representing low and high feature values, respectively.

Fig. 1: Kaplan–Meier survival analysis of pRiS on AJCC 8th stage I and II HPV-associated OPSCC.
figure 1

Stratifications are provided using overall survival from cohorts D1 (a), D2 (b), and D3 (c) and using disease-free survival from cohorts D1 (d), D2 (e), and D3 (f).

Fig. 2: Intratumoral and peritumoral radiomic feature maps.
figure 2

Feature maps are overlaid onto the corresponding region of interests of an example low-pRiS patient (a) and an example high-pRiS patient (b). We observe stronger feature expressions in terms of Laws texture (intratumoral) and CoLlAGe inertia (peritumoral) in the high-pRiS patient compared to the low-pRiS patient.

Multivariate Cox analysis on OS in D2 and D3 are provided in Table 2 and Supplementary Table 6, adjusting for clinical variables including age, gender, tumor subsite, smoking PY, T-stage, N-stage and treatment types. pRiS remained an independent prognostic factor for OS in D1 (HR = 25.5, 95% CI, 4.68–138.9, p = 0.0002), in D2 (HR = 2.24, 95% CI, 1.05–4.76, p = 0.04) and in D3 (HR = 7.59, 95% CI, 1.37–42.14, p = 0.02). Point-biserial correlation analysis between each individual prognostic radiomic feature including pRiS and the clinicopathologic factors are provided in Supplementary Table 7.

Table 2 Multivariable analysis on OS using the combined cohorts of D2 + D3.

Incremental prognostic value of pRiS

pRiS, age, smoking PY, T-stage and AJCC overall stage were found to be significantly associated with OS in univariate Cox analysis in D1 (Supplementary Table 8) and were incorporated into the Mrad+c (Fig. 3a). The clinical nomogram Mc is provided in Supplementary Fig. 1. The calibration curves for the radiomic nomogram Mrad+c at 4, 5 and 6 years showed good agreement between the actual and the estimated OS (Fig. 3b, c) and DFS (Supplementary Fig. 2) on D2 and D3. The decision curve analysis for OS prediction revealed that the Mrad+c had higher clinical net benefit than Mc when the threshold probabilities are less than 20% (Fig. 3d, e). Mrad+c resulted higher C-index compared with Mc regarding OS (0.68 vs 0.64, p = 0.06) and DFS (0.6 vs 0.57, p = 0.1) estimation (Table 3), although the differences were not significant. We then obtained the prognostic accuracy of pRiS, Mrad+c and Mc using time-dependent ROC analysis at specific times for predicting 4-, 5-, and 6-year OS and DFS. We observed consistently higher AUC from Mrad+c compared with Mc across D1, D2, and D3 for both survival endpoints (Supplementary Fig. 3).

Fig. 3: The radiomic nomogram and its calibration and decision curve analysis.
figure 3

The nomogram consists of pRiS, age, T-stage, smoking PY, and AJCC 8th staging (a). Calibration curves showed good agreement between the predicted and actual OS on D2 (b) and D3 (c). Decision curve analysis demonstrated higher net benefit of OS predictions on D2 (d) and D3 (e) from the radiomic nomogram Mrad+c compared to the clinical nomogram Mc. Net benefit = true positive rate − (false positive rate × weighting factor). Weighting factor = Threshold probability/1 − threshold probability.

Table 3 C-indices of nomograms.

pRiS for predicting chemotherapy benefit

For patients treated with only radiotherapy, high-pRiS patients had significantly worse OS compared to low-pRiS patients in both D2 (HR = 2.8 [1.1–7.17], p = 0.0316, Fig. 4a) and D3 (HR = 10.9 [2.66–44.9], p = 0.0009, Fig. 4b). By contrast, no significant differences in OS were observed between the high-pRiS and the low-pRiS patients treated with chemoradiation in either D2 (HR = 1.76 [0.76–4.06], p = 0.186, Fig. 4c) or D3 (HR = 1.9 [0.735–4.93], p = 0.184, Fig. 4d). These consistent patterns across D2 and D3 suggest that only the high-pRiS patients were able to derive benefit from chemotherapy, as reflected by their better OS when treated with chemotherapy in addition to radiation.

Fig. 4: Stratified Kaplan–Meier survival analysis according to treatment arms between high-pRiS and low-pRiS groups (using median cutoff).
figure 4

On both D2 and D3, the high-pRiS groups have significant worse OS than the low-pRiS group when treated with radiotherapy alone (a, b) while the separations in the chemoradiation arm (c, d) were not significant, indicating high-pRiS patients potentially could benefit from chemotherapy.

We then compared OS between patients in the two treatment arms and found that only high-pRiS patients tended to have longer survival when chemotherapy was administered while low-pRiS patients did not. For both D2 and D3, chemotherapy was associated with an improved OS in high-pRiS patients (radiation vs chemoradiation, D2, HR = 4.47 [1.73–11.6], p = 0.002, Fig. 5a; D3, HR = 2.99 [1.04–8.63], p = 0.0429, Fig. 5b). On the other hand, for patients in the low-pRiS group, chemotherapy did not affect OS (radiation vs chemoradiation, D2, HR = 2.56 [0.745–8.77], p = 0.136, Fig. 5c; D3, HR = 0.278 [0.052–1.49], p = 0.135, Fig. 5d). We also performed the experiments using DFS as the endpoint and obtained similar results (Supplementary Fig. 4 and Fig. 5). We also used the threshold output −1.1 (with the smallest p value in D1) from X-tile software as the cutoff to define the pRiS groups and repeated the predictive experiments for OS (Supplementary Figs. 6 and 10) and DFS (Supplementary Fig. 7) as outcomes. When combining D2 and D3 for interaction test, there was a significant interaction between treatment and pRiS groups in the Cox regression model (p = 0.04), indicating a predictive effect of pRiS. On subset analysis by AJCC 8th overall stage (stage I and II), pRiS was predictive of chemotherapy benefit for stage II (p = 0.047; Supplementary Fig. 8) but not for stage I patients (p = 0.38; Supplementary Fig. 9).

Fig. 5: Kaplan–Meier survival analysis for comparing OS between patients treated with radiotherapy alone and treated with chemoradiotherapy.
figure 5

Only high-pRiS patients did benefit from chemotherapy (a, b) while there was no advantage or potential negative impact on survival in low-pRiS patients (c, d) when treated with chemotherapy.

Discussion

The European Organization for Research and Treatment of Cancer (EORTC) 2293110 and Radiation Therapy Oncology Group (RTOG) 95-01 randomized clinical trial6 demonstrated the survival benefit of concurrent chemoradiotherapy in high-risk head and neck cancer patients. This paved the way for chemoradiotherapy to become the standard treatment protocol22,23 for locally advanced OPSCC patients, regardless of HPV status. However, clinical characteristics and treatment response differ between various head and neck cancer subsites24 (e.g., oropharyngeal, laryngeal and oral cavity). As the HPV-associated OPSCCs have more favorable prognosis and are more responsive to radiation than the HPV negative OPSCC2, it is likely that some of these patients do not receive added benefit from chemotherapy and might have had a similar outcome if treated with radiotherapy alone. With the advent of HPV related cancers causing a significant epidemiological shift, there is a growing call for more specific and selective treatment planning strategies to be reconsidered for HPV-associated OPSCC12,22.

Currently, there are many treatment de-escalation strategies for HPV-associated OPSCC patients, including eliminating chemotherapy or reducing the dosage of radiotherapy, with no consensus on an optimal choice. Recently, a randomized phase II trial (NRG-HN002)12 assigned HPV-associated OPSCC patients into one of two de-escalated treatment arms: (1) reduced dosage of radiotherapy with weekly cisplatin, or (2) accelerated radiotherapy (60 Gy) alone. Results showed that there were no significant differences (p = 0.23) in progression-free survival (PFS) rate between the two arms and the two-year OS rates were similar (96.7% and 97.3%). These results potentially indicate that not all selected candidates for treatment de-escalation benefited from the additional chemotherapy and they could have avoided the aggressive chemotherapy without compromising outcome. The interim analysis of NRG HN-005 trial failed to demonstrate the non-inferiority of 60 Gy of radiation plus cisplatin arm to the standard arm of 70 Gy with cisplatin. One possible explanation to this preliminary result is that the current selection criteria for lower risk HPV + OPSCC patients, which is based on clinical T/N stages and smoking status, remains insufficient for risk stratification. Development of novel and robust biomarkers to improve the accuracy of prognostication for HPV + OPSCC patients may aid in the design of future de-intensification trials. While definitive chemoradiation differs from the adjuvant treatment protocols, it still highlights the therapeutic importance of chemotherapy. Given these results, an individualized approach to accurately identify OPSCC patients who are most likely to benefit from chemotherapy would enable delivery of precision care to these patients.

In this work, we present a prognostic CT-based radiomic signature (pRiS) which is also predictive of added benefit of chemotherapy using a cohort of 491 AJCC 8th edition stage I and II HPV-associated OPSCC. pRiS comprises 7 radiomic textural features, one of which captures spots and waves textural heterogeneity patterns from within the primary tumor and six from the peritumoral region characterizing the tissue microenvironment around the tumor. Within the 0–15 mm annular ring around the tumor, 2 Gabor feature quantifying filter response and four CoLlAGe features characterizing co-occurrence of gradient orientations and intensity disorders were found to be prognostic. In multivariable analysis, pRiS was found to be an independent prognostic predictor and was able to stratify patients into high- and low-risk groups with significant differences in OS. Clinically, AJCC 8th edition stage I and stage II HPV-associated OPSCC patients could be considered for treatment de-escalation. However, a subset of these patients still had poor survival outcome and need to be identified for better treatment management24,25. The pRiS developed in this study was able to risk stratify these two populations, indicating that the true low-risk patients from these two clinically defined low-risk groups could be distinguished. On the other hand, high-pRiS patients were associated with worse outcome and thus should not be treated with de-escalation protocols. In addition, an integrated nomogram combining clinical factors and radiomic could improve the prognostic accuracy than using either alone26,27. These results indicate that pRiS could provide complementary information on OPSCC outcome beyond what is obtainable via currently known prognostic predictors.

pRiS was also found to be predictive of added benefit of chemotherapy for AJCC 8th stage I and II HPV-associated OPSCC patients, which is the main innovative contribution of this work. To the best of our knowledge, there is no existing studies aimed to identify HPV-associated OPSCC patients who might not benefit from chemotherapy in addition to the definitive radiotherapy. Although previous work focused on risk-stratification and prognosis prediction, these studies could not provide an individualized treatment strategy based solely on the predicted risk profile18,19,20,21. In contrast, the pRiS developed in this study not only carry prognostic value but more importantly it could also convey which patients would derive additional benefit if treated with chemotherapy versus those who would not. This would enable a more granular and robust treatment de-escalation for OPSCC patient with low risk of recurrence. Currently, the criteria for selecting candidates suitable for chemotherapy are based on AJCC 7th edition staging28,29,30,31, which did not take HPV status into account for risk stratification. In our study, we found that patients with high-pRiS scores using median value from training set as cutoff (>−1.04) were estimated to benefit from chemotherapy with reduced hazard of dying while low-pRiS patients (<−1.04) showed non-significant hazard ratios between the two treatment arms when using OS as the endpoint. A slight change in the high- and low-pRiS membership using X-tile cutoff selection (−1.1) did not alter the statistical significance found in the radiotherapy alone cohorts, indicating the robustness of pRiS regarding predictive power (Fig. S10A, B). These results suggest that most of the high-pRiS patients in radiotherapy treatment arm are “truly high risk” patient populations with worse outcome and could have benefited from additional chemotherapy after radiation. This is strongly suggested by results in the chemoradiation arm, where we do not observe significant survival difference between high- and low-pRiS populations (Fig. S10C, D), potentially suggesting that high-pRiS patients have improved survival outcome due to receiving benefit from chemotherapy. When using DFS as the endpoint, high-pRiS patients derived statistically significant DFS benefit in D2 when treated with chemotherapy while did not yield the same significant DFS benefit in D3 (p = 0.074). However, we could observe a clear trend that patients who received chemoradiation had more favorable prognosis compared with those treated with radiation alone. Interestingly, low-pRiS patients in D3 showed detrimental effect of chemotherapy (HR < 1, radiotherapy alone vs chemoradiotherapy), regardless of the endpoint (OS, Fig. 5d; DFS, Fig. S5D). These results suggest chemotherapy only improved high-pRiS patients’ outcome and there is no added benefit of instituting potentially toxic chemotherapy for low-pRiS patients with favorable prognosis.

Machine learning-based approaches have been applied to radiographic images for prognosticating outcome for head and neck cancer patients. Ou et al.32 retrospectively analyzed 120 patients with locally advanced HNSCC and constructed a signature integrating radiomics with p16 status. They found that patients with high signature score significantly benefited from chemoradiation while no benefit from low signature patients was observed. While the results were promising, this study did not investigate the predictive value of peritumoral radiomic. In another study conducted by Howard et al.33, 33,527 HNSCC patients were included for analysis and chemotherapy recommendations from three deep learning (DL) models were all associated with survival benefits. It is known that DL-based approaches provide limited interpretability. On the other hand, the selected hand-crafted features in this study for pRiS includes descriptors characterizing CT pixel textural patterns (e.g., local intensity heterogeneity and microscale disorder in gradient orientation). Kann et al.34 validated a DL algorithm in identifying Extranodal extension (ENE) on pretreatment CT scans in 144 patients with HNSCC. They demonstrated that the algorithm’s diagnostic performance surpassed that of specialized radiologists with head and neck cancer experience. Although pathologic ENE is an indication for adjuvant treatment escalation, to the best of our knowledge, there has been no studies on developing predictive imaging biomarker which could guide OPSCC clinical treatment decision in the definitive setting. A previous study by Leijenaar et al. externally validated the prognostic value of intratumoral radiomic signatures in a cohort of 542 OPSCC35 (C-index = 0.63). Our study is different from Leijenaar’s study in that we focused only on OPSCC patients with low-risk profile (HPV-associated stage I and II patients). Since HPV+ and HPV- patients are clinically considered different tumor entities with distinct outcome and treatment response, it is more meaningful to risk-stratify these two patient populations separately. Rather than combining these two populations together for analysis like Leijenaar et al study, we focused on prognosis and treatment response prediction specifically for HPV+ patients.

Within the HPV+ population, currently the main treatment strategy is to provide de-escalated treatment protocols (i.e., less dosage of radiotherapy and potentially avoid the aggressive chemotherapy). Existing studies (including Leijenaar et al.35) either only constructing prognostic biomarker without investigating their association with treatment benefit18,19,20,21 or managed to quantify the treatment benefit for head and neck cancer with various disease subsites32,33 (oropharyngeal, laryngeal and oral cavity). Our study not only developed a prognostic radiomic biomarker which enabled individualized risk prediction of HPV-associated OPSCC patients, but also tackled a clinical-significant problem of identifying potential candidates not benefiting from chemotherapy and thus for whom such toxic treatments could be avoided.

This work is significantly different from previous related publications16,18,20,21 by (a) pRiS was shown to be not only prognostic for OPSCC patients’ outcome but also predictive to added benefits of chemotherapy; (b) rather than analyzing patients with different HNSCC subtypes, we specifically investigated the role of chemotherapy in the context of AJCC 8th stage I and II HPV-associated OPSCC patients.

We acknowledge that our study did have its limitations. First, this is a retrospective study subject to confounders and biases. Second, radiation doses and type of chemotherapy are not strictly controlled for the two treatment arms. Third, we did not manage to assess the intra-observer and inter-observer agreement of the radiomic features, which might cause selected features susceptible to variation of the tumor annotations. Fourth, the training set D1 has a relatively small sample size with 20% of events, potentially limiting its generalizability to a broader population with a more diverse demographic distribution. This is suggested by the C-index discrepancy between the training and the validation datasets. More data is needed to construct a predictive biomarker with better generalizability. Nevertheless, the findings in this study could guide future studies in the design of more robust predictive biomarkers for HPV-associated OPSCC treatment de-escalation. Future investigations, such as a prospective clinical trial with a large patient cohort aimed at comparing survival benefit between high-pRiS OPSCC patients treated with and without chemotherapy, would be necessary to demonstrate its clinical utility.

In summary, we have developed and validated a seven-feature prognostic and predictive signature for chemotherapy benefit in patients with HPV-associated OPSCC. Further work on larger populations will be needed to validate this preliminary biomarker. If prospectively validated, pRiS could serve as an inexpensive, tissue non-destructive prognostic and predictive companion diagnostic tool to identify patients most likely to be safely de-intensified.

Methods

Patients and outcome

This retrospective study included 491 AJCC 8th edition stage I and II HPV-associated OPSCC patients from four different cohorts. Among the 222 patients from the Cancer Imaging Archive (TCIA)36,37 “OPC-Radiomic” cohort, 114 patients received chemoradiation and 108 patients received radiotherapy alone. 60 patients with radiotherapy alone from the “OPC-Radiomic” cohort formed the training set (D1) while the remaining 162 patients formed the internal validation cohort (D2). Among the 222 patients from the Cancer Imaging Archive (TCIA) “OPC-Radiomic” cohort, 114 patients received chemoradiation and 108 patients received radiotherapy alone. The fundamental rule is to unsure all patients in the training set received only radiation therapy (no chemotherapy), while maintaining descent number of patients treated with radiation alone in the internal validation set so as not to lose statistical power when comparing survival difference between the two treatment arms. We thought taking 60 out of the 108 patients treated with radiation alone to form the training set (D1) should be reasonable. Then all the remaining 48 radiation-treated patients plus 114 patients treated with chemoradiation (total 162) will be assigned to the internal validation set. We performed the following steps to determine the exact constitution of D1: we randomly selected 60 patients from the 108 patients treated with radiation alone and performed the statistical tests to see if there were statistically significant difference (p < 0.05) on clinicopathological variables (i.e., age, gender, AJCC 8th staging and disease recurrence) between the selected 60 patients and the remaining 162 patients (222-60). Differences between age were estimated using Wilcoxon rank-sum test and differences on gender, AJCC 8th staging, and disease recurrence were calculated using the chi-square test. We repeated this experiment for 500 iterations and documented the count for each patient which appears in iterations where there was no statistical difference between the selected 60 and the remaining 162 patients across all the four variables. Then the counts for all patients were re-sorted in descending order and we chose the top 60 patients to form the D1. The overall rationale is to keep balanced clinicopathological profile across the D1 and the D2. Three additional cohorts including cases from Cleveland Clinic Foundation (CCF, N = 91), TCIA-HNSCC (N = 134), and TCIA-Head-Neck-PET-CT (N = 44) were combined to form the external validation set (D3), which consisted of 269 patients in total. A patient selection diagram is provided in Fig. 6. HPV status for D1 and D2 cohorts were determined using p16 immunohistochemistry (IHC). A combination of p16 IHC and/or HPV DNA in situ hybridization were used for the D3 cohort. The waiver of written informed consent was granted by the IRB at Cleveland Clinic since all images are de-identified. In addition, it is not practicable to obtain consent from patients for this study since most patients was not alive or available for consent.

Fig. 6
figure 6

Patient selection diagram.

Inclusion criteria were: (i) non-metastatic (M0) AJCC 8th stage I and II HPV-associated (strong and diffuse, block-like nuclear and cytoplasmic p16 staining present in ≥70% of the tumor specimen) OPSCC, (ii) with available baseline pretreatment CT images covering the head and neck region, (iii) treatment with curative intent, (iv) follow-up continued for at least 20 months or until death and (v) matched clinical information (e.g., age, stage and survival). Patients with any of the following were excluded: (i) lack of identifiable tumors on CT scans or (ii) number of pixels within tumor being less than 200, this was the minimum tumor volume that was deemed to be necessary for feature extraction. Disease-free survival (DFS) was defined as time from radiotherapy end date to the date of following events: local, regional, distant failure or death whichever occurred earlier and censored at the date of last follow-up for those without event. Overall survival (OS) was defined as the time from radiotherapy end date to death from any cause and censored at the date of last follow-up for those alive. Clinical and outcome information from patients in the CCF cohort was obtained by chart review after approval from the Institutional Review Board (IRB number: CCF 14–551). The overall workflow is provided in Fig. 7.

Fig. 7: Flowchart of the radiomic workflow.
figure 7

a Primary tumor identification; b radiomic feature extraction; c radiomic signature profiling, and d signature validation.

CT image acquisition, segmentation, and compartment definition

All patients underwent an initial pretreatment CT scan with or without contrast agent injection for radiation therapy planning. CT images for the CCF cohort were acquired from GE (Chicago, IL) or Siemens (Erlangen, Germany) scanners. CT scans were acquired in helical mode with a slice thickness of 3 mm, at 120 kVp and 235 mAs tube current. Image resolution was between 0.4 mm - 0.5 mm for most of the patients, with image matrix of 512 × 512. The CT images for the three TCIA cohorts were acquired from one of the following CT scanners: General Electric Discovery ST; General Electric Lightspeed Plus; Toshiba Medical Systems Aquillion ONE. More details regarding the CT imaging parameters for the three TCIA cohorts included in this study could be found in Supplementary Table 1.

For primary tumor annotations on CCF cohort, two board-certified head and neck radiologists J.L. (with 5 years of clinical expertise) and S.S. (with 6 years of clinical expertise) manually delineated the tumor boundaries across all two-dimensional CT axial slices with visible tumor present using the 3D Slicer software38. The slices with dental artifacts were excluded for tumor annotations. The binary masks which comprise the outline of the gross primary tumor volume (GTV) for the three TCIA cohorts were obtained via the Radiation Therapy Structures (RTSTRUCT). RTSTRUCT is used to transfer anatomical structures related data between radiotherapy departments. It mainly comprises the information related to the regions of interest (ROIs) including GTV.

After obtaining the binary tumor mask for all patients, morphologic dilations were performed to define the annular ring region outside the tumor up to a radial distance of 15 mm based on a previous study16. The binary tumor masks were then subtracted from the dilated masks to obtain the peritumoral regions, which were then sub-divided into three peritumoral rings of 5-mm-radius increments. For all patients, peritumoral masks were dilated 15 mm from the corresponding intratumoral masks in a two-dimensional fashion. For each CT axial slice with tumor, the number of pixels dilated in each peritumoral mask were calculated as follows:

$${\rm{No}}.\;{\rm{of}}\;{{\rm{pixels}}}\;{{\rm{dilated}}}=\frac{15}{{{\rm{pixel}}}\;{{\rm{size}}}\;({{\rm{in}}}\;{{\rm{mm}}})}$$

During this peritumoral feature extraction process, additional consideration was taken to get rid of the region with air (<−900 HU). To avoid any edge artifacts that might arise during feature extraction, the filtered “dead” pixels of the CT scan were substituted by using an averaging filter across its 9 × 9 neighborhood.

Radiomic feature extraction

Radiomic feature extraction was performed for each patient on the primary tumor using an in-house program developed with MATLAB 2020b (Mathworks, Natick, MA, USA), meeting IBSI criteria in terms of the reporting criteria for reproducible and transparent implementations. We resampled all CT images to an isotropic voxel size of 1 mm prior to feature extraction. The feature families utilized in this study included gray level intensity features, gray level co-occurrence matrix (GLCM) Haralick features39, Laws energy, Gabor wavelet-based features, and intensity gradient orientations features (CoLlAGe)40. These features were extracted on a per-pixel basis across all 2D slices with tumor for all patients from both intratumoral and peritumoral regions (0–5 mm, 5–10 mm, and 10–15 mm). The feature values were averaged across all slices. Statistics of mean, median, standard deviation (std), skewness, and kurtosis were calculated from the feature responses of all pixels within the region of interest, which resulted in a total of 2045 features. All feature values were transformed into new scores with a mean of 0 and a standard deviation of 1 (z-score transformation). Detailed description of extracted features is provided below:

Laws texture

2-dimensional Laws filters are derived by computing the outer product of combinations of the following 1-dimensional filter vectors focused on different texture patterns: Level (L5)—detects smoothness of intensity values, L5 = [1 4 6 4 1]; Edge (E5)—detects edges between regions with abrupt changes in intensity, E5 = [−1 −2 0 2 1]; Spot (S5)—detects speckled enhancement patterns, S5 = [−1 0 2 0 −1]; Wave (W5)—detects oscillating local intensity patterns, W5 = [−1 2 0 −2 1]; Ripple (R5)—detects oscillating intensity patterns centered at region of extreme intensity, R5 = [1 −4 6 −4 1].

To obtain a feature vector, convolution is performed on the filters and the images within a window size neighborhood followed by summing up all the values. Features are named by the combination of filters applied in the y and x axes, e.g., L5E5 is the product of a level detection filter in the y axis and an edge detection filter in the x axis.

Gabor filter responses

Two-dimensional Gabor filters are computed by modulating a Gaussian kernel function with one of 48 sinusoidal plane waves. Each sinusoidal plane wave corresponds to a unique combination of one of four spatial wavelengths (2 pixels, 4 pixels, 8 pixels, 12 pixels) and one of seven orientations (22.5°, 45°, 67.5°, 90°, 112.5°, 135°, 157.5°). Each Gabor filter is then convolved with the original image and values corresponding to filter response within the region of interest are concatenated.

Haralick features

The following Haralick gray level co-occurrence matrix (GLCM) descriptors were computed: entropy, energy, inertia, inverse difference moment, correlation, information measure of correlation 1, information measure of correlation 2, sum average, sum variance, sum entropy, difference average, difference variance, and difference entropy. GLCM statistics were concatenated within a window size neighborhood, yielding 13 descriptor vectors per region.

Co-occurrence of local anisotropy gradients (CoLlAGe)

An image’s intensity gradients in the x and y direction are computed. Within a window size neighborhood, the dominant intensity gradient orientation (between 0° and 360°) is computed via principal component analysis, resulting in a 2D array of equal size with the dominant gradient orientation value centered at the corresponding pixel of the original image. Metrics of the co-occurrence matrix are then applied to this gradient orientation image in the same manner as described above for Haralick GLCM features. The resulting 13 CoLlAGe descriptors quantify the homogeneity of intensity gradient directionality within an image.

pRiS: a radiomic risk signature

The top prognostic radiomic features were selected by applying a least absolute shrinkage and selection operator (LASSO) Cox Proportional Hazard model41 on OS using D1. The optimal value of the tuning parameter in the LASSO Cox (alpha) model was determined by 10-fold cross validation to search for 100 values up to 1% of the estimated maximum. Once the parameter alpha is determined, the number of features was locked down to 7. pRiS was then constructed by linearly combining these 7 features weighted by their corresponding coefficients. The cutoff value for dividing patients into low- and high-risk patients was chosen using the median pRiS value from D1 based on OS. The potential association between the dichotomous pRiS group (high- vs low-pRiS group) and survival was first evaluated in D1 and then validated in D2 and D3 based on Kaplan-Meier (KM) survival analysis and Harrell’s concordance index (C-index).

Combining pRiS with clinical factors to improve prognostication

To investigate whether pRiS adds incremental prognostic value to the clinical factors for individualized prediction of OS and DFS, we constructed an integrated radiomic nomogram (Mrad+c) by combining pRiS with the prognostic clinical factors and compare it against a clinical nomogram (Mc). Variables significant in univariate analysis in D1 were included in the Mrad+c. Both the Mrad+c and Mc were developed in D1 based on the multivariable Cox regression analysis and tested in D2 and D3 with respect to calibration, discrimination (C-index), and clinical usefulness. Calibration curves along with the Hosmer–Lemeshow test were used to compare the predicted survival probabilities with the actual probabilities. Decision curve analysis was performed to quantify the net benefit at various threshold probabilities and to compare the clinical usefulness of the Mrad+c and Mc.

Association of pRiS with chemotherapy benefit

Patients from D2 and D3 sets who received either radiotherapy alone or chemoradiation were included for analysis. We evaluated pRiS in terms of its ability to predict the benefit of chemotherapy by comparing OS and DFS differences between patients who were treated with radiotherapy alone and those with chemoradiation in both the high-pRiS and low-pRiS groups. We also examined the predictive value of pRiS separately for AJCC 8th edition stage I and stage II patients. Furthermore, we used the cutoff values generated from the X-tile software42 version 3.6.1 to investigate the predictive value of pRiS on OS and DFS. X-tile provides a global assessment of every possible way of dividing a population into two groups based on the given biomarker, of which each possible biomarker value from the training set represents a unique cut-off point. The criteria to define the optimal threshold is to choose the cut-off point with the smallest p value from the log-rank test by iterating through all possible groupings defined by all unique cut-off points from training set. Using median value as the cut-off aims to maintain a balance between the number of patients in high- and low-pRiS groups in the training set while X-tile seeks to maximize the survival stratification between the two groups from a statistical perspective. The rationale for trying both cutoff-selection approaches is to investigate the robustness of predictive power carried by pRiS.

Statistical analysis

The difference of the continuous variables (i.e., age, smoking pack-year [PY] and pRiS) among 3 datasets (D1–D3) was compared using the one-way analysis of variance test (ANOVA) and the association of the categorical variables with 3 datasets was estimated using the chi-square test. Differences of OS and DFS among groups was examined using the log-rank test. Univariate analysis of the continuous pRiS value with the clinical and pathologic factors (i.e., age, gender, tumor subsites, smoking in pack years, pathological T- and N-stages, and AJCC 8th overall stage) were conducted. Multivariable Cox regression analysis was performed to assess the relationships between the various clinical factors and OS. We also performed the point-biserial correlation analysis to investigate potential association between each individual prognostic radiomic feature and the clinicopathologic factors. Interaction between pRiS and chemotherapy was assessed by adding an interaction term in the Cox model. Statistical analyses were performed using R version 3.4.0. The R packages used in this study included glmnet, survminer, rms, survival, Hmisc, survMisc, survey, and SvyNom. A threshold of 0.05 was used to define statistical significance.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.