Introduction

Accurate lymph node metastasis (LNM) assessment is essential for diagnosing and staging head and neck cancer [1, 2]. Traditional imaging diagnostic tools, including computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), have their limitations. These diagnostic imaging tests primarily evaluate lymph nodes based on size and shape. The sensitivity of other clinical examinations, such as physical examination, or initial diagnostic strategies, such as fine-needle aspiration cytology (FNAC), for detecting LNM ranges from 60.7 to 71.4% [3,4,5]. The most accurate diagnosis of lymph node metastasis relies on pathological analysis. However, in clinical practice, histological screening of lymph node specimens for the presence of metastatic disease requires careful and precise execution, which is time-consuming, labor-intensive, and error-prone. As technology advances, experts have incorporated artificial intelligence (AI) into digital pathology, mainly using deep learning to solve these problems [6]. Such AI-driven techniques employ advanced algorithms to analyze pathological images, identifying intricate patterns to improve diagnosis and assist pathologists in clinical workflow [7, 8]. In addition, recent studies have applied artificial intelligence to assist LNM detection in medical imaging, and some AI models have achieved promising results [9,10,11,12,13,14,15].

Radiomics, a method rooted in machine learning, has been explored for its capability to distinguish lymph node metastasis (LNM) in head and neck cancer [9, 10]. However, one notable drawback of this method is the high inter-correlation among the manually crafted image features, leading to feature biases based on pre-existing assumptions [11]. Convolutional neural networks (CNNs) have the advantage of learning image features automatically, avoiding these feature biases and creating new opportunities for analysis [12]. CNNs have shown superior performance over radiomics when large datasets are available. Despite the nascent stage of employing deep learning for LNM identification, several researchers have made significant headway [13,14,15]. For instance, Ariji et al. utilized an 8-layer CNN to scrutinize 441 lymph nodes across 45 patients, achieving a sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) of 0.75, 0.81, and 0.80, respectively [14]. Similarly, Tomita et al. designed a deep neural network for the pre-surgical diagnosis of metastasized cervical lymph nodes in oral squamous cell carcinoma (OSCC) patients using CT imaging, boasting an impressive AUC of 0.957 [15].

While CNNs have shown promise, their depth, and thus their efficacy, has been limited by the gradient vanishing problem [16]. In 2016, the Resnet network was proposed to address this problem by introducing a residual link structure and deepening the neural network from 19 layers to hundreds of layers [17]. Resnet was subsequently adapted from a 2D format to a more intricate spatial–temporal 3D variant, proving its utility on video datasets [18, 19]. In recent advancements, the 3D-Resnet has been harnessed for spatial-3D datasets, especially in medical imaging domains like MRI and CT [20,21,22]. In this study, we aimed to develop a deep-learning Resnet neural network to improve the precision of LNM diagnosis in patients with OSCC. Developing an accurate and efficient tool for LNM detection can potentially improve patient outcomes and reduce unnecessary treatments.

Methods

Our investigation included patients diagnosed with head and neck cancer (HNC) who underwent neck lymph node dissection (LND) at our institute from January 2019 to March 2021. To minimize the impact of preoperative CT scans on diagnostic accuracy for cervical nodal metastases, we selected patients who had received a diagnostic intravenous contrast-enhanced CT scan of the neck within 30 days before LND. We gathered demographic, clinical, and pathological data, which featured the anatomical site, tissue dimensions, microscopic characteristics, and lymph node assessment for malignant cells. We collected paired preoperative contrast-enhanced CT scans from 156 individual HNC patients with corresponding pathological reports. These scans were conducted using three distinct CT scanners from two leading manufacturers: Revolution CT (GE Healthcare), Brilliance 64 (Philips Healthcare), and iCT 256 (Philips Healthcare). In adherence to the Helsinki Declaration’s ethical guidelines, our study received approval from the Institutional Review Board.

Establishing Ground Truth

The cervical lymph node annotation was manually contoured slice-by-slice in the axial plane for each CT scan, and the segmentations were labeled as either “lymph node metastasis negative (LNM-)” or “lymph node metastasis positive (LNM +)” based on the corresponding LND pathology report. To ensure accuracy, the dissected lymph nodes were tagged according to laterality, neck level, surrounding tissue type, and nodal size, as determined from a correlative evaluation of the pathology report (Supplementary 1). The segmentations and labels were saved as Radiotherapy Structure Set (RTSS) files using Varian v15.1 radiation planning software. Two radiation oncologists reviewed the segmentation accuracy to ensure consistency in establishing the ground truth for lymph node metastasis. These nodes were carefully examined and confirmed pathologically by radiation oncologists. Following this, they were labeled and captured in computed tomography images. The harvested segmentations and labels formed the foundational dataset for training and testing the 3D-Resnet neural network.

Image Processing and Deep Learning Model

In the preprocessing image phase, a 3D binary mask for each lymph node was produced. The lymph node HU values were cropped according to the boundary of the contours from RTSS contours, resulting in a 3D image of the lymph node called “size-preserved” lymph node imaging, which was placed in the center of a bounding box backgrounded by zeroed voxels. Another type of region of interest (ROI) was generated from the “size-preserved” image, called “size-invariant” ROIs, which were created to compare the effect of different input information. The study used a ratio of 3:1 to randomly split the training and test datasets from the lymph nodes, with LNM-negative data duplicated to balance the two labels to a 1:1 ratio. Augmentation skills of rotation and flip were applied to increase the training data numbers to 16-fold.

The study proposed a dual-pathway 3D-CNN network with residual connections (3D-Resnet), with the two pathways designed to receive input from the size-preserved and size-invariant images. Both pathways were constructed with 34 convolutional layers and joined after feature extraction to predict a dual-label output of either LNM-positive or LNM-negative. The architecture of the 3D-Resnet model is displayed in Fig. 1. To prevent overfitting, batch normalization layers, dropout layers, and L2 regularization were utilized, with the output probabilities obtained by applying sigmoid classifiers. The study performed ablation experiments to optimize the architecture of the proposed model, testing single pathway models in different depths and exploring the best depth of dual-3D-Resnet by trying different depths of convolutional layers and recording their performance. The Adam optimizer tuned the network weights during the training phase with a mini-batch size of 16 and an initial learning rate of 0.001. The selected models underwent four-fold cross-validation to ensure robustness. The study used Python v3.9.12, Pytorch v1.9, SciPy 1.8.0, sci-kit-learn v1.0.2, sci-kit-image v0.19.2, and SimpleITK v2.2 packages to create and implement the network. The algorithm’s training ground was an RTX 2080 Ti graphics processor unit, and the proposed code is publicly available on GitHub (https://github.com/acqxi/hnc).

Fig. 1
figure 1

Dual-pathway 3D-ResNet architecture featuring both size-preserved and size-invariant pathways

Evaluation Metrics Overview

The evaluation metrics employed in this study encompassed accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUC-ROC) for both training and test sets. An algorithm was employed to measure lymph node dimensions and volume to minimize interobserver bias. Lymph nodes were manually contoured slice-by-slice in the axial plane, and each segmentation was classified as LNM- or LNM + based on the corresponding LND pathology report. Lymph node volumes were calculated using the ellipsoid formula, considering the X, Y, and Z dimensions in pixels and slice thickness [23]. Accuracy refers to the proportion of accurately classified samples relative to the total number, while sensitivity measures the correct identification rate of LNM + cases by the model. AUC-ROC serves as a comprehensive metric for evaluating the model’s capacity to differentiate between LNM + and LNM- cases, with a value of 1 signifying perfect discrimination and 0.5 indicating no discrimination. Data were entered and analyzed in Microsoft Excel for percentage calculation. Statistical analysis considered sensitivity, specificity, PPV, NPV, and Kappa coefficient were calculated using SPSS software version 23.

Results

Table 1 lists the clinical and pathological data on the primary tumor site and clinical and pathological T and N stages of the 156 patients in this study. Within this cohort, each patient exhibited a range of 1 to 8 lymph nodes, culminating in 341 lymph nodes. The analysis of labeled-extracted lymph nodes revealed that LNM + lymph nodes were significantly larger across the X, Y, and Z dimensions (p < 0.001), as well as in volume (p < 0.001) when compared to LNM- lymph nodes (Table 2). These findings align with previous studies concerning lymph node metastasis in head and neck cancer, corroborating the clinical consensus. Subsequent ROC curve analysis was conducted to ascertain the optimal cutoff point for predicting LNM + , yielding 12.8 mm, 11.2 mm, 14.0 mm, and 15.5 ml for X, Y, and Z dimensions and volumes, respectively. Interestingly, the sensitivity and specificity values demonstrated variability based on size or volume, wherein the Y and Z dimensions exhibited the highest sensitivity (84.62%) and specificity (72.15%), respectively.

Table 1 Patients’ demographic data and TNM stage
Table 2 The dimensions of lymph nodes and its prediction values

Table 3 delineates the variants of 3D-Resnet models explored in this study, including the Single-3D-Resnet and Dual-3D-Resnet, along with input size and model depth alterations. It documents the AUC value, the number of model parameters, memory usage, and GFLOPs for each model variant. The findings reveal that the Dual-3D-Resnet model, at a depth of 34 and input sizes of 32 × 32 × 32 or 130 × 130 × 18, exhibits the highest AUC values of 0.929 and 0.914, respectively, outperforming the Single-3D-Resnet model (Supplementary 2).

Table 3 3D Resnet model performance

We expanded our examination of the efficacy of the 3D Resnet model in predicting LNM + by analyzing 86 lymph nodes in 38 patients with head and neck cancer (Fig. 2). We further compared these results with the clinicians’ physical examinations and the radiologists’ assessments based on CT scans (Table 4). The 3D Resnet model exhibits a high specificity of 90.0% and a sensitivity of 80.8%, showing a solid ability to identify positive cases. Moreover, the 3D Resnet model achieves the highest PPV of 77.8% and NPV of 91.5% among all methods in the lymph node analysis, indicating its proficient accuracy in predicting both positive and negative cases. The Kappa statistic of 0.700 further underscores the substantial agreement between the predicted and actual values by the 3D Resnet model, which is notably higher than other methods in patient and lymph node analysis contexts.

Fig. 2
figure 2

Illustration of cases from 3D ResNet predictions: (a–c) showcase LNM + cases with incorrect predictions alongside their corresponding CT scan features; (d–f) display LNM- cases with inaccurate predictions and their respective CT scan features

Table 4 Comparison of the perdition values for LNMs between different methods

Discussion

The findings from our study underscore the limitations inherent in the current preoperative diagnostic tools, including physical examination and radiological diagnosis, for accurate staging and devising treatment strategies for patients with head and neck cancer. This notion is further corroborated by the modest concordance between clinical and pathological N-stage classification observed among the 156 patients evaluated in this study, reflecting a need for more reliable preoperative diagnostic methodologies as echoed in prior literature [24]. Lymph node size has been identified as an essential factor in predicting lymph node metastasis, and our study confirms this finding [25, 26]. However, it is imperative to integrate lymph node size assessment with other clinical and radiographic indicators such as location, morphology, and presence of necrosis. As demonstrated in our study, experienced radiologists can attain high diagnostic performance by synthesizing these imaging findings, achieving a sensitivity of 91.7% and a specificity of 65.4%. Nonetheless, novice practitioners might not reach similar levels of diagnostic proficiency. Our study also suggests that advanced deep learning techniques like 3D Resnet can further strengthen the predictive values of these tools.

In addressing the challenges of accurate diagnosis, it is noteworthy to consider the performance of existing imaging modalities. The diagnostic accuracy of computed tomography (CT) scans in identifying positive cervical lymph nodes can be influenced by various factors [27, 28]. Within this scope, multiple studies have shown that PET-CT exhibits higher sensitivity and specificity compared to MRI and CT [29, 30]. Depending on the maximum standardized uptake value (SUV-max) threshold and cancer type, the sensitivity of PET-CT in detecting LNM + ranges between 75 and 84%, the specificity between 59.4 and 87%, the PPV between 19.1 and 75%, and the NPV between 94 and 95.7% [31, 32]. However, a limitation associated with employing PET-CT for detecting cervical lymph node metastasis in head and neck cancer is the potential for false-positive or false-negative results, which could lead to unnecessary surgical interventions and increased radiation exposure. Our study provides evidence that the 3D Resnet model has the potential to accurately identify LNM + and provide valuable information in determining surgical candidates, identifying surgical areas, and performing pixel-level risk analysis to minimize radiation exposure while preserving patient quality of life.

The success of radiotherapy is highly dependent on the accuracy of target volume delineation, and CT simulation is an integral component of the planning process [33, 34]. However, determining the appropriate irradiation of the neck lymph node area can be challenging in head and neck cancer due to the complex anatomy and variability of lymph node involvement [35]. Incorrect delineation can result in under- or over-treatment, compromising treatment efficacy and increased toxicity [36]. The decision to irradiate the neck lymph node area is influenced by factors such as the primary tumor’s location and extent, the risk of lymph node involvement based on clinical and pathological data, and the individual patient’s risk factors and treatment goals [2, 37, 38]. The findings of this study suggest that the 3D Resnet model could assist radiation oncologists in accurately delineating the lymph node region and identifying patients at high risk of LNM + . With the advancements in CT simulation technology, advanced radiation therapy can achieve a more conformal dose distribution, improving tumor control rates and reducing treatment-related toxicity.

Despite our study’s promising results, several limitations must be considered. Firstly, the study was conducted using data from only one medical center, which may limit the generalizability of the findings. Further external validation using data from multiple centers or a federated learning framework is necessary to ensure the model’s robustness and generalizability. Secondly, a lymph node detection and auto-segmentation model is required before implementing the recognition model into the clinical workflow. Lastly, we did not integrate the primary tumor area imaging data and clinical features, such as tumor biomarkers with lymph node ROI, to build a more robust model for LNM prediction. A further study could provide more robust evidence on the performance of the 3D Resnet model and its potential impact on patient outcomes.