Introduction

Gallbladder cancer is one of the most common malignant tumors of the biliary system, and its incidence ranks sixth among gastrointestinal malignancies [1]. Radical surgical resection currently remains the most effective treatment option. However, owing to its high degree of malignancy, lymph node and distant organ metastases are highly likely; hence, the treatment effect is not ideal. The 5-year survival rate of patients is extremely low at only 5–10% [2].

Establishing an accurate tumor survival prediction model can help improve the quality of life and survival time of patients with tumors. Current survival prediction models for gallbladder cancer are mainly based on tumor TNM staging and differentiation, liver infiltration, concurrent jaundice, and the scope of lymph node dissection [3, 4]. Conventional gallbladder cancer survival prediction models use only a single type of data, such as pathological anatomy or biomarkers, for prognosis estimates, and their prediction accuracy is insufficient [5,6,7]. Owing to the heterogeneity and complexity of tumors, the prognostic effects of the same treatment plan differ among patients, even if the postoperative pathological staging is the same. Therefore, it is necessary to supplement TNM staging with new indicators, to screen high-risk patients prone to recurrence and metastasis, and to move from testing a single biomarker to comprehensively evaluating multiple pathophysiological features. With the advancement of medical and information technology, tumor survival prediction models using different types of data have been developed and successfully applied in breast cancer, non-small cell lung cancer, and other diseases [8, 9]. Different types of clinical data contain information on different aspects of a patient’s condition. Establishing a multimodal gallbladder cancer survival prediction model that combines imaging data such as enhanced computed tomography (CT), structured data derived from laboratory tests (such as routine blood and urine tests, liver function, and tumor markers), and data regarding systemic treatments is therefore expected to improve prediction capability.

The traditional radiomics method of extracting features from medical imaging data relies on experienced imaging physicians to extract information based on specific tasks, which is time consuming, laborious, and subjective, and has poor generalizability and robustness. Because the relationship between laboratory test data and patient prognosis and survival is complex, specific analysis is required for each type of tumor, and research on tumor prognosis that uses such data is therefore lacking. In recent years, with the development of deep learning technology, the use of multimodal medical data for personalized tumor survival prediction has gradually become a trend. For example, Sun et al. used a deep neural network (DNN) to construct a breast cancer survival prediction model [10], Lao et al. used a convolutional neural network (CNN) to build a survival prediction model for glioblastoma [11], Chaudhary et al. used an autoencoder (AE) to build a liver cancer survival prediction model [12], and Xie et al. used a deep belief network (DBN) to construct survival prediction models for 14 types of cancer [13]. DNNs can merge low-level features and extract high-level features to predict the survival of cancer patients. Machine learning or statistical methods can be used to analyze and process structured numerical medical data, whereas deep learning can automatically extract and quantify deep features in imaging data by learning from a large number of samples. The extracted abstract features can be objective, accurate, and robust, and have improved generalizability.

In this study, a multimodal fusion survival prediction model for gallbladder cancer patients was established in cooperation with two large-scale tertiary hospitals in Shanghai. Data from different modalities, such as laboratory tests, medical imaging, and systemic treatments, were analyzed using deep learning and traditional statistical methods. Through this model, hidden information in the medical data of different modalities can be fully utilized, and individualized survival predictions can be made for gallbladder cancer patients.

Methods

Overview

In this study, a multimodal data model based on deep learning was used to establish a survival prediction model for patients with gallbladder cancer. Data mainly included enhanced CT imaging, clinical laboratory tests, and systemic treatment data. The research concept is presented in Fig. 1. Patients’ imaging and clinical data were processed separately and screened using Cox single-factor regression analysis, retaining the significant features (P < 0.05). Thereafter, the feature representation from the middle layer of the neural network and the features from the clinical data were merged into one input to the final model, to establish a survival prediction model for patients with gallbladder cancer.

Fig. 1

Process to establish the multimodal gallbladder cancer survival prediction model

Datasets and Experimental Platform

Imaging Datasets of Gallbladder Cancer Patients

The gallbladder cancer imaging datasets used in this study were obtained from two large tertiary hospitals affiliated to Shanghai Jiaotong University School of Medicine (195 cases). Among the patient samples, 92 patients were alive while 103 had died. The median survival time in this dataset was 450 days, and the male-to-female ratio was 76:119 (Table 1). The CT imaging data selected for each sample comprised the arterial phase image in patients’ enhanced CT scans with slice thickness of 1.0–1.5 mm. All imaging data were collected within 2–3 days after admission.

Table 1 Data features of the imaging datasets

Clinical Datasets of Patients with Gallbladder Cancer

Data from routine blood and urine, liver function, and tumor marker tests, performed on 195 gallbladder cancer patients upon admission, as well as postoperative systemic treatment data, were included in the datasets. Based on the research literature and doctors’ recommendations, 23 features that are relevant to, and have a significant impact on, the survival and prognosis of patients were selected from these data as the clinical features of patients with gallbladder cancer. These included age, sex, prothrombin time (PT), international normalized ratio (INR), activated partial thromboplastin time (APTT), fibrinogen (Fib), thrombin time (TT), white blood cell (WBC) count, hemoglobin, platelets, alpha-fetoprotein (AFP), carcinoembryonic antigen (CEA), tumor markers CA-199, CA-125, CA-724, tumor staging (TNM), gallstone, liver resection status, R0 resection of tumor (R0), differentiation, nerve infiltration, microvascular infiltration, and radiation and chemotherapy. The clinical datasets used in this study were established using these 23 components, as collected from patients included in the study. The overall data features are summarized in Table 2. All clinical laboratory data were collected on the first day of admission, all data regarding systemic treatments were obtained after the patient’s operation, and TNM staging information was obtained 1 week after the operation.

Table 2 Features of the clinical datasets

Experimental Platform

Training and operation of the deep feature model in this experiment were both conducted using a Linux Ubuntu 16.04 (Canonical Ltd., London, UK) system environment. The adopted deep learning framework was the PyTorch deep learning library (Facebook’s AI Research lab (FAIR), NY), based on Python 3.8 software (Python Software Foundation, Wilmington, DE). The graphics processing unit (GPU) used was an NVIDIA TITAN RTX with 64G VRAM and 32G RAM, and CUDA Toolkit 10.1 (NVIDIA Corporation, Santa Clara, CA, USA). The Cox proportional hazard regression model was established using R version 4.1.0 (The R Foundation, Vienna, Austria), for programming in a 64-bit Windows system.

Data Preprocessing

Data Cropping

Abdominal CT scan data are three-dimensional volume data. The area occupied by the gallbladder is small, and the presence of other organs might affect the learning and judgment of the deep learning model. Therefore, patients’ CT images were first cropped to retain only the gallbladder area, as shown in Fig. 2.

Fig. 2

Image cropping

Labeling of the gallbladder and surrounding lesions in the CT images was performed by two experienced imaging physicians. Before cropping, the sizes of all physician-marked gallbladder areas in the dataset were compiled, and on this basis the three-dimensional CT volume of each patient was cropped to 224 × 224 × 32.
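As a rough illustration of the cropping step, the sketch below computes a fixed 224 × 224 × 32 crop window around a marked region and shifts the window so that it stays inside the volume. The function name `crop_bounds`, the axis order, the centring rule, and the example volume size are assumptions made for illustration; the source states only the final crop size.

```python
def crop_bounds(center, volume_shape, crop_shape=(224, 224, 32)):
    """Return (start, stop) index pairs for a fixed-size crop around `center`."""
    bounds = []
    for c, size, crop in zip(center, volume_shape, crop_shape):
        if crop > size:
            raise ValueError("crop window is larger than the volume")
        start = c - crop // 2
        # Shift the window back inside the volume if it overhangs an edge.
        start = max(0, min(start, size - crop))
        bounds.append((start, start + crop))
    return bounds


# Hypothetical 512 x 512 x 120 volume with the marked centre near two edges.
print(crop_bounds(center=(60, 300, 110), volume_shape=(512, 512, 120)))
# [(0, 224), (188, 412), (88, 120)]
```

Clamping the window rather than padding keeps every crop the same size without inventing voxel values, which is one plausible choice among several.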

Normalization

Because the CT value of the gallbladder is between 10 and 30 HU, the best window width and window level for gallbladder display in the CT image were set to 300 and 30, respectively, and the cropped images were normalized according to these values, as shown in Eq. 1.

$${\text{Volume}}_{{{\text{normal}}}} = \frac{{\text{Volume}} - \left( {\text{Window\_Center}} - 0.5 \times {\text{Window\_Width}} \right)}{{\text{Window\_Width}}}$$
(1)

where \(Volume\) represents the original voxel value (CT value) of the image, \({Volume}_{normal}\) represents the normalized voxel value, and \(Window\_Width\) and \(Window\_Center\) represent the window width and level, respectively. CT values beyond the window boundaries were clipped: those below the lower boundary were set to 0, and those above the upper boundary to 1. Patients’ CT images with voxel values between 0 and 1 were thus obtained, as shown in Fig. 3.
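A minimal sketch of this windowing step is given below. It assumes the usual reading of the normalization, in which the lower window boundary is subtracted from the voxel value before dividing by the window width, and it uses the window level of 30 and window width of 300 stated in the text. The function name `window_normalize` is hypothetical.

```python
def window_normalize(hu_values, window_center=30.0, window_width=300.0):
    """Map CT values (HU) to [0, 1] with a display window, clipping outside it."""
    lower = window_center - 0.5 * window_width  # -120 HU for the defaults
    normalized = []
    for hu in hu_values:
        value = (hu - lower) / window_width
        # Values below the window become 0, values above it become 1.
        normalized.append(min(1.0, max(0.0, value)))
    return normalized


print(window_normalize([-1000, -120, 30, 180, 400]))
# [0.0, 0.0, 0.5, 1.0, 1.0]
```

With these settings the window spans −120 to 180 HU, so the window level itself maps to 0.5.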

Fig. 3

Image normalization

Data Enhancement

Owing to the insufficient scale of the gallbladder cancer dataset, a data augmentation strategy was implemented to expand it. This study randomly selected 60 deceased and 60 alive cases from the dataset, forming two groups according to the survival status (alive or dead) of the patients after surgery. Data augmentation strategies include translation, rotation, stretching, zooming, horizontal and vertical flipping, and adding noise. Of these, stretching, zooming, and flipping would significantly alter the shape information of medical images and were therefore not suitable for gallbladder cancer image analysis. The data augmentation strategy adopted in this study thus involved only translation and rotation. A new image was obtained by randomly shifting a patient’s image horizontally and vertically by –5 to 5 units and rotating it about its center by –10° to 10°. Data augmentation resulted in a fourfold increase in the number of patient samples, from 120 (60 deceased and 60 alive cases) to 480, and these samples were used to train the deep learning model.
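The augmentation plan can be sketched as follows. The shift and rotation ranges come from the text; everything else is an assumption for illustration, in particular that the fourfold increase consists of each original image plus three randomly transformed copies, that shifts are whole voxels, and the name `sample_augmentations`. The sketch only samples transformation parameters and does not resample any image.

```python
import random


def sample_augmentations(n_samples, copies_per_sample=3,
                         max_shift=5, max_angle=10.0, seed=0):
    """Sample translation/rotation parameters for each image (no resampling)."""
    rng = random.Random(seed)
    plan = []
    for idx in range(n_samples):
        # Keep the untouched original (assumption: it counts toward the total).
        plan.append({"sample": idx, "dx": 0, "dy": 0, "angle": 0.0})
        for _ in range(copies_per_sample):
            plan.append({
                "sample": idx,
                "dx": rng.randint(-max_shift, max_shift),    # horizontal shift
                "dy": rng.randint(-max_shift, max_shift),    # vertical shift
                "angle": rng.uniform(-max_angle, max_angle)  # degrees about the center
            })
    return plan


plan = sample_augmentations(120)
print(len(plan))  # 480
```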

Clinical Data

Certain clinical features in the dataset collected from patients with gallbladder cancer had missing values, owing to differences in patient admission time and status. Less than 5% of values in each feature were missing; therefore, the multiple imputation method was used to address this part of the data. Multiple imputation methods typically include multiple regression imputation, propensity score matching, and the Markov chain Monte Carlo method [14]. Multiple regression imputation and propensity score matching are suitable when the missing data exhibit monotonicity, whereas the Markov chain Monte Carlo method is mostly used for continuous variable data with arbitrary missing patterns [15]. Clinical data missing in the clinical datasets used in this study were generally arbitrary, so the Markov chain Monte Carlo method was mainly used for imputation.

Deep Image Feature Extraction Based on 3D-DenseNet

Owing to the limited size of the medical dataset, transfer learning was adopted when constructing the deep learning framework: the network structure of the DenseNet model [16], which has performed well in everyday image processing, was fine-tuned. Everyday image data are usually two-dimensional with three channels, and the DenseNet model is constructed on this basis. However, medical imaging data are three-dimensional and, after normalization, each image has a single channel with values between 0 and 1. Therefore, the DenseNet model was recoded into 3D-DenseNet to extract features from the medical images. The specific structure of the 3D-DenseNet model proposed in this study for medical imaging is shown in Fig. 4.

Fig. 4

3D-DenseNet model structure

During the training process, the grid search method was used to search for hyperparameters and compare performance of the models with different hyperparameter values, through fivefold cross-validation. The hyperparameters used in the final training of the 3D-DenseNet model are listed in Table 3.
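The search procedure described above can be sketched in a few lines. The hyperparameter names and values in `grid`, the sample count, and the scoring function `toy_evaluate` are hypothetical placeholders; in practice the scorer would train 3D-DenseNet on the training folds and return a validation metric, and the actual search space is the one summarized in Table 3.

```python
import itertools
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def grid_search(param_grid, n_samples, evaluate, k=5):
    """Return the parameter combination with the best mean k-fold score."""
    folds = k_fold_indices(n_samples, k)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for i, val_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(evaluate(params, train_idx, val_idx))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


def toy_evaluate(params, train_idx, val_idx):
    # Placeholder score; stands in for training a model and validating it.
    return (-abs(params["learning_rate"] - 1e-3)
            - 0.001 * abs(params["batch_size"] - 8))


# Hypothetical grid, not the values used in the study.
grid = {"learning_rate": [1e-2, 1e-3, 1e-4], "batch_size": [4, 8]}
best, score = grid_search(grid, n_samples=480, evaluate=toy_evaluate)
print(best)
```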

Table 3 3D-DenseNet model training parameters

On the basis of these hyperparameters, the model achieved its best performance on the training set at epoch 96, with a prediction accuracy of 0.96 and a loss of 0.02. Performance on the validation set was also good, with a prediction accuracy of 0.97 and a loss of 0.016. Therefore, the model obtained at epoch 96 was selected as the final 3D-DenseNet model.

After obtaining the optimal 3D-DenseNet model for classifying patients with gallbladder cancer as alive or deceased, using all CT images of the patients, the output layer and the last fully connected layer of the network were removed. The flattened feature vector output by the truncated model was then taken as the set of features extracted from the CT images. For each CT image, 3D-DenseNet extracted 1024 imaging features, denoted X1–X1024.

Feature Screening on the Basis of Cox Single-Factor Analysis

Image Feature Screening

Cox single-factor analysis was conducted on the 1024 arterial phase, enhanced CT deep imaging features extracted by the 3D-DenseNet model, with P < 0.05 as threshold. Thereafter, screening for features that have a higher correlation with patient survival was carried out. Remaining features were integrated based on the Cox proportional hazard regression model, the imaging feature prognostic factor for patients with gallbladder cancer was established, and the role of imaging features in predicting and indicating the prognosis and survival of patients with gallbladder cancer was analyzed.

Because the deep imaging features were numerous and high-dimensional, the feature variables had to be filtered further when performing Cox regression. Therefore, an L1 (lasso) penalty term was introduced into the Cox model, which shrinks the regression coefficients β of weakly contributing features to zero and thereby removes those features from the model. The λ value of the lasso penalty was chosen as the value at which the partial likelihood estimation error of the model was smallest, and lasso filtering was performed on the deep imaging features extracted by 3D-DenseNet. The remaining feature variables were X32, X118, X130, X150, X490, X832, X853, and X990. A Cox regression model was constructed from the selected feature variables; the corresponding regression coefficients are presented in Table 4.
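For reference, an L1-penalized Cox estimate of the kind described above is commonly written as follows. The notation is introduced here for illustration and does not appear in the source: \(x_i\) is the feature vector of patient \(i\), \(\delta_i\) is the event indicator (1 for death, 0 for censoring), \(R(t_i)\) is the set of patients still at risk at time \(t_i\), and \(p\) is the number of candidate features.

$$\hat{\beta} = \mathop{\arg\max}\limits_{\beta} \left[ \sum_{i:\,\delta_i = 1} \left( x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\left( x_j^{\top}\beta \right) \right) - \lambda \sum_{k=1}^{p} \left| \beta_k \right| \right]$$

The first term is the Cox partial log-likelihood and the second is the lasso penalty; larger values of λ set more coefficients exactly to zero, which is what reduces the candidate features to a small retained subset.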

Table 4 Imaging feature prognosis factor based on 3D-DenseNet

Clinical Feature Screening

The impact of each of the 23 clinical features in the gallbladder cancer clinical dataset on patient survival and prognosis was analyzed together with patients’ survival information. Cox single-factor regression was used to evaluate whether the relationship between each clinical feature and patient survival time was significant. Meanwhile, the gallbladder cancer clinical dataset was divided into high- and low-risk groups on the basis of the median of each clinical feature, and survival curves were drawn for the two groups of patients. Then, the log-rank test was used to compare the two sets of survival curves and analyze whether a particular clinical feature had a significant impact on patient survival.
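The two-group comparison can be illustrated with a small, dependency-free log-rank test. This is a sketch of the standard test under stated assumptions, not the study's code: the function name `logrank_test` and the toy survival times are hypothetical, and the P-value uses the chi-square distribution with one degree of freedom.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P-value), 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group B
        d_a = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 0)
        d_b = sum(1 for ti, e, g in data if ti == t and e == 1 and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data: every patient in the "high-risk" group dies earlier than the others.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5)
print(p < 0.05)  # True
```

In the setting described above, the two groups would be formed by splitting patients at the median of one clinical feature, with an event value of 0 marking patients who were still alive at last follow-up.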

Cox regression analysis, also known as the Cox proportional hazard regression model, is an important survival analysis technique. It considers patients’ survival outcome and survival time as dependent variables, and one or more factors can be simultaneously used as independent variables, to analyze their impact on survival time [17]. At its core, it is a regression model that can be divided into single-factor regression, which is mainly used to screen variables, and multifactor regression. The results of the Cox univariate regression analysis of the 23 clinical features are presented in Table 5. At a confidence level of 0.95, the significance threshold for the P-value was 0.05, and clinical features with a P-value below this threshold were considered to have a significant impact on patients’ survival time.

Table 5 Cox single-factor analysis results of clinical features

Carcinoembryonic antigen (CEA), carbohydrate antigen 19–9 (CA-199), and carcinoma in situ (T0, TNM_0) each had a Cox single-factor analysis P-value below 0.05, indicating that these three clinical features had a significant impact on patients’ survival time.

Deep Imaging Feature Prognostic Factor Analysis

A nomogram can illustrate the functional relationship between survival time and multiple variables; here, the nomogram portrays the relationship between the deep imaging feature prognostic factor and the individual deep imaging features. The imaging feature prognostic factor nomogram established in this study, based on the features extracted by 3D-DenseNet, is shown in Fig. 5.

Fig. 5

Deep imaging feature prognosis factor nomogram based on features extracted by 3D-DenseNet

Survival Prediction Model Based on Multimodal Data Integration

On the basis of the Cox proportional hazard regression model, imaging prognostic factors and clinical features can be integrated to simultaneously analyze their combined effect on the survival time of patients with gallbladder cancer. The weight of each feature in the multimodal data survival prediction model of gallbladder cancer, constructed using the features extracted by 3D-DenseNet, is presented in Table 6, while the corresponding nomogram is shown in Fig. 6.
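To make the integration step concrete, the sketch below evaluates the linear predictor of a Cox model that combines an imaging prognostic factor with clinical features, and compares two patients through their hazard ratio. The coefficient values, feature names, and patient records are placeholders chosen for illustration only; they are not the fitted values reported in Table 6.

```python
import math


def cox_risk_score(features, coefficients):
    """Linear predictor of a Cox model: sum of coefficient * feature value."""
    return sum(coefficients[name] * features[name] for name in coefficients)


# Placeholder coefficients, NOT the values fitted in the study.
coefficients = {"imaging_factor": 1.0, "CEA": 0.01, "CA199": 0.001, "TNM_0": -0.5}

# Two hypothetical patients with identical markers but different imaging/stage.
patient_a = {"imaging_factor": 0.8, "CEA": 5.0, "CA199": 100.0, "TNM_0": 0}
patient_b = {"imaging_factor": 0.2, "CEA": 5.0, "CA199": 100.0, "TNM_0": 1}

score_a = cox_risk_score(patient_a, coefficients)
score_b = cox_risk_score(patient_b, coefficients)

# Under proportional hazards, the baseline hazard cancels in the ratio.
hazard_ratio = math.exp(score_a - score_b)
print(round(score_a, 2), round(score_b, 2))
```

Because the baseline hazard cancels, the ratio depends only on the difference between the two linear predictors, which is also what a nomogram encodes as points per feature.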

Table 6 Features and coefficients of the multimodal survival prediction model

Fig. 6

Nomogram of the deep feature multimodal survival analysis model based on features extracted by 3D-DenseNet

Results

Survival Prediction Results Based on Imaging Data

The C-index of the independent prognostic factor established using 3D-DenseNet reached 0.758. The survival curves constructed using this prognostic factor are shown in Figs. 7 and 8. In the training set, the log-rank test was used to compare the high-risk group with the low-risk group; at the 0.95 confidence level, the P-value was less than 0.0001, indicating that high-risk patients could be significantly distinguished from low-risk patients. In the validation set, the P-value was also less than 0.0001, indicating that the separation remained significant. When this prognostic factor was used to predict the survival rate of patients with gallbladder cancer, it performed well in the short term, as shown in Fig. 9: in the training set, the area under the receiver operating characteristic (ROC) curve (AUC) for 1-year survival reached 0.820. The AUC was higher still for long-term predictions, at 0.877 and 0.929 for 3- and 5-year survival, respectively. In the validation set, shown in Fig. 10, the 1-year survival AUC was 0.802, and the 3- and 5-year survival AUCs were 0.849 and 0.913, respectively.
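For readers unfamiliar with the C-index, the following sketch computes Harrell's concordance index on toy data. It is an illustrative implementation under stated assumptions (a higher score means higher risk, tied scores count as half), not the code used in the study; the function name and the example values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier death had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: survival times (days), event indicators, and model risk scores.
print(concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1]))
# 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading C-index values between those extremes.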

Fig. 7

Performance of the 3D-DenseNet deep imaging prognosis factor on the training set

Fig. 8

Performance of the 3D-DenseNet deep imaging prognosis factor on the validation set

Fig. 9

ROC curve of the 3D-DenseNet deep imaging prognosis factor on the training set

Fig. 10

ROC curve of the 3D-DenseNet deep imaging prognosis factor on the validation set

Survival Prediction Results Based on Multimodal Data Model

Herein, the deep imaging features extracted by 3D-DenseNet could significantly distinguish between the high- and low-risk groups of patients with gallbladder cancer with respect to their postoperative survival risk. Accordingly, the 3D-DenseNet deep imaging feature prognostic factor was used, alongside clinical features, to establish a multimodal survival prediction model for patients with gallbladder cancer, based on the Cox regression model. This model was used to predict patient survival (Fig. 11), for which a corresponding ROC curve was drawn, as shown in Fig. 12. Compared with the 3D-DenseNet imaging prognostic factor before integrating clinical information, the model’s performance in terms of 1-, 3-, and 5-year survival predictions improved, with AUCs reaching 0.827, 0.865, and 0.926, respectively. To some extent, this compensated for the lower accuracy of the 3D-DenseNet imaging prognostic factor in long-term prediction. Simultaneously, the C-index of the model increased by approximately 0.03, from 0.758 to 0.787, showing good application value and prospects.

Fig. 11

Predicted patient survival curve of the multimodal survival prediction model

Fig. 12

ROC curve of the multimodal survival prediction model

Discussion

Gallbladder cancer is characterized by insidious onset, late discovery, and poor prognosis. A scientific and reasonable survival prediction model could not only effectively indicate the course of disease but also guide treatment strategies. Many clinical factors, mainly perceived through intra- and postoperative observations, affect the postoperative survival of patients with gallbladder cancer. These include tumor TNM staging and differentiation, liver infiltration, concurrent jaundice, and the scope of lymph node dissection. Clinicians mostly make judgments based on experience, resulting in major differences in estimates of the survival time of a single case. This study aimed to improve survival time estimation for patients with gallbladder cancer in two ways. First, we obtained as much clinical data as possible from patients, using clinical laboratory tests, imaging information, and systemic treatment data, to establish a survival prediction model. Second, we attempted to establish an objective and standardized computerized prediction model with which the survival time of patients with gallbladder cancer could be predicted.

In this study, the 3D-DenseNet CNN was used to extract deep features from patients’ CT images, and Cox single-factor analysis and lasso regression screening were performed to construct an independent prognostic factor based on deep features for patients with gallbladder cancer. This prognostic factor performed relatively consistently on the training and validation sets, producing good survival prediction results for relevant patients. The features extracted from each 3D-DenseNet layer are connected directly to the subsequent layers, and the output features of each layer are combined by directly concatenating the feature maps along the channel dimension. Therefore, the final outputs of the model retained the simpler, global features extracted from the first few layers while also containing the complex features extracted from the deeper layers. As seen from the experimental results, the performance of the prognostic factor constructed from the extracted deep imaging features was relatively consistent, with strong robustness to interference.

Comparison between the independent prognostic factor based on deep imaging features and the multimodal survival prediction model that integrated clinical information revealed that the multimodal model had a higher C-index and AUROC. This showed that the multimodal data integration strategy proposed in this study could incorporate complementary contextual information and overcome the limitations of using only imaging models.

In comparison, the C-index of the tumor TNM staging system that is currently widely used to predict patient survival is only 0.569. Wang et al. [6] constructed a model using adjuvant radiotherapy as the main prognostic factor, of which the C-index was 0.71, while the same metric for the multimodal survival prediction model established in this study reached 0.787. On one hand, TNM staging mainly depends on the depth of tumor invasion in organs, the presence of vascular invasion, the extent of lymph node metastases, and distant metastatic conditions; hence, it ignores many other imaging features and tumor biological behavior and thereby fails to make predictions with higher accuracy. On the other hand, radiomics adopts a high-throughput approach to extract several tumor-describing imaging features, which are analyzed using omics methods, making the data more comprehensive and the prediction more accurate. C-indexes similar to that of the present study, such as those reported by Zhang et al. [5] and Bai et al. [18], were mainly achieved by constructing models from features obtained through the pathological analysis of patients with gallbladder cancer, such as the number of lymph nodes involved and the tumor resection margin, without consideration of imaging data. Therefore, the multimodal survival prediction model for gallbladder cancer established in this study has considerable potential for application in clinical practice. The deep learning-based, multimodal data integration strategy is also significant for guiding and inspiring future research on prediction models for gallbladder cancer survival.

The results included in the model revealed four categories of main factors: one is a collection of parameters based on radiomics, whereas the other three factors are CEA, CA-199, and TNM_0.

CEA is an acid glycoprotein with a complex structure that mainly exists in adult cancer tissues and fetal gastrointestinal tract tissues, and is a broad-spectrum tumor marker. It is clinically useful for the auxiliary diagnosis of various types of common tumors. CA-199 is currently a commonly used tumor marker for the detection of pancreatic cancer in the clinic. It is often used in the auxiliary diagnosis of malignant tumors such as those of the pancreas or biliary tract, but its specificity is not strong enough when used alone. Gallbladder cancer is the most common biliary tract malignancy, and radical surgery is still the most effective treatment. The results showed that T stage is a prognostic factor in patients with gallbladder cancer, and the prognosis of patients with each T stage is significantly different. The depth of tumor invasion in patients with Tis (that is, T0) is limited to the mucosal layer, and the prognosis of patients after radical resection is good. From accumulated clinical experience, the levels of CEA, CA-199, and TNM_0 are known to play an important role in determining patient survival.

Historically, information provided by imaging has not been considered very relevant to determining the survival time of patients. However, this study showed that the parameters extracted from imaging accounted for a very high proportion of our model, which challenges the long-held belief that imaging has a limited role in determining patient survival. In our model, the AUCs for predicting patients’ 1-, 3-, and 5-year survival reached 0.827, 0.865, and 0.926, respectively, showing good consistency and accuracy.

However, this study also had several limitations: (1) Few research subjects were included, which was mainly influenced by the incidence of gallbladder cancer. (2) The radiomics meaning of each parameter included in the model could not be accurately explained owing to the limitations of the deep learning method itself. However, this does not prevent us from using this method to make predictions. We have developed software that automatically extracts deep image features and calculates a patient’s survival rate once the data required by the multimodal nomogram model are imported. (3) Whether other clinical laboratory examinations can be used to improve survival prediction remains to be investigated. (4) There were missing values in the clinical laboratory data and systemic treatment data we collected, so we had to use an imputation method to process these missing data. After comparing a variety of methods (such as linear, cubic, random forest, classification and regression trees, and random sample from observed values), we selected the Markov chain Monte Carlo method for imputation, but this still led to a decline in the prediction accuracy of the model. In the future, we will continue to collect larger and more comprehensive datasets without missing values. (5) This remains a cross-sectional study of patients with gallbladder cancer; that is, we collected their clinical and imaging data only once, so the data lack information along the temporal dimension on the evolution of the patients’ disease course. The predictions of the model could become more accurate if this information were incorporated in the future.

Conclusions

This study proposed a method to establish an individualized, noninvasive survival prediction model for patients with gallbladder cancer, based on multimodal data. Compared with the monomodal imaging prognostic factor, the survival prediction model established using multimodal data was more accurate in predicting the survival of patients, and the C-index increased by approximately 0.03. In the future, we will attempt to incorporate more laboratory clinical data and modal imaging data into the model, to further improve its prediction accuracy.