Rectal adenocarcinoma is a malignant tumor that is common in developed countries, including those in North America and Europe.[3, 4] Tumor metastasis is reportedly present in more than 50% of newly diagnosed patients, which is attributed to the atypical clinical symptoms of early-stage rectal adenocarcinoma.[7] Effective methods for the early detection and treatment of rectal adenocarcinoma would therefore be of great significance for improving the prognosis of affected patients. Various risk factors affecting the prognosis of these patients have been reported in recent years, including age, sex, histological type, tumor stage, and tumor differentiation status.[33, 34]
With the aim of improving the accuracy of survival-time predictions for patients with rectal adenocarcinoma, various methods have been used to establish prediction models, including the AJCC TNM staging system, logistic regression analysis, and the Cox proportional-hazards model.[15–18] Each of these prediction models has certain advantages and disadvantages, and different models produce different predictions of patient survival. The Cox proportional-hazards model is currently one of the most widely used models for prognostic predictions,[26] but it assumes that each predictor variable contributes linearly to the log hazard, and it therefore ignores the impact of any significant nonlinear factors on the outcome. It is well known that the development and progression of tumors are affected by many factors, and so traditional linear models are unlikely to accurately predict the prognosis of cancer patients. This situation makes it necessary to develop new methods that can combine linear and nonlinear factors in the construction of prediction models.
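For reference, the linearity assumption discussed above can be written out explicitly. The following is the standard textbook form of the Cox proportional-hazards model; the symbols (covariate vector x with p components, coefficient vector β, and baseline hazard h_0) are notation introduced here for illustration and do not appear in the original analysis.

```latex
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)
            = h_0(t)\,\exp\!\left(\sum_{j=1}^{p} \beta_j x_j\right),
\qquad
\log \frac{h(t \mid x)}{h_0(t)} = \sum_{j=1}^{p} \beta_j x_j
```

Because the log hazard ratio is a weighted sum of the predictors, threshold effects or interactions between predictors are not captured unless they are added to the model by hand.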
The ongoing developments in computer and information technology can facilitate the construction of the required novel predictive models. For example, Katzman et al. implemented the DeepSurv analysis method, which applies a deep multilayer neural network to survival analysis.[31] The DeepSurv network comprises three types of layer: input, hidden, and output.[29] The input layer receives each linear or nonlinear predictor variable, the hidden layers form a multilayer structure for variable conversion, and the output layer yields the converted target variable. The DeepSurv method uses deep-learning technology to convert multiple linear and nonlinear factors into a linear combination via multilevel fusion and transformation in order to predict outcome events. The DeepSurv approach is gradually being applied in various fields of biomedical research. Multiple studies have shown that the predictions made using the DeepSurv model are better than those made using traditional linear prediction models.[35–37] She et al. used a DeepSurv model to provide non-small-cell lung-cancer-specific survival and prognosis predictions as well as treatment recommendations, and found that its predictive performance was significantly better than that of the traditional AJCC TNM staging system.[38] Biglarian et al. demonstrated that the DeepSurv model is superior to the Cox proportional-hazards model in predicting distant metastasis in patients with rectal cancer.[39] Rau et al. found that a DeepSurv model for predictions associated with liver cancer was superior to a logistic regression model.[40]
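The layer structure described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: the function names, layer sizes, ReLU activation, and random weights are all assumptions made for illustration. The sketch passes covariates through hidden layers to a single output node and scores that output with the negative log Cox partial likelihood, which is one common way of training such a network.

```python
import numpy as np


def mlp_risk_score(x, weights, biases):
    """Map covariates to one scalar risk score per patient (hypothetical helper)."""
    h = np.asarray(x, dtype=float)
    # Hidden layers: affine transform followed by a ReLU nonlinearity.
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ w + b)
    # Output layer: a single linear node, analogous to the Cox linear predictor.
    return (h @ weights[-1] + biases[-1]).ravel()


def neg_log_partial_likelihood(risk, time, event):
    """Average negative log Cox partial likelihood (tied times not treated specially)."""
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # Sort by descending follow-up time so a running sum covers each risk set.
    order = np.argsort(-time)
    risk, event = risk[order], event[order]
    # log(sum of exp(risk)) over patients still at risk, computed stably.
    log_risk_set = np.logaddexp.accumulate(risk)
    # Only patients with an observed event contribute a term.
    n_events = max(event.sum(), 1.0)
    return -np.sum((risk - log_risk_set) * event) / n_events


# Toy usage with random, untrained weights (illustrative values only).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # 5 patients, 3 covariates
weights = [rng.normal(size=(3, 4)), rng.normal(size=(4, 1))]
biases = [np.zeros(4), np.zeros(1)]
scores = mlp_risk_score(x, weights, biases)
loss = neg_log_partial_likelihood(scores, time=[5, 3, 8, 1, 6], event=[1, 0, 1, 1, 0])
```

In practice the weights would be fitted by gradient descent on this loss using an automatic-differentiation library; the sketch only shows how a nonlinear network output can take the place of the linear predictor of a Cox model.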
This study constructed a DeepSurv model of the survival of patients with rectal adenocarcinoma using data on affected patients living in the United States obtained from the SEER database. We first conducted a Cox proportional-hazards regression analysis of 34,492 patients with rectal adenocarcinoma in the training cohort to identify risk factors for their prognosis. These risk factors were age, race, sex, marital status, tumor grade, AJCC TNM stage, surgery status, chemotherapy status, tumor size, and degree of tumor invasion (p < 0.05) (Table 1). We then developed a seven-layer neural-network DeepSurv prediction model based on the analytical method established by Katzman et al.[31] The C-index when applying the new prediction model was 0.821 for the test cohort and 0.824 for the training cohort. The similarity of these values indicates that the performance of the DeepSurv model in the test cohort is highly consistent with that in the training cohort. The calibration curves obtained for the patients in the test cohort at 3, 5, and 10 years further support this conclusion. The DeepSurv model was also found to provide more accurate predictions of the prognosis of patients with rectal adenocarcinoma than the Cox proportional-hazards model, which is consistent with the results of some previous studies of cancer prognoses. It has also been shown previously that the DeepSurv model provides powerful variable-processing capabilities.[35, 41] Finally, we compared the DeepSurv prediction model with the AJCC TNM staging system, and found that the AUC was higher for the former (AUC = 0.800) than for the latter (AUC = 0.755). Meanwhile, the survival curve was smoother for the DeepSurv model than for the AJCC TNM staging system.
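To make the C-index values reported above easier to interpret, the following minimal sketch computes Harrell's concordance index on toy data. It is an illustrative assumption and not the code used in this study: the function name and the example values are hypothetical, pairs with tied survival times are skipped, and tied risk scores count as half concordant.

```python
def concordance_index(time, event, risk):
    """Harrell's C-index: fraction of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is anchored on a patient with an observed event
        for j in range(n):
            # Comparable pair: patient i had an event before patient j's last follow-up.
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # shorter survival received the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (illustrative values only): survival time, event flag, predicted risk.
toy_time = [1, 2, 3]
toy_event = [1, 1, 0]
perfect = concordance_index(toy_time, toy_event, [3.0, 2.0, 1.0])   # 1.0
reversed_ = concordance_index(toy_time, toy_event, [1.0, 2.0, 3.0])  # 0.0
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so C-index values slightly above 0.8 indicate that the model orders most comparable patient pairs correctly.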
The superior results for the survival prognosis of patients with rectal adenocarcinoma obtained by applying the DeepSurv model can be attributed to its use of a multilevel neural network to transform linear and nonlinear predictive variables into a linear combination.[31] Deep learning can be used to solve nonlinear problems involving multiple factors, and so the DeepSurv model has particular advantages over other models when dealing with large samples, multiple variables, and nonlinearity.
The present study was subject to some limitations. First, some information that might affect survival was missing for the patients with rectal adenocarcinoma collected from the SEER database, such as whether tumors were surgically removed, the type of chemotherapy applied, medications, the psychological status, religious beliefs, and education of the patients, and their familial tumor history. Second, our study only included data for patients with rectal adenocarcinoma living in certain parts of the United States, and the established DeepSurv prediction model was not validated using external data. The accuracy of the DeepSurv approach could be further assessed using patients with rectal adenocarcinoma living in other countries. Third, the DeepSurv model has its own inherent limitations in the construction process. The hidden layers make it a black-box model, which means that we cannot fully understand the calculations performed during model construction, or the associated limitations. Future studies are needed to resolve the above-mentioned problems.