Dear Editor,

We read with great interest the article by Girshausen et al. [1]. The authors validated and compared the accuracy of nine scores in predicting the prognosis of severely injured trauma patients. They concluded that RISC II was the best predictor of mortality, with an AUROC (area under the receiver operating characteristic curve) of 0.92, while the APACHE II, SOFA, and Marshall scores remain helpful tools, with AUROCs ranging from 0.69 to 0.81. In addition, the ISS, NISS, RTS, EAC, and PTGS scores provided poorer mortality prediction, with AUROCs ranging from 0.57 to 0.66. The authors should be commended for their choice of topic and for the amount of work involved. After reading the article carefully, we have some suggestions.

First, in view of the very high heterogeneity of polytrauma, a larger sample is needed when externally validating and comparing different prediction models [2]. Had the authors reported an estimate of the required sample size, their conclusions would have been more persuasive.

Second, selected comparisons of AUROCs could have been tested for statistical significance using DeLong’s test [3].
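
To illustrate the kind of comparison we have in mind, the following is a minimal sketch of DeLong's test for two correlated AUROCs measured on the same patients. It is our own illustration, not the authors' analysis: the function names (`delong_test`, `_placements`) and the input layout (one binary outcome vector and two score vectors) are assumptions made for this example, and established implementations should be preferred in practice.

```python
import math
import numpy as np


def _placements(pos, neg):
    """Per-patient placement values used by DeLong's variance estimate.

    pos: scores of patients with the event (e.g. non-survivors)
    neg: scores of patients without the event (e.g. survivors)
    """
    diff = pos[:, None] - neg[None, :]
    # psi = 1 if the positive scores higher, 0.5 on ties, 0 otherwise
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    v10 = psi.mean(axis=1)  # one value per positive patient
    v01 = psi.mean(axis=0)  # one value per negative patient
    return v10, v01


def delong_test(y_true, score_a, score_b):
    """Two-sided DeLong test for the difference of two correlated AUROCs.

    Assumes higher scores indicate higher risk and that both scores were
    measured on the same patients. Needs at least two patients per class.
    Returns (auc_a, auc_b, z, p_value).
    """
    y = np.asarray(y_true).astype(bool)
    a = np.asarray(score_a, dtype=float)
    b = np.asarray(score_b, dtype=float)

    v10_a, v01_a = _placements(a[y], a[~y])
    v10_b, v01_b = _placements(b[y], b[~y])
    auc_a, auc_b = v10_a.mean(), v10_b.mean()

    m, n = int(y.sum()), int((~y).sum())
    # 2x2 covariance matrices of the placement values (rows = scores)
    s10 = np.cov(np.vstack([v10_a, v10_b]))
    s01 = np.cov(np.vstack([v01_a, v01_b]))
    cov = s10 / m + s01 / n

    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    if var_diff <= 0:  # e.g. identical scores: no evidence of a difference
        return auc_a, auc_b, 0.0, 1.0

    z = (auc_a - auc_b) / math.sqrt(var_diff)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p_value
```

Calling `delong_test(died, risc2, apache2)` with hypothetical arrays of outcomes and scores would return both AUROCs, the z statistic, and the two-sided p-value. Scores for which lower values indicate higher risk would need to be negated first.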

Third, discrimination was the only aspect of score performance assessed in this study, and it should be supplemented with calibration and decision curve analysis (DCA). At present, it is recommended that all three (discrimination, calibration, and DCA) be used together to evaluate prediction models comprehensively [4, 5].
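
As a minimal sketch of what these two additional assessments involve, the code below computes a binned calibration table, the Brier score, and the net benefit used in DCA. It assumes that each score has first been converted to a predicted probability of death; the function names and the equal-width binning are our own illustrative choices, not taken from the article under discussion.

```python
import numpy as np


def calibration_table(y_true, prob, n_bins=10):
    """Mean predicted risk vs. observed event rate in equal-width bins.

    Returns a list of (mean_predicted, observed_rate, count) tuples,
    skipping empty bins. Equal-width binning is an illustrative choice.
    """
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(prob, dtype=float)
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    idx = np.digitize(p, inner_edges)  # bin index in 0 .. n_bins - 1
    rows = []
    for k in range(n_bins):
        mask = idx == k
        if mask.any():
            rows.append((p[mask].mean(), y[mask].mean(), int(mask.sum())))
    return rows


def brier_score(y_true, prob):
    """Mean squared difference between predicted risk and outcome."""
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(prob, dtype=float)
    return float(np.mean((p - y) ** 2))


def net_benefit(y_true, prob, threshold):
    """Net benefit of flagging patients with predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold), with 0 < threshold < 1.
    """
    y = np.asarray(y_true).astype(bool)
    p = np.asarray(prob, dtype=float)
    flagged = p >= threshold
    tp = np.sum(flagged & y)
    fp = np.sum(flagged & ~y)
    return float(tp / y.size - fp / y.size * threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy for DCA: flag every patient as high risk."""
    prevalence = float(np.mean(np.asarray(y_true, dtype=float)))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a range of clinically reasonable thresholds and comparing it with `net_benefit_treat_all` and with the "flag no one" strategy, whose net benefit is zero.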