Introduction

Coronavirus disease 2019 (COVID-19) is a pandemic [1] that has spread worldwide and led to millions of deaths. According to the Chinese Health Commission (CHC) guidelines, COVID-19 severity is classified as mild, moderate, severe, or critical [2]. The Chinese Center for Disease Control and Prevention reported that 81% of COVID-19 cases were non-severe, and the remaining 19% were severe or critical [3]. Existing epidemiological studies suggest that the mortality rate of patients with severe COVID-19 is more than ten times higher than that of patients with non-severe COVID-19 [4]. Early identification of severe cases therefore directly influences treatment: timely recognition and management of patients with severe COVID-19 can prevent disease progression and improve survival [5].

According to recent experience, abnormal findings on lung imaging appear before clinical symptoms develop, which highlights the importance of lung imaging in screening for COVID-19 pneumonia [6]. Computed tomography (CT) is helpful for COVID-19 diagnosis and in assessing COVID-19 pneumonia progression [7, 8]. The typical findings on chest CT imagery for patients with COVID-19 are ground-glass opacities and bilateral lung consolidations with peripheral involvement [9]. However, the evaluation of these conventional textures varies among radiologists and is often subjective.

Computed tomography radiomics, an emerging non-invasive machine learning technique, can extract histogram, shape, and textural features from images. In addition, artificial intelligence can further quantify textural information using mathematical analysis; therefore, abnormal lesions on CT images can be evaluated precisely and objectively using radiomics. Recently, CT-based radiomics has been widely used for tumor diagnosis, cancer treatment, and prognosis assessment [10, 11].

In previous studies on COVID-19, machine learning CT-based radiomics has been shown to help diagnose COVID-19 pneumonia and differentiate it from pneumonia caused by other pathogens [12,13,14]. Additionally, CT-based radiomics reportedly predicts the severity and outcome of COVID-19 pulmonary opacities [15]. However, the relationship among COVID-19 pneumonia severity, pulmonary opacities, and clinical manifestations has not been well addressed, and a detailed meta-analysis using CT-based radiomics has not been performed. Therefore, this study aimed to investigate whether CT-based radiomics models can predict COVID-19 pneumonia severity.

Materials and methods

Study protocol and literature search

This study followed the diagnostic version of the PRISMA guidelines [16]. Two investigators searched PubMed, Embase, the Cochrane Central Register of Controlled Trials, and the Cochrane Database of Systematic Reviews for articles published from database inception to July 16, 2021. The keywords used were as follows: (“COVID-19” OR “severe acute respiratory coronavirus-2[SARS-CoV-2]”) AND (“radiomics” OR “textural”) AND (“computed tomography” OR “CT”).

Literature selection criteria

The inclusion criteria were as follows:

  1. Studies using shape- and texture-based radiomics to predict COVID-19 severity.

  2. Studies wherein COVID-19 severity was defined according to the CHC guidelines.

  3. Studies with full text available.

  4. Studies published in the English language.

In contrast, the exclusion criteria were as follows:

  1. Studies wherein radiomics was not used to predict the severity of COVID-19.

  2. Conference posters or papers for which only the abstract was available.

COVID-19 pneumonia severity classification

According to the CHC guidelines, COVID-19 illness is classified according to disease severity [4]. Patients with COVID-19 pneumonia included in this study were classified into those with non-severe disease (non-SVD) and those with severe disease (SVD). Patients who met any of the following criteria were included in the SVD group: (1) respiratory rate ≥ 30 breaths per minute; (2) oxygen saturation ≤ 93% by finger oximetry at rest; (3) partial pressure of oxygen in arterial blood (PaO2)/fraction of inspired oxygen (FiO2) ≤ 300 mmHg; (4) > 50% lesion progression on chest imaging over 1–2 days; (5) respiratory failure requiring assisted ventilation; (6) shock; or (7) organ failure requiring admission to the intensive care unit (ICU).
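
The "any of the following criteria" rule above can be sketched as a small Python function. This is only an illustration of the logic as listed in this section, not a clinical tool; the class name, field names, and default values are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class PatientStatus:
    """Hypothetical container for the seven criteria listed above."""
    respiratory_rate: float                 # breaths per minute
    spo2_percent: float                     # resting finger oximetry, %
    pao2_fio2_mmhg: float                   # PaO2/FiO2, mmHg
    lesion_progression_percent: float       # progression on imaging over 1-2 days, %
    respiratory_failure_with_ventilation: bool = False
    shock: bool = False
    organ_failure_requiring_icu: bool = False


def is_severe(p: PatientStatus) -> bool:
    """Return True (SVD) if any one of the seven criteria is met."""
    return any([
        p.respiratory_rate >= 30,
        p.spo2_percent <= 93,
        p.pao2_fio2_mmhg <= 300,
        p.lesion_progression_percent > 50,
        p.respiratory_failure_with_ventilation,
        p.shock,
        p.organ_failure_requiring_icu,
    ])
```

For example, a patient with a respiratory rate of 18, oxygen saturation of 98%, PaO2/FiO2 of 420 mmHg, and no lesion progression falls into the non-SVD group, whereas changing only the oxygen saturation to 93% moves the patient into the SVD group.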

Data collection

We extracted the true-positive, false-positive, false-negative, and true-negative values from the literature. When an article reported several radiomics models, the model with the highest area under the receiver operating characteristic curve (AUC) was used for extraction. Some studies used bootstrapping or cross-validation; therefore, the resulting values were not integers. For simplicity, we rounded these values for use in the calculations. Additionally, we extracted other information from the literature, including the author details, publication year, nation, and number of patients.
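
As a minimal sketch of this extraction step, the following Python snippet rounds non-integer 2 × 2 cell values and derives sensitivity and specificity from them. The rounding rule (half up) and the example values are assumptions made for illustration; the text above does not specify how the rounding was performed.

```python
import math


def round_half_up(value: float) -> int:
    """Round a non-negative value to the nearest integer, halves rounding up."""
    return math.floor(value + 0.5)


def confusion_metrics(tp: float, fp: float, fn: float, tn: float) -> dict:
    """Round the four cells of a 2 x 2 table, then compute sensitivity and specificity."""
    tp, fp, fn, tn = (round_half_up(v) for v in (tp, fp, fn, tn))
    return {
        "counts": (tp, fp, fn, tn),
        "sensitivity": tp / (tp + fn),   # true positives / all severe cases
        "specificity": tn / (tn + fp),   # true negatives / all non-severe cases
    }


# Hypothetical cross-validated averages, not taken from any included study.
result = confusion_metrics(tp=24.6, fp=5.2, fn=6.4, tn=63.8)
print(result)
```

Rounding before computing the proportions keeps the cell counts usable as integer inputs for downstream pooling, at the cost of a small distortion in studies with few patients.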

Statistical analysis

The pooled sensitivity and specificity of the included radiomics studies were determined using statistical analysis. The pooled results are presented as forest plots. The overall predictive power was calculated by creating a summary receiver operating characteristic (SROC) curve. We evaluated the heterogeneity of the included literature by visually investigating the SROC curve [17]. The analysis was conducted using the R language [18], the R packages mada [19] and meta [20], and RStudio [21].
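
To illustrate what pooling a proportion such as sensitivity involves, the sketch below applies a univariate random-effects model (DerSimonian-Laird) to logit-transformed proportions. This is a simplified stand-in written in Python, not the procedure implemented in the R packages cited above, and the function name and continuity correction are assumptions introduced for the example.

```python
import math


def pool_logit_proportions(events, totals, z=1.96):
    """Pool proportions on the logit scale with a DerSimonian-Laird random-effects model.

    events[i] / totals[i] is the proportion in study i (e.g. TP / (TP + FN)).
    Returns (pooled estimate, lower CI bound, upper CI bound, tau squared).
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        non_events = n - e
        if e == 0 or non_events == 0:
            # Continuity correction (an assumption) to keep the logit finite.
            e, non_events = e + 0.5, non_events + 0.5
        logits.append(math.log(e / non_events))
        variances.append(1 / e + 1 / non_events)

    # Fixed-effect weights and Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))

    # Between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(logits) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights, pooled logit, and its standard error.
    w_re = [1 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def expit(t):
        return 1 / (1 + math.exp(-t))

    return expit(mu), expit(mu - z * se), expit(mu + z * se), tau2
```

Because the interval is computed on the logit scale and then back-transformed, it is asymmetric around the pooled estimate on the probability scale, which is the usual shape of confidence intervals reported for pooled sensitivity and specificity.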

Bias and study quality assessment

Publication bias was evaluated using a funnel plot. The quality of the included studies was assessed using the radiomics quality score (RQS) [22] and the quality assessment of diagnostic accuracy studies (QUADAS-2) tool [23]. The RQS assessment investigated 16 components, which resulted in a score ranging from −8 to 36, defined as 0% and 100%, respectively. The QUADAS-2 tool, which assesses seven components, was used to evaluate the risk of bias and applicability concerns. Two authors independently scored each study using the RQS and QUADAS-2 tools. If a discrepancy was observed, the two authors discussed the final score to reach consensus.

Results

We retrieved a total of 682 articles. After removing duplicates, 118 articles were selected for evaluation. After screening for eligibility based on titles and abstracts, 12 articles were retrieved for complete evaluation. Four studies were excluded from the analysis: one observational study [24] that used an overlapping patient population, one observational study [15] that used pulmonary opacities on chest images to predict disease severity, and two observational studies [25, 26] that used other severity assessment protocols to predict disease outcome. Finally, eight articles were used for qualitative analysis [27,28,29,30,31,32,33,34]. Only seven reports were included in the meta-analysis because the study by Li et al. [34] included only patients with severe COVID-19. A flowchart of the literature review is shown in Fig. 1. The details of the selected studies are presented in Table 1.

Fig. 1 A flowchart illustrating the inclusion process used to identify studies

Table 1 Characteristics of the selected studies

Pooled analysis of the included studies

Seven studies comprising 1460 patients with COVID-19 were included in this meta-analysis. The pooled sensitivity was 0.800 (95% confidence interval [CI] = 0.662–0.891; forest plot in Fig. 2), and the pooled specificity was 0.874 (95% CI = 0.773–0.934; forest plot in Fig. 3). The pooled AUC was 0.908, and the SROC curve is shown in Fig. 4. Visual inspection of the SROC curve indicated heterogeneity among the included studies.

Fig. 2 The forest plot for sensitivity

Fig. 3 The forest plot for specificity

Fig. 4 The SROC curve. SROC, summary receiver operating characteristic curve; conf. region, 95% confidence region for the SROC curve.

Radiomics quality score of the included studies

The radiomics quality scores of the included studies are presented in Table 2 and ranged from 7 to 16. After a detailed evaluation of each RQS component by two authors, all included studies were found to report their imaging protocols, feature reduction, discrimination statistics, a comparison of the results with the gold standard, and potential clinical utility.

Table 2 Radiomics quality scores of the selected literature

Quality assessment of the selected literature

The QUADAS-2 tool was used to evaluate the literature. All studies were rated as low risk on at least five of the seven assessment points. The results are presented in Fig. 5.

Fig. 5 Quality assessment of diagnostic accuracy studies

Publication bias assessment of the included studies

The funnel plot is shown in Fig. 6. As fewer than 10 studies were included, we could not determine whether publication bias exists.

Fig. 6 Funnel plot

Review of the radiomics and clinical features used in the included studies

As stated by the IEEE International Symposium on Biomedical Imaging, there are many types of texture features, including first-order texture features, shape-based texture features, gray-level distance-zone matrix texture features, gray-level size-zone matrix texture features, neighborhood gray-tone difference matrix texture features, neighboring gray-level dependence matrix texture features, gray-level run-length matrix texture features, and gray-level co-occurrence matrix texture features [35]. The types of textural features used in the included studies are listed in Table 3. Four studies used shape-based radiomics features, six studies used first-order radiomics features, and five studies used second-order radiomics features.

Table 3 The type of radiomics and non-radiomics features used in the selected studies

Review of the prediction algorithms used in the included studies

Three selected studies used the least absolute shrinkage and selection operator (LASSO). One of the included studies used the XGBclassifier. Two of the studies used the random forest method. The other two studies used logistic regression, and the details of the prediction algorithms are listed in Table 4.

Table 4 The prediction algorithms used in the selected studies

Discussion

Our meta-analysis revealed that CT-based radiomics could be used to predict the severity of COVID-19 pneumonia. Other CT-based radiomics studies that applied different severity assessment protocols also predicted the severity of COVID-19 pneumonia [25, 26]. The management of COVID-19 pneumonia depends on disease severity [38, 39]. Therefore, early prediction of severe COVID-19 pneumonia before clinical deterioration using CT-based radiomics may aid in providing early management for these patients and reduce mortality [5, 40].

Our study included 1460 patients. The pooled sensitivity and specificity were 0.800 (95% CI = 0.662–0.891) and 0.874 (95% CI = 0.773–0.934), respectively. The pooled AUC was quite high at 0.908, indicating that radiomics is a promising tool for predicting the severity of COVID-19 pneumonia. The heterogeneity within the included studies may be attributed to the properties of radiomics features. As a previous study implied, radiomics features could be influenced by the calculation kernel, tumor delineation variability, technical settings of the CT scan, and software used to produce radiomics features [41]. This meta-analysis pooled results from various studies with different settings, thus providing robust results.

The RQS assessment resulted in a score ranging from −8 to 36, defined as 0% and 100%, respectively. The RQS values of the included literature ranged from 7 to 16; thus, the highest RQS in the selected studies was only 40%. A previous meta-analysis also found a maximum RQS of 16 for CT-based texture features used to differentiate between COVID-19 and viral pneumonia [14]. Consistent with that study, these low RQS values suggest that conducting a high-quality radiomics study remains challenging in current research settings.

In contrast, the QUADAS-2 tool showed a favorable quality assessment of the selected studies. The risk of bias was primarily low in the selected studies, except for the patient selection bias. The patient selection bias was unclear or high because the selected studies were retrospective, and the patients were not randomly enrolled. Applicability concerns were rated as low because the patient populations and index test interpretations of the selected studies were suitable for our review.

The types of radiomics features used in the selected studies should be discussed. While six studies assessed first-order features, five studies assessed second-order features, either alone or in combination with other features. Second-order features have been widely used in radiomics models for cancer patients, as they measure the heterogeneity within the region of interest. Hence, future studies investigating the molecular mechanisms associated with second-order radiomics features are warranted to deepen the understanding of COVID-19.

The algorithms used varied considerably among the selected studies. The most frequently used algorithm was the LASSO, a regression-based method that adds an L1 regularization term, which shrinks the coefficients of uninformative features toward zero and thereby reduces the effect of noise on prediction. Another study used the XGBclassifier, a tree-based prediction algorithm that starts with a weak classifier and subsequently boosts to a stronger classifier [42]. Two of the included studies used the random forest method, another tree-based classifier, which combines many decision trees and reaches the final prediction result by voting [43]. The other two studies used traditional logistic regression models.
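
The feature-selecting effect of the LASSO regularization term can be illustrated with the soft-thresholding operator, the shrinkage step that arises when an L1 penalty is optimized coordinate by coordinate. The Python sketch below is a generic illustration and not code from any included study; the feature names, coefficient values, and penalty are hypothetical.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Shrink a coefficient toward zero by `penalty`; set it to exactly zero if it is small."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


# Hypothetical unpenalized coefficients for three radiomics features.
raw_coefficients = {
    "firstorder_mean": 0.9,
    "shape_sphericity": -0.1,
    "glcm_contrast": -0.7,
}
penalty = 0.3  # hypothetical regularization strength

shrunk = {name: soft_threshold(beta, penalty) for name, beta in raw_coefficients.items()}
selected = [name for name, beta in shrunk.items() if beta != 0.0]
print(shrunk)
print(selected)
```

Coefficients whose magnitude is below the penalty are removed entirely, which is why LASSO is commonly used to reduce a large set of radiomics features to a small predictive subset.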

This meta-analysis had some limitations. First, the articles selected for this meta-analysis were retrospective. Second, all included studies were conducted in China, which can be attributed to our use of the CHC guidelines for COVID-19 pneumonia severity classification. Third, as this meta-analysis focused on predicting COVID-19 pneumonia severity using CT-based radiomics models, the patients’ clinical data and disease course spectrum were not analyzed further. Although CT-based radiomics models were helpful for predicting COVID-19 pneumonia severity, whether severity prediction is equivalent to prognosis and mortality prediction was not investigated in this meta-analysis. Therefore, future prospective and multicenter research should be performed to verify the effectiveness of radiomics in predicting COVID-19 pneumonia severity.

Conclusions

Our meta-analysis demonstrated that CT-based radiomics feature models might be powerful tools for predicting the severity of COVID-19 pneumonia.