Introduction

The ongoing coronavirus disease 2019 (COVID-19) pandemic, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global threat [1,2,3]. Early and quick diagnosis of COVID-19 is crucial for isolating patients, preventing infection spread, and early patient intervention [4].

Currently, the reference standard for diagnosing COVID-19 is the reverse transcription polymerase chain reaction (RT-PCR) test using respiratory tract specimens [5]. However, the sensitivity of RT-PCR varies with symptom duration [6], viral load [7, 8], and test sample type [9]; moreover, its use is limited by delays in test access or processing times [10] and insufficient testing capacity [11].

Computed tomography (CT) is a widely available, time-saving, and non-invasive approach for detecting COVID-19 [12]. It could be an efficient tool for diagnosing COVID-19 with high sensitivity and monitoring disease course [10, 13,14,15]. Accordingly, it has been recommended for identifying patients with COVID-19 [16,17,18]. Nevertheless, the use of CT for diagnosing COVID-19 remains controversial given the lack of accurate, consistent evaluation across radiologists and medical centers [19]. Previous studies on chest CT for detecting COVID-19 have been heterogeneous, with sensitivity ranging from 0.19 to 1.00 [20] and specificity ranging from 0.25 to 0.70 [21]. This was mainly attributed to the lack of a specific reporting system in various centers with different radiologists [20,21,22,23].

The COVID-19 Reporting and Data System (CO-RADS) was established by the Dutch Radiological Society to standardize the assessment scheme and simplify reporting using a 5-point scale of suspicion (from 1 [very low suspicion] to 5 [very high suspicion]) for COVID-19 pneumonia on chest CT images [24]. CO-RADS has excellent diagnostic performance for COVID-19 through interpretation of chest CT images, with moderate to substantial interobserver agreement [24,25,26]. A meta-analysis on the diagnostic performance of CO-RADS in detecting COVID-19 [27] only included six studies up to August 24, 2020, with a small sample size. This systematic review and meta-analysis aimed to provide an updated overview of the diagnostic performance of CO-RADS in COVID-19, with a focus on the possible practical application of CO-RADS.

Materials and methods

This meta-analysis was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [28], with a standardized review and data extraction protocol. The primary study outcome was the diagnostic performance of CO-RADS for COVID-19. Additionally, we discussed the clinical application of CO-RADS.

Search strategy and selection criteria

A computerized search of PubMed, EMBASE, MEDLINE, Web of Science, Cochrane Library, and Scopus was performed up to September 21, 2021, without restrictions regarding regions or publication types. We manually reviewed the reference lists of all retrieved studies, review articles, and conference abstracts to identify additional potential studies. We used the following search terms: ([COVID-19 reporting-and-data system] OR [COVID-RADS] OR [CO-RADS]) AND ([COVID-19] OR [SARS-CoV-2] OR [coronavirus disease 2019] OR [severe acute respiratory syndrome coronavirus 2]).

Inclusion and exclusion criteria

The inclusion criteria were (1) studies using CO-RADS in COVID-19 diagnosis, (2) studies using RT-PCR as the reference standard, (3) original scientific articles, and (4) sufficient data for reconstructing the 2 × 2 tables for diagnostic accuracy. The exclusion criteria were (1) duplication of publications; (2) non-original studies, including editorials, letters to the editor, review articles, and case reports; (3) non-English articles; and (4) studies with insufficient information for reconstructing a 2 × 2 table.

Data extraction

We extracted data from the included studies using a standardized form: (1) true positives, false positives, true negatives, false negatives; (2) study characteristics, including first author, publication year, country, study design, cutoff values, study duration, recruitment, number of centers, number of readers, setting, and sample size; and (3) patient characteristics including age, sex ratio, and symptoms. When different readers in a single study reported more than one result, we extracted the consensus result or the average result if there was no consensus result.
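As an illustration of how a 2 × 2 table follows from a chosen cutoff, the sketch below dichotomizes per-patient CO-RADS scores against RT-PCR results. The patient-level data, the `TwoByTwo` class, and the `build_table` helper are hypothetical and are not taken from any included study; in this review the counts were extracted from the published reports rather than recomputed from patient-level data.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of a diagnostic 2 x 2 table (index test vs. reference standard)."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def build_table(scores, pcr_positive, cutoff):
    """Treat CO-RADS score >= cutoff as test-positive and tally the four cells."""
    tp = fp = fn = tn = 0
    for score, positive in zip(scores, pcr_positive):
        test_positive = score >= cutoff
        if test_positive and positive:
            tp += 1
        elif test_positive and not positive:
            fp += 1
        elif not test_positive and positive:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical example data: ten patients, CO-RADS score and RT-PCR result.
scores = [5, 4, 3, 2, 1, 5, 3, 1, 4, 2]
pcr_positive = [True, True, True, False, False, True, False, False, False, True]

table = build_table(scores, pcr_positive, cutoff=3)
print(table, table.sensitivity, table.specificity)
```

Raising the cutoff in this sketch moves patients from the test-positive to the test-negative row, which is why sensitivity falls and specificity rises across CO-RADS ≥ 2 to ≥ 5.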

Quality assessment

The methodological quality of the studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [29]. Each study was rated with respect to the following domains: patient selection, index test, reference standard, and flow and timing. Each domain was assessed for risk of bias, and the first three domains were additionally assessed for concerns regarding applicability.

Statistical analysis

We created 2 × 2 tables to calculate the pooled sensitivity, specificity, and corresponding 95% confidence interval (CI) using a bivariate random-effects model; further, we calculated the diagnostic odds ratio (DOR). The results are shown graphically in forest plots. We constructed hierarchical summary receiver operating characteristic (HSROC) models; further, we calculated the area under the curve (AUC). We performed Deeks’ funnel plot asymmetry test to evaluate the possible presence of publication bias, with p < 0.10 indicating a possibility of publication bias [30]. The heterogeneity of the included studies was evaluated using the inconsistency index (I²) and Cochran’s Q statistic (chi-square test) [31].
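The heterogeneity statistics named above can be sketched as follows. This is a simplified univariate illustration on logit-transformed proportions with inverse-variance weights; it is not the bivariate random-effects model fitted in STATA for this review, and the function names and the 0.5 continuity correction are assumptions made for the example.

```python
import math


def logit_and_variance(events, total):
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction is added to both cells (an assumption of
    this sketch) so that studies with zero cells remain usable.
    """
    a = events + 0.5
    b = (total - events) + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def cochran_q_i2(estimates, variances):
    """Cochran's Q, its degrees of freedom, and the I-squared index (percent)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical per-study counts: (true positives, RT-PCR-positive patients).
studies = [(90, 100), (45, 60), (150, 160), (70, 100)]
logits, variances = zip(*(logit_and_variance(tp, n) for tp, n in studies))
q, df, i2 = cochran_q_i2(logits, variances)
print(f"Q = {q:.2f} on {df} df, I2 = {i2:.1f}%")
```

I² expresses the share of total variation attributable to between-study differences rather than chance, so values near 0% indicate little heterogeneity and larger values indicate more.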

Heterogeneity was further explored through meta-regression by adding the following covariates to the bivariate model: (1) study design (prospective vs. retrospective), (2) study enrollment (consecutive vs. not available, NA), (3) number of centers (< 2 vs. ≥ 2), (4) number of readers (< 3 vs. ≥ 3), (5) study setting (emergency department, ED vs. NA), and (6) publication year (2020 vs. 2021). Finally, we performed subgroup analyses according to age, sex, and symptoms.

Statistical analyses were performed using STATA 16.1 and RevMan 5.3. Statistical significance was set at p < 0.05.

Results

Selection and data extraction

The systematic search initially yielded 224 results; among them, 104 were removed as duplicates. After manual screening of titles and abstracts, 90 results were excluded, including unrelated articles (n = 59), case reports (n = 9), letters/editorials/abstracts (n = 6), and review articles (n = 16). After full-text review of the remaining 30 articles, we excluded one non-English article [32] and five articles that lacked information on the specificity of CO-RADS and therefore did not provide sufficient data for constructing 2 × 2 tables [33,34,35,36,37]. Finally, we included 24 studies with 8382 patients [12, 24, 25, 38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58] (Fig. 1).
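The selection counts reported above can be checked with a few lines of arithmetic. The variable names are illustrative only; all numbers are taken from the text.

```python
# Counts reported in the study selection paragraph.
identified = 224
duplicates = 104
screening_exclusions = {
    "not related": 59,
    "case reports": 9,
    "letters/editorials/abstracts": 6,
    "review articles": 16,
}
full_text_exclusions = {
    "non-English": 1,
    "insufficient data for a 2 x 2 table": 5,
}

after_dedup = identified - duplicates
excluded_at_screening = sum(screening_exclusions.values())
full_text_reviewed = after_dedup - excluded_at_screening
included = full_text_reviewed - sum(full_text_exclusions.values())

print(after_dedup, excluded_at_screening, full_text_reviewed, included)
```

The tallies are internally consistent: 120 records remain after deduplication, 90 are excluded at screening, 30 undergo full-text review, and 24 are included.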

Fig. 1 PRISMA flow chart of the article selection process

Characteristics of included studies

The 24 included studies comprised 8382 patients. More than half of the studies were published in 2021. The studies were conducted in various countries, including Italy, France, the Netherlands, and China. There were 3 prospective and 17 retrospective studies, with the remaining 4 studies not providing information on study design. Fourteen studies consecutively enrolled participants, while the remaining 10 studies did not describe participant enrollment. Among the 24 studies, over half were single-center studies, six were multi-center studies, and three did not provide relevant information. Regarding the number of readers, 12 studies included ≥ 3 radiologists and 4 studies included < 3 radiologists, while 8 studies lacked this information. Ten studies were conducted in the ED. Five studies included both symptomatic and asymptomatic patients, while 19 studies only included symptomatic patients. The sample size varied widely, from 40 to 1997, with all studies recruiting both male and female participants. Five studies did not report the age of all included patients; among the remaining studies, the age of patients ranged from 8 to 99 years. Moreover, all included studies used RT-PCR as the reference standard for COVID-19 (Table 1).

Table 1 Characteristics of the 24 included studies

Quality assessment

Figure 2 shows the quality assessment of the 24 included studies. In the patient selection domain, ten studies showed an unclear risk of bias since there was no information suggestive of consecutive or random sampling. Regarding the index test domain, only one study had an unclear risk of bias since it did not provide information regarding blinding to the reference standard. In the reference standard domain, two studies showed a high risk of bias since the reference standard was not interpreted blinded to the index test results. Additionally, four studies had an unclear risk of bias since they did not mention whether the reference standard was performed blinded to the index test results. Regarding the flow and timing domain, four studies had a high risk of bias since some patients were not included in the analysis, while seven studies had an unclear risk of bias since they did not provide clear information regarding the time interval between the reference standard test and index test. Regarding concerns about applicability, unclear ratings were assigned in two, one, and two studies in the patient selection, index test, and reference standard domains, respectively.

Fig. 2 Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) criteria for the 24 included studies

Diagnostic accuracy

The pooled sensitivity and specificity for 21 studies using CO-RADS ≥ 2 were 0.93 (95% CI 0.88–0.96) and 0.41 (95% CI 0.30–0.53), respectively. In 20 studies using CO-RADS ≥ 3, the pooled sensitivity and specificity were 0.89 (95% CI 0.85–0.93) and 0.68 (95% CI 0.60–0.75), respectively. In 23 studies using CO-RADS ≥ 4, the pooled sensitivity and specificity were 0.83 (95% CI 0.79–0.87) and 0.84 (95% CI 0.78–0.88), respectively. In 21 studies using CO-RADS ≥ 5, the pooled sensitivity and specificity were 0.66 (95% CI 0.61–0.72) and 0.93 (95% CI 0.88–0.96), respectively (Fig. 3). Moreover, the DOR of CO-RADS using cutoff 2/3/4/5 was 9 (95% CI 5–14), 18 (95% CI 12–26), 26 (95% CI 16–40), and 25 (95% CI 15–42), respectively (Table 2).
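The relationship between the pooled sensitivity and specificity and the likelihood ratios and DOR can be sketched as below. This is a minimal illustration using the pooled point estimates reported above; the function name is hypothetical, and because the reported DORs come from the fitted bivariate model, values computed this way may differ slightly from those in Table 2.

```python
def likelihood_ratios_and_dor(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from a sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg
    return lr_pos, lr_neg, dor


# Pooled point estimates reported in the text, keyed by CO-RADS cutoff.
pooled = {
    2: (0.93, 0.41),
    3: (0.89, 0.68),
    4: (0.83, 0.84),
    5: (0.66, 0.93),
}

for cutoff, (sens, spec) in pooled.items():
    lr_pos, lr_neg, dor = likelihood_ratios_and_dor(sens, spec)
    print(f"CO-RADS >= {cutoff}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

The DOR combines both likelihood ratios into a single number, which is why it peaks at the cutoffs where sensitivity and specificity are jointly high rather than where either one alone is highest.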

Fig. 3 Coupled forest plots of pooled sensitivity and specificity

Table 2 Sensitivity, specificity, LR+, LR-, and DORs of CO-RADS for the detection of COVID-19

The AUCs of CO-RADS ≥ 2/3/4/5 were 0.79 (95% CI 0.76–0.83), 0.87 (95% CI 0.84–0.90), 0.90 (95% CI 0.87–0.92), and 0.83 (95% CI 0.79–0.86), respectively. The large difference between the 95% confidence and prediction regions presented in the HSROC curve indicated substantial among-study heterogeneity (Fig. 4).

Fig. 4 Hierarchical summary receiver operating characteristic (HSROC) curve of the diagnostic performance of CO-RADS for detecting COVID-19

There was considerable among-study heterogeneity according to Cochran’s Q test (p < 0.01) and Higgins’ I² heterogeneity index (Fig. 3). Deeks’ funnel plots (Fig. 5) revealed no major publication bias in CO-RADS ≥ 2/3/4/5 (p > 0.10).

Fig. 5 Deeks’ funnel plot used to evaluate potential publication bias

Exploration of heterogeneity

Meta-regression

We performed meta-regression to explore the heterogeneity causes among 20 and 23 studies using CO-RADS ≥ 3 and CO-RADS ≥ 4, respectively (Table 3). Study heterogeneity was independently associated with publication year, study design, study enrollment, number of centers, number of readers, and setting.

Table 3 Results of the meta-regression analysis of CO-RADS for the detection of COVID-19

Among the 20 studies using CO-RADS ≥ 3, studies published in 2020 had a higher sensitivity than those published in 2021 (0.91, 95% CI 0.85–0.97 vs. 0.89, 95% CI 0.84–0.93; p < 0.01). Prospective studies had a higher sensitivity than retrospective studies (0.92, 95% CI 0.84–0.99 vs. 0.89, 95% CI 0.84–0.93; p = 0.03). Studies with unclear information regarding patient enrollment had a significantly higher sensitivity (0.93, 95% CI 0.88–0.97) than those with consecutive enrollment (0.87, 95% CI 0.82–0.92; p < 0.01). Multi-center studies had higher sensitivity than single-center studies (0.91, 95% CI 0.85–0.97 vs. 0.89, 95% CI 0.84–0.94; p < 0.01). Studies conducted in the ED had a higher sensitivity (0.93, 95% CI 0.90–0.97) and lower specificity (0.62, 95% CI 0.50–0.73) than those without setting information (0.85, 95% CI 0.79–0.91; p = 0.02 and 0.72, 95% CI 0.63–0.81; p = 0.02, respectively).

In 23 studies using CO-RADS ≥ 4, the sensitivity differed among the various potential covariates. Studies published in 2021 had a higher sensitivity than those published in 2020 (0.84, 95% CI 0.80–0.88 vs. 0.82, 95% CI 0.75–0.88; p < 0.01). Studies that consecutively enrolled participants had lower sensitivity than those without related information (0.81, 95% CI 0.77–0.86 vs. 0.87, 95% CI 0.82–0.92; p < 0.01). Prospective studies had a higher sensitivity than retrospective studies (0.86, 95% CI 0.78–0.94 vs. 0.83, 95% CI 0.79–0.87; p < 0.01). Multi-center studies had a higher sensitivity than single-center studies (0.85, 95% CI 0.80–0.91 vs. 0.82, 95% CI 0.78–0.87; p < 0.01). Studies using < 3 readers had higher sensitivity than those using ≥ 3 readers (0.84, 95% CI 0.75–0.93 vs. 0.81, 95% CI 0.76–0.87; p = 0.03). Studies conducted in the ED had a higher sensitivity (0.87, 95% CI 0.84–0.91) than those without setting information (0.79, 95% CI 0.74–0.84; p < 0.01). Prospective studies had a higher sensitivity than retrospective studies (0.89, 95% CI 0.80–0.99 vs. 0.82, 95% CI 0.76–0.88; p = 0.02). Studies in the ED had lower specificity than those with unclear information on setting (0.82, 95% CI 0.74–0.90 vs. 0.85, 95% CI 0.78–0.91; p < 0.01).

Subgroup analysis

Table 4 summarizes the results of the subgroup analyses by age, sex, and symptoms.

Table 4 Results of subgroup analyses of CO-RADS for the detection of COVID-19

Among studies using CO-RADS ≥ 3, those including both symptomatic and asymptomatic patients had a higher specificity than those only including symptomatic patients (0.83, 95% CI 0.72–0.93 vs. 0.64, 95% CI 0.56–0.72; p < 0.01). Among studies using CO-RADS ≥ 4, those with participants aged < 60 years had higher sensitivity (0.88, 95% CI 0.82–0.94 vs. 0.80, 95% CI 0.75–0.85; p = 0.02) and lower specificity (0.77, 95% CI 0.62–0.92 vs. 0.87, 95% CI 0.82–0.92; p = 0.01) than those with participants aged > 60 years. Studies on both symptomatic and asymptomatic patients and those on only symptomatic patients had the same pooled sensitivity (0.83, 95% CI 0.74–0.92 vs. 0.83, 95% CI 0.79–0.87); however, the p value was < 0.01. Studies including both symptomatic and asymptomatic patients had higher specificity than those only including symptomatic patients (0.92, 95% CI 0.86–0.98 vs. 0.81, 95% CI 0.75–0.87; p < 0.01).

Discussion

Inconsistent reports regarding sensitivity and specificity impede our ability to determine the diagnostic accuracy of CO-RADS for detecting COVID-19. Few systematic reviews or meta-analyses have assessed the diagnostic utility of CO-RADS for COVID-19 [27]. To our knowledge, this is the first systematic review and meta-analysis to both directly evaluate the accuracy of CO-RADS for COVID-19 and discuss its practical application. Our findings demonstrated that CO-RADS performs well in diagnosing COVID-19; in particular, CO-RADS ≥ 3 and CO-RADS ≥ 4 yielded the highest AUCs (0.87 and 0.90, respectively).

The AUCs indicated that CO-RADS ≥ 3 and CO-RADS ≥ 4 displayed favorable performance in detecting COVID-19. CO-RADS ≥ 3 might be preferred when priority is placed on the sensitivity of the examination. If COVID-19 outbreaks recur in some countries, potentially infected patients need to be screened rapidly to prevent disease spread. In contrast, CO-RADS ≥ 4 might be more suitable when yielding a definite diagnosis is more important. In regions with a low COVID-19 threat, it is necessary to determine whether patients have COVID-19 or another disease to avoid unnecessary isolation and treatment.
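The trade-off between the two cutoffs can be illustrated by converting the pooled sensitivity and specificity into predictive values via Bayes' rule. In the sketch below, the two prevalence values (5% and 40%) are hypothetical scenarios chosen only for illustration; they are not estimates from this review, and the function name is an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given disease prevalence using Bayes' rule."""
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Pooled estimates reported in the Results for the two cutoffs of interest.
cutoffs = {3: (0.89, 0.68), 4: (0.83, 0.84)}

# Hypothetical prevalence scenarios (assumptions for illustration only).
scenarios = {"low prevalence": 0.05, "outbreak": 0.40}

for label, prevalence in scenarios.items():
    for cutoff, (sens, spec) in cutoffs.items():
        ppv, npv = predictive_values(sens, spec, prevalence)
        print(f"{label}, CO-RADS >= {cutoff}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumed prevalences, the higher cutoff gives the higher positive predictive value and the lower cutoff gives the higher negative predictive value, which is in line with preferring CO-RADS ≥ 4 for confirming disease and CO-RADS ≥ 3 for screening.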

Considerable among-study heterogeneity was observed. When the cutoff value was 3, there were significant differences in sensitivity according to publication year, study design, enrollment, number of centers, and setting; moreover, there were significant differences in specificity according to the setting. This suggests that the diagnostic performance of CO-RADS ≥ 3 may have been overestimated in studies published in 2020, studies that did not report enrollment or setting, and multi-center studies, which is a limitation of this meta-analysis. Nonetheless, CO-RADS ≥ 3 might facilitate quick detection of COVID-19 without geographical limitations if COVID-19 outbreaks recur. Additionally, the performance of CO-RADS ≥ 3 was relatively consistent regardless of the number of readers, which supports the consistency and stability of CO-RADS ≥ 3. However, multi-center studies had significantly higher sensitivity than single-center studies (0.91 vs. 0.89, p < 0.01). This finding warrants further exploration of the consistency of CO-RADS ≥ 3 among different centers.

Meta-regression analysis of CO-RADS ≥ 4 revealed an association of study heterogeneity with publication year, study design, enrollment, number of centers, number of readers, and study setting. This indicates that the diagnostic performance of CO-RADS ≥ 4 may have been overestimated to some degree in studies published in 2021 without setting information, which is a limitation of this study. Specifically, we focused on the consistency among various centers with different readers. Multi-center studies had significantly higher sensitivity than single-center studies (0.85 vs. 0.82, p < 0.01); moreover, studies with ≥ 3 readers showed significantly lower sensitivity than studies with < 3 readers (0.81 vs. 0.84; p = 0.03). These findings warrant further exploration of the consistency of CO-RADS ≥ 4 among different centers and of whether the cutoff value of CO-RADS ≥ 4 should be recommended when there are < 3 readers. All the included studies recruited patients between January and June 2020, when the COVID-19 epidemic was changing rapidly in different countries [59, 60], which may have caused detection difficulties and ultimately led to errors or heterogeneity. Moreover, not all studies were well-designed. Therefore, there is a need for better-designed studies on CO-RADS.

Subgroup analysis according to age revealed lower sensitivity in the older group with CO-RADS ≥ 4. Age affects the diagnostic performance for COVID-19 since older adults often have chronic lung diseases, including chronic obstructive pulmonary disease and interstitial pulmonary fibrosis [61]. Further, COVID-19 presentation in older patients is often atypical, with less specific symptoms [62, 63], which may impede diagnosis through CO-RADS. Subgroup analysis according to sex revealed that most studies included more male participants, which is consistent with the sex-based susceptibility to COVID-19 [64,65,66]. Studies that included fewer male participants had higher sensitivity and specificity than those with more male participants, which may reflect potential factors, including smoking, that affect CT findings [66, 67]; however, these differences were not statistically significant, and more well-designed studies are needed to confirm this finding. Regarding the symptom subgroup, there was a significant difference in the sensitivity of CO-RADS ≥ 4; however, we could not determine which group had the higher sensitivity. Further related studies are warranted. Moreover, studies enrolling both symptomatic and asymptomatic patients showed higher specificity for CO-RADS ≥ 3/4 than those only enrolling symptomatic patients, which is consistent with a previous report of higher specificity in asymptomatic patients (0.89) than in symptomatic patients (0.73) [25]. This could be attributed to patients with chronic lung diseases, who have respiratory symptoms similar to those of COVID-19, being included in the symptomatic group, which decreases the specificity [25, 68]. Moreover, this result could be attributed to subjective factors among observers and the local COVID-19 prevalence [24].

This study has several limitations. First, there was considerable heterogeneity among the included studies. Although we performed both meta-regression and subgroup analyses, these may not have fully explained it; therefore, heterogeneity remains a concern. Second, some potential sources of heterogeneity may remain unknown. Third, although we included 24 studies on CO-RADS, fewer than 24 studies were available in each category since some studies only provided data on one category. Finally, most of the included studies were retrospective, which led to a high risk of bias in the patient selection domain.

Our findings demonstrated that CO-RADS has favorable performance in detecting COVID-19. CO-RADS ≥ 3 and CO-RADS ≥ 4 might be suitable in different situations given their high sensitivity and specificity, respectively. However, future better-designed studies are warranted to further explore the consistency of CO-RADS among different centers and readers.