Introduction

Early in the first wave of the COVID-19 pandemic, an increased risk of pulmonary embolism (PE) in COVID-19 patients was recognized [1, 2]. However, expert societies recommended unenhanced CT, rather than CT pulmonary angiogram (CTPA), as the first-line imaging modality in patients presenting with dyspnea and/or desaturation [3]. Most series reporting an increased incidence of PE were based on patients with severe disease admitted to intensive care units (ICU) [4]. A meta-analysis found a pooled incidence of PE of 24.6% in ICU patients compared to 10.5% in non-ICU patients [4]. In series based on outpatients presenting to the emergency department (ED), the observed incidence was much lower than that of ICU patients. In a series where CTPA was systematically performed in all patients presenting to the ED with suspected SARS-CoV-2 infection, the prevalence of PE was lower in COVID-19 patients than in non-COVID-19 patients (5.7 vs. 8.5%) [5]. Similarly, in a large multicenter study, Freund et al did not report an increased risk of PE in COVID-19 patients presenting to the ED [6].

However, the reported increased risk of PE, together with the systematic measurement of D-dimers because of their important prognostic value, has led to an increase in CTPA referrals in COVID-19 patients, as shown by data from the French National Hospital Discharge database [7].

Conventional strategies to exclude PE without performing CTPA rely on a pre-test assessment of the clinical probability using a validated clinical decision rule, combined with D-dimer testing [8]. The most widely used and recommended clinical decision rules are the Wells [9] and the revised Geneva [10] scores. For D-dimer, several strategies are available: a fixed threshold of 500 µg/L, an age-adjusted threshold, or a pretest probability-dependent threshold. However, most COVID-19 patients present with increased D-dimer levels, due to the severity of inflammation [11]. Furthermore, clinical risk factors for PE such as a history of cancer or previous venous thromboembolism do not show the same predictive value in COVID-19 patients, making the usual prediction rules inadequate for this population [12, 13]. Therefore, a strategy adapted to the COVID-19 context is lacking. The aim of this study was to use a large multicenter dataset to define a specific strategy to exclude PE in COVID-19 outpatients, without performing CTPA.

Methods

Study population

This study was conducted on behalf of the French Society of Thoracic Imaging. It received institutional review board approval with a waiver for patient consent (Ethical Review Committee for publications of the Cochin University Hospital (CLEP) (CLEP Decision N°: AAA-2020-08046)).

All adult outpatients who underwent a CTPA between February 1, 2020, and October 31, 2020, within 48 h of presentation at the ED of 15 university hospitals (14 in France and 1 in Belgium) were retrospectively evaluated. They were included if they had a positive RT-PCR test within 48 h of the CTPA examination and a conclusive CTPA report. Missing data, such as a missing D-dimer level, were not an exclusion criterion.

Analyzed parameters

Patients’ charts were reviewed for demographic characteristics (age, sex), body temperature, heart rate, and pulse oximetry in the emergency department; body mass index; and other clinical characteristics to calculate the Wells score [9] and the revised Geneva score [10]. A history of previous deep vein thrombosis or PE, recent surgery or fracture of a lower limb within 1 month, the presence of active malignancy, unilateral lower limb pain, painful lower limb palpation with unilateral edema, and hemoptysis were systematically assessed. Due to the retrospective design of the study, it was not possible to determine whether an alternative diagnosis was less likely than PE, and this criterion could not be evaluated when calculating the Wells score.

D-dimer levels and laboratory data known to be associated with COVID-19 severity were collected, specifically C-reactive protein (CRP), lactate dehydrogenase (LDH), and lymphocyte and neutrophil counts. Clinical information such as symptom duration, fever during the few days prior to admission, body temperature, pulse oximetry, and oxygen flow rates was also collected.

Radiology reports were reviewed for the presence of PE as well as the extent of COVID-19 pneumonia. The extent of COVID-19 pneumonia was assessed using a 5-point scale (0%, < 25%, 25–50%, 50–75%, > 75%) [14, 15], using the structured report proposal from the French Society of Thoracic Imaging and the French Society of Radiology [16]. For all hospitalized patients presenting to the emergency department without a PE, hospital charts were reviewed for the occurrence of a secondary PE.

Patients whose CTPA was inconclusive for PE were excluded from the analysis when evaluating the performance of the different rule-out strategies.

Statistical analysis

The performance of the different rule-out strategies was expressed in terms of the area under the receiver operating characteristics (ROC) curve (AUC), sensitivity, specificity, negative predictive value, failure rate, and efficiency. The failure rate was defined as the proportion of patients with PE for whom the diagnostic strategy would have excluded PE (thereby equating to the false negative rate). Efficiency was defined as the proportion of participants in whom the diagnostic strategy would have excluded PE among all study participants.
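
The definitions above can be written out as a short calculation. The following Python sketch is illustrative only: the class and function names are hypothetical, and the counts are made up rather than taken from the study data. It assumes a simple cross-tabulation of the CTPA result against the decision of a rule-out strategy.

```python
from dataclasses import dataclass


@dataclass
class RuleOutCounts:
    """Cross-tabulation of CTPA result against a rule-out strategy (illustrative)."""
    pe_excluded: int         # PE on CTPA, strategy would have excluded PE (missed PE)
    pe_not_excluded: int     # PE on CTPA, strategy would have led to CTPA
    no_pe_excluded: int      # no PE on CTPA, strategy would have excluded PE
    no_pe_not_excluded: int  # no PE on CTPA, strategy would have led to CTPA


def rule_out_metrics(c: RuleOutCounts) -> dict:
    """Compute the performance measures as defined in the text."""
    with_pe = c.pe_excluded + c.pe_not_excluded
    without_pe = c.no_pe_excluded + c.no_pe_not_excluded
    excluded = c.pe_excluded + c.no_pe_excluded
    return {
        "sensitivity": c.pe_not_excluded / with_pe,
        "specificity": c.no_pe_excluded / without_pe,
        "npv": c.no_pe_excluded / excluded,
        # Proportion of patients with PE in whom the strategy excluded PE.
        "failure_rate": c.pe_excluded / with_pe,
        # Proportion of all participants in whom the strategy excluded PE.
        "efficiency": excluded / (with_pe + without_pe),
    }


# Made-up counts for illustration only (not study data).
example = RuleOutCounts(pe_excluded=1, pe_not_excluded=99,
                        no_pe_excluded=149, no_pe_not_excluded=751)
metrics = rule_out_metrics(example)
```

With these definitions, the failure rate is the complement of the sensitivity, so a strategy tuned to a sensitivity of 99% has a failure rate of 1%.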

The different rule-out strategies were as follows: (i) the Wells rule and revised Geneva score combined with D-dimer test, with either a fixed threshold of 500 µg/L, an age-adjusted threshold (age × 10 µg/L in patients aged > 50 years), or a clinical pretest probability (CPTP)-adjusted threshold (threshold of 1000 µg/L for a low CPTP or 500 µg/L for a moderate CPTP), as used in the YEARS study [17]; (ii) D-dimer only with thresholds at 500 µg/L, or age × 10 µg/L in patients aged > 50 years, and 1000 µg/L; and (iii) a prediction model for PE specifically developed from the study data.
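
As a minimal sketch of the three D-dimer thresholds described above, the following Python functions encode the fixed, age-adjusted, and CPTP-adjusted cut-offs. The function names and the string labels for the CPTP categories are hypothetical. Returning no threshold for a high CPTP is an assumption of this sketch, since the text specifies thresholds only for low and moderate CPTP.

```python
from typing import Optional

FIXED_THRESHOLD_UG_L = 500.0


def age_adjusted_threshold(age_years: float) -> float:
    """Age x 10 µg/L in patients aged > 50 years, otherwise 500 µg/L."""
    return age_years * 10.0 if age_years > 50 else FIXED_THRESHOLD_UG_L


def cptp_adjusted_threshold(cptp: str) -> Optional[float]:
    """1000 µg/L for a low CPTP, 500 µg/L for a moderate CPTP.

    Assumption: a high CPTP returns None, i.e. no D-dimer rule-out.
    """
    thresholds = {"low": 1000.0, "moderate": 500.0}
    return thresholds.get(cptp)


def d_dimer_rules_out_pe(d_dimer_ug_l: float, threshold: Optional[float]) -> bool:
    """PE is ruled out when the D-dimer level is below the threshold."""
    return threshold is not None and d_dimer_ug_l < threshold


# A 72-year-old with a D-dimer of 650 µg/L (made-up values for illustration).
ruled_out_fixed = d_dimer_rules_out_pe(650.0, FIXED_THRESHOLD_UG_L)
ruled_out_age = d_dimer_rules_out_pe(650.0, age_adjusted_threshold(72))
```

In this made-up case the fixed threshold does not rule out PE, whereas the age-adjusted threshold of 720 µg/L does, which illustrates why the age-adjusted rule tends to be more efficient in older patients.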

Several strategies were used to build a prediction model for PE, all using the same set of predefined variables, specifically age, history of deep vein thrombosis (DVT) or PE, unilateral lower-limb pain, lymphocyte and neutrophil counts, symptom duration, body temperature, and D-dimer. Models were primarily developed using Firth’s penalized logistic regression to limit the risk of sparse data bias given the limited number of events and imbalanced covariates. A backward selection procedure, with a p value cut-off mimicking the use of the Akaike information criterion (AIC), was used for model selection. For sensitivity analysis, three other methods were used to improve the model performance in discriminating between participants with and without PE: (i) weighting to address class imbalance (patients with PE were weighted by the inverse of the probability of PE, and those without PE by the inverse of one minus the probability of PE); (ii) the use of gradient boosting, a machine learning algorithm that iteratively combines several “weak” learners into a single strong learner and usually performs well in settings such as this study, instead of penalized logistic regression; and (iii) combining gradient boosting and weighting. Internal validation of the model was carried out using bootstrapping [18, 19]. The differences between the performance in the bootstrap sample and in the original sample were taken as a measure of the over-optimism of the selected model.
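
The weighting scheme in (i) can be illustrated with a few lines of Python. This is a sketch under a stated assumption: the "probability of PE" is taken here to be the observed proportion of patients with PE in the sample, which the text does not spell out. The function name is hypothetical.

```python
def inverse_probability_weights(outcomes):
    """Weight patients with PE by 1/p and patients without PE by 1/(1 - p).

    Assumption: p is the observed proportion of PE in `outcomes`
    (1 = PE, 0 = no PE).
    """
    p = sum(outcomes) / len(outcomes)
    if p in (0.0, 1.0):
        raise ValueError("both outcome classes must be present")
    return [1.0 / p if y == 1 else 1.0 / (1.0 - p) for y in outcomes]


# Made-up sample: 1 patient with PE among 10 (not study data).
outcomes = [1] + [0] * 9
weights = inverse_probability_weights(outcomes)
total_weight_pe = sum(w for w, y in zip(weights, outcomes) if y == 1)
total_weight_no_pe = sum(w for w, y in zip(weights, outcomes) if y == 0)
```

Under this assumption, the two classes carry the same total weight, so the minority class (patients with PE) is no longer dominated by the majority class during model fitting.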

Missing baseline data were handled through multiple imputation, using the outcome in the imputation model [20, 21]. All variables considered in the scores and model development were used in the imputation model. Accordingly, 50 independent imputed data sets were generated and analyzed separately. Estimates were then pooled over the 50 imputations according to Rubin’s rules to obtain point estimates and confidence intervals (CI) for each parameter [22], except for proportions, for which we used a specific approach to compute multiple-imputation Wilson confidence intervals [23]. The predictions and the model performance were estimated within the imputed datasets and then pooled (impute-last method), as recommended [24].
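
For readers unfamiliar with Rubin’s rules, the pooling step can be sketched as follows. This Python example is an illustration with made-up numbers, not the code used in the study (which was run in R). The function name is hypothetical, and the sketch covers only the pooled point estimate and total variance, not the degrees of freedom for the confidence interval or the Wilson approach used for proportions.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances with Rubin's rules.

    Returns the pooled estimate and the total variance
    T = W + (1 + 1/m) * B, where W is the mean within-imputation variance
    and B the between-imputation variance of the estimates.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance (denominator m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, total


# Made-up estimates from 3 imputed datasets (the study used 50).
pooled, total_var = pool_rubin([0.10, 0.12, 0.14], [0.0004, 0.0004, 0.0004])
```

The between-imputation term inflates the variance to reflect the uncertainty due to the missing data, which is why pooled confidence intervals are wider than those from any single imputed dataset.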

Analyses were performed with the R version 4.0.5 statistical software (The R Foundation for Statistical Computing). Data were presented as median with interquartile range (IQR) for non-normally distributed continuous variables and mean ± standard deviation for normally distributed continuous variables.

Results

A total of 1369 patients were included. The study population consisted of 569 women (n = 569/1369; 41.6%) and the mean age was 63.3 ± 16.4 years. CTPA was inconclusive in 19 patients (n = 19/1369; 1.4%), and PE was found in 124 patients (n = 124/1369; 9.1%) (Fig. 1, flow chart).

Fig. 1 Flow chart

Parameters associated with the occurrence of PE

Patients with PE were slightly older (mean age 66.8 ± 16.4 years vs. 62.9 ± 16.3 years; p = 0.014) and had higher median D-dimer levels than patients without PE (3850 µg/L [IQR: 2000 to 4000] vs. 1000 µg/L [IQR: 700 to 1600]; p < 0.0001). Similarly, several variables of the Wells and revised Geneva scores were significantly associated with the presence of PE, such as hemoptysis (4.1% vs. 1.2%; p = 0.030), unilateral lower limb pain (13.0% vs. 1.3%; p < 0.0001), and pain on deep venous palpation (10.2% vs. 1.4%; p < 0.0001).

Among the COVID pneumonia-related variables, median lymphocyte and neutrophil counts were higher in COVID-19 patients with PE (1.1 G/L [IQR: 0.7 to 1.7] vs. 0.9 G/L [IQR: 0.7 to 1.3]; p = 0.003 and 6.7 G/L [IQR: 5.0 to 9.4] vs. 4.6 G/L [IQR: 3.3 to 6.8]; p < 0.001, respectively). Patients with PE had symptoms for a longer period of time (8 days [IQR: 5 to 15] vs. 7 [IQR: 5 to 10]; p = 0.0006), their body temperature was lower (37.3 ± 0.9 °C vs. 37.7 ± 1.1 °C; p < 0.0001) and they were less febrile during the few days prior to admission (32.5% vs. 44.4%; p = 0.012).

Oxygen saturation, CRP, LDH, and the extent of COVID-19 pneumonia were not significantly different between the groups (p > 0.05; Table 1).

Table 1 Participants characteristics

Performance of conventional strategies to rule out PE

In our population, the AUCs of the revised Geneva and Wells scores were 0.550 (95% confidence interval (95%CI): 0.538 to 0.563) and 0.551 (95%CI, 0.538 to 0.563), respectively, when using a fixed D-dimer threshold of 500 µg/L (Table 2). Both strategies had the same high sensitivity of 99.1% (95%CI, 95.2 to 99.8%) and the same low failure rate of 0.9% (95%CI, 0.2 to 4.8%). However, they had a low specificity of 11% (95%CI, 9.2 to 13.0%) for the revised Geneva score and 11% (95%CI, 9.3 to 13.0%) for the Wells score. Their efficiency was 10.0% (95%CI, 8.5 to 11.9%) and 10.1% (95%CI, 8.5 to 11.9%), respectively. This means that their use would have prevented unnecessary CTPAs in 134 and 135 of the 1369 patients for the revised Geneva and the Wells score, respectively; however, it would have resulted in one undiagnosed PE in each case.

Table 2 Performance of diagnostic strategies

The use of an age-adjusted D-dimer threshold increased the performance of both strategies (Table 2). The AUCs of the revised Geneva and Wells scores increased to 0.587 (95%CI, 0.572 to 0.603) and 0.588 (95%CI, 0.572 to 0.603), respectively. With the age-adjusted D-dimer threshold, the sensitivity (99.0% (95%CI, 94.7 to 99.8%) for both scores) and the failure rate (1.0% (95%CI, 0.2 to 5.3%) for both scores) remained similar, whereas the efficiency increased to 16.8% (95%CI, 14.7 to 19.1%) for the revised Geneva score and 16.9% (95%CI, 14.8 to 19.2%) for the Wells score. Using this D-dimer threshold would have prevented unnecessary CTPAs for 226 and 227 patients for the revised Geneva and Wells scores, respectively.

We also evaluated the CPTP-adjusted D-dimer threshold with the revised Geneva and the Wells score. This increased their efficiency to 36.0% (95%CI, 33.4 to 38.7%) and 41.8% (95%CI, 39.1 to 44.6%), for the revised Geneva and Wells scores, respectively. However, this combination was associated with a marked increase in failure rate: 7.1% (95%CI, 3.7 to 13.3%) and 8.1% (95%CI, 4.4 to 14.5%) for the revised Geneva and Wells scores, respectively. This CPTP-adjusted D-dimer threshold would have avoided 477 and 555 unnecessary CTPAs; however, it would have missed 9 and 10 PEs, using the revised Geneva or Wells score, respectively.

Despite the significant association between the risk of PE and several items of the revised Geneva and Wells scores, the performance of the conventional diagnostic strategies was on par with that of D-dimer alone. Indeed, the AUC, sensitivity, failure rate, and efficiency of a D-dimer level < 500 µg/L to rule out PE were 0.551 (95%CI, 0.538 to 0.563), 99.1% (95%CI, 95.2 to 99.8%), 0.9% (95%CI, 0.2 to 4.8%), and 10.1% (95%CI, 8.5 to 11.9%), respectively. Similarly, the AUC, sensitivity, failure rate, and efficiency of an age-adjusted D-dimer threshold were 0.585 (95%CI, 0.570 to 0.600), 99.0% (95%CI, 94.7 to 99.8%), 1.0% (95%CI, 0.2 to 5.3%), and 16.4% (95%CI, 14.4 to 18.7%), respectively. Thus, using D-dimers alone to rule out PE would have led to 135 CTPAs not being performed, resulting in one undiagnosed PE, for a fixed threshold of 500 µg/L, and to 220 CTPAs not being performed, resulting in one undiagnosed PE, for an age-adjusted threshold. Increasing the D-dimer threshold to 1000 µg/L increased the AUC to 0.687 (95%CI, 0.658 to 0.716) but decreased the sensitivity to 91.9% (95%CI, 85.5 to 95.6%) and increased the failure rate to 8.1% (95%CI, 4.4 to 14.5%), meaning that 10 PEs would have been missed whereas 557 CTPAs would have been avoided.

Performance of a COVID-19-specific strategy to rule out PE

Two machine learning methods, each with and without weighting, were used to develop a COVID-19 pneumonia-specific rule-out strategy (Supplementary Table 1 and Supplementary Fig. 1). AUCs were in the same range as those of the revised Geneva and the Wells scores, ranging from 0.513 (95%CI: 0.503 to 0.522) to 0.609 (95%CI: 0.594 to 0.623) (Table 2, Fig. 2). To obtain a strategy as safe as that of the revised Geneva and the Wells score, we set the models to have a failure rate close to 1%, meaning a sensitivity of 99%. Thus, all models had a failure rate of 0.8% and an efficiency ranging from 3.1 to 20.5%. These models would have prevented 41 to 276 CTPAs, resulting in one undiagnosed PE. Evaluating the influence of variables on the models showed that, once again, the D-dimer had a prominent role (Supplementary Fig. 2).
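
Fixing the failure rate of a prediction model amounts to choosing a cut-off on its predicted probability. The following Python sketch shows one way this could be done; it is an assumption for illustration, not the procedure reported in the supplementary material. The function name and the data are hypothetical, and PE is considered ruled out when the predicted probability is strictly below the cut-off.

```python
def cutoff_for_failure_rate(probabilities, outcomes, max_failure_rate=0.01):
    """Return the highest observed cut-off whose failure rate stays within the target.

    PE is ruled out when the predicted probability is strictly below the
    cut-off; the failure rate is the share of patients with PE ruled out.
    """
    pe_probs = sorted(p for p, y in zip(probabilities, outcomes) if y == 1)
    if not pe_probs:
        raise ValueError("at least one patient with PE is required")
    allowed_misses = int(max_failure_rate * len(pe_probs))  # rounded down
    # At most `allowed_misses` patients with PE lie strictly below this value.
    return pe_probs[allowed_misses]


# Made-up predicted probabilities and CTPA results (1 = PE), not study data.
probs = [0.02, 0.05, 0.10, 0.20, 0.30, 0.40]
pe = [0, 0, 1, 0, 1, 1]
cutoff = cutoff_for_failure_rate(probs, pe, max_failure_rate=0.01)
ruled_out = [p < cutoff for p in probs]
efficiency = sum(ruled_out) / len(probs)
missed_pe = sum(1 for r, y in zip(ruled_out, pe) if r and y == 1)
```

Once the cut-off is fixed in this way, the safety of all strategies is comparable and they differ only in efficiency, which is how the models are compared in the text.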

Fig. 2 Receiver operating characteristics curves. The dark blue curve, the points, and numbers indicate the performance at selected thresholds in μg/L for D-dimer. For data-driven models, the ROC curves and AUCs are corrected for optimism

Discussion

In this study, we showed that conventional strategies to rule out PE can also be applied in the setting of COVID-19 and that a dedicated strategy does not perform better.

Although COVID-19 is associated with an increase in D-dimers, we observed that the D-dimer level remains the most important predictor of PE. The AUCs of strategies based on D-dimers alone were only marginally improved by the addition of other variables. This demonstrates that, although the specificity of a high D-dimer value may be compromised by SARS-CoV-2 infection, D-dimers should remain a major biomarker in the rule-out strategy for PE, even if this results in more CTPAs being performed. Several authors have suggested tailoring the D-dimer threshold to the specific context of COVID-19. Levels of 1000, 2000, 3000, or even higher than 6000 µg/L have been suggested to better identify patients with PE [25,26,27,28]. However, we found that a D-dimer threshold of 1000 µg/L led to an unacceptable failure rate of 8.1%, much higher than the failure rate of 2% reported by Revel et al [7]. Therefore, increasing the D-dimer threshold in the specific setting of COVID-19 pneumonia should be avoided. The use of a CPTP-adjusted D-dimer threshold in combination with the revised Geneva and the Wells score also led to an unacceptable increase in the failure rate.

In our study, the age-adjusted D-dimer threshold proved to be the best choice. The failure rates of rule-out strategies based on D-dimers alone or D-dimers combined with pre-test clinical probability scores were close to those reported by Stals et al in their meta-analysis on the safety and efficiency of diagnostic strategies for ruling out PE in the general population, with 0.36 to 1.1% depending on the use of fixed or age-adjusted D-dimer thresholds [8]. These failure rates are lower than the maximum acceptable failure rate according to the International Society on Thrombosis and Haemostasis (ISTH) recommendation [29]. Revel et al recently reported similar failure rates of 1.3% for a fixed D-dimer threshold of 500 µg/L and 2.2% for an age-adjusted threshold in a cohort of 781 COVID-19 patients from the COVID database of Parisian public hospitals [7].

While the failure rates were comparable to those observed in the general population, the efficiency was significantly lower than the 26 to 37% reported by Stals et al for the same strategies applied to the general population [8]. This explains the increase in the number of CTPAs performed in COVID-19 patients.

Among the clinical variables routinely evaluated to estimate the risk of PE in the general population, some were associated with a significantly higher risk of PE, such as hemoptysis, unilateral lower limb pain or painful palpation, a history of deep vein thrombosis or PE, and surgery or fracture of a lower limb within the preceding month. Most pulmonary thrombi are related to deep vein thrombosis embolization. In the context of COVID-19, the smaller size and more frequent peripheral location of pulmonary emboli suggest that some pulmonary emboli might be related to in situ thrombosis [30]. However, the association between the clinical manifestations of deep vein thrombosis and the risk of PE in our study illustrates the involvement of a thromboembolic mechanism in COVID-19. This is consistent with the increased frequency of deep vein thrombosis in COVID-19 patients. In a meta-analysis, Jimenez et al reported a pooled incidence of 12.1% for deep vein thrombosis versus 7.8% for PE in hospitalized COVID-19 patients [31].

Several variables commonly assessed in COVID-19 pneumonia correlated with an increased risk of PE. Lymphocyte and neutrophil counts were higher in patients with PE. This is in agreement with Galland et al, who found that a white blood cell count > 12 G/L was associated with an increased risk of PE in COVID-19 patients in a multivariate analysis [27]. Thoreau et al also showed that a neutrophil count > 7 G/L was a biomarker of PE risk [32]. Like Fang et al, we did not find a difference in the extent of COVID-19 pneumonia between patients with and without PE [33].

Our best-performing model combining D-dimer and other markers had an AUC of 0.532 and its use would have avoided 276 CTPAs with the same safety as that of the revised Geneva and Wells score. However, a limitation of our proposed COVID-19-specific strategy is that it was developed and tested on the same dataset, and validation on an external dataset is missing. Despite the use of statistical methods to address over-optimism, our proposed COVID-19-specific model might have been favored over conventional strategies because it was developed and tested on our dataset.

Lastly, our study only included patients with a suspicion of PE which led to a CTPA being performed. Our results do not apply to COVID-19 patients with an isolated increase in the D-Dimer and who are not clinically suspected of having PE. An elevated D-Dimer level itself should not lead to a CTPA being performed.

Our study has some limitations. Firstly, this is a retrospective study and the indication for CTPA was at the discretion of the referring physicians in each center. It is possible that some patients who did not undergo a CTPA had PE, and therefore, the true incidence of PE in this COVID-19 population cannot be calculated. Also, an unknown proportion of COVID-19 patients suspected of having a PE may not have undergone a CTPA due to a negative D-dimer. Since these true negatives were not taken into account, the performance of the different rule-out strategies may have been underestimated. Despite this potential selection bias, the D-dimer remained the main criterion to exclude PE and was selected in the COVID-19-specific model. In addition, only parameters routinely assessed in the setting of patients with COVID-19 pneumonia could be analyzed, and there were missing data relating to the likelihood of an alternative diagnosis to PE, which could not be assessed retrospectively and was therefore considered negative in all patients.

In conclusion, our study shows that the strategy to safely exclude PE in COVID-19 patients at the emergency department should not differ from that used in non-COVID-19 patients and could be based on D-dimers alone, using an age-adjusted D-dimer threshold. COVID-19-specific strategies to exclude PE, such as the one developed here, are more complex and result in only a small decrease in the number of CTPAs performed.