To the Editor

The main clinical manifestation of COVID-19 is viral pneumonia, with severity ranging from mild symptoms to severe acute respiratory distress syndrome (ARDS) requiring mechanical ventilation and intensive care. Computed tomography scans of patients with COVID-19 ARDS typically show diffuse bilateral interstitial pulmonary infiltrates, with asymmetric, patchy lesions distributed mainly in the lung periphery. Given the pandemic and the resulting strain on resources, lung ultrasound (LUS) has emerged as an alternative tool for diagnosing and monitoring patients with COVID-19. Typical LUS signs are heterogeneous clusters of B-lines, an irregular or fragmented pleural line, pleural effusion, lung consolidation, and partial absence of lung sliding [1]. It has been suggested that, based on the LUS pattern, patients can be accurately divided into four groups, from pattern A, which suggests a low probability of COVID-19, to pattern D, which indicates a high probability [2]. However, LUS may have certain shortcomings, such as its considerable dependence on the observer's knowledge and skills and on the performance and settings of the ultrasound transducer. This operator dependency is especially pronounced in real-time point-of-care (POC) LUS examination in the ICU; therefore, the comparability of LUS findings between observers may be questionable in certain settings [3, 4].

In this pilot study, we analyzed the concordance between observers in evaluating particular POC LUS findings in critically ill patients with COVID-19. We sent 73 LUS video clips (electronic supplementary material: five example videos) for review to 10 observers from five European countries. The videos were recorded in three European countries by physicians who had no knowledge of the study: 22 clips in Slovenia, 14 in Poland, and 37 in Croatia. All video clips were recorded in mechanically ventilated, COVID-19-positive adult patients with severe ARDS and were standardized as follows: all were recorded with the patient supine, all lasted 10 s, and all LUS scans were obtained in the midclavicular line between the second and fifth ribs. The cross-sectional angle, the choice of probe and frequency, and the gain and depth of the ultrasound beam were left to the operator's discretion. Of the observers who reviewed the video clips, four were from Croatia, two each from Slovenia and Poland, and one each from Serbia and Bosnia and Herzegovina. All ten observers were ICU physicians who self-identified as proficient LUS users (regular use for more than 1 year). However, their level of education and years of experience in (lung) ultrasound, as well as the type of ultrasound machine and the overall ICU environment, differed considerably.

The observers were asked to view each video clip for no longer than 1 min and then complete the accompanying questionnaire, which they had read before the review. The questionnaire contained 10 questions on specific LUS findings according to Volpicelli et al. [2], each with a simple yes/no answer, except for the question on the number of B-lines, for which observers had to enter the exact number of B-lines detected. At the end of the questionnaire, observers had to give a final grade on the probability of COVID-19, ranging from pattern A (a normal LUS finding or low probability of COVID-19) to pattern D (a high probability of COVID-19) (Fig. 1). We evaluated inter-observer reliability for each question using intraclass correlation coefficients (ICC), interpreted according to Cicchetti (less than 0.40, poor; 0.40 to 0.59, fair; 0.60 to 0.74, good; 0.75 to 1.00, excellent) [5].
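The letter does not state which ICC form was computed or how the yes/no answers were scored. As a minimal sketch, assuming the two-way random-effects, absolute-agreement, single-rater ICC(2,1) of Shrout and Fleiss and a 1/0 coding of yes/no answers, the calculation and the Cicchetti bands could look like this in Python (`icc_2_1` and `cicchetti_label` are hypothetical helper names):

```python
from statistics import fmean


def icc_2_1(ratings: list) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an n x k matrix: one row per video clip (subject), one
    column per observer (rater). Every clip must be rated by every observer.
    """
    n = len(ratings)
    k = len(ratings[0]) if ratings else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete matrix with at least 2 clips and 2 raters")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # between clips
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)  # between observers
    sse = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = sse / ((n - 1) * (k - 1))  # residual

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ratings have no variance; ICC is undefined")
    return (ms_rows - ms_err) / denom


def cicchetti_label(icc: float) -> str:
    """Cicchetti bands: <0.40 poor, 0.40-0.59 fair, 0.60-0.74 good, >=0.75 excellent."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical toy data: 3 clips scored yes (1) / no (0) by 2 observers who always agree
toy = [[1, 1], [0, 0], [1, 1]]
print(cicchetti_label(icc_2_1(toy)))  # excellent
```

A two-way random model treats both the clips and the observers as samples from larger populations, which suits observers drawn from several ICUs; a consistency-type ICC(3,1) would instead ignore systematic differences between observers, such as one observer consistently counting more B-lines.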

Fig. 1 The questionnaire related to specific LUS findings
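The questionnaire mixes yes/no items, one count (the number of B-lines) and an ordinal grade (pattern A to D), whereas an ICC needs, for each item, a complete clips-by-observers matrix of numbers. The letter does not describe its coding, so this sketch simply assumes yes = 1, no = 0 and patterns A to D scored 1 to 4; `Response`, `encode`, `ratings_matrix` and the item keys such as `pleural_effusion` are hypothetical names:

```python
from dataclasses import dataclass, field

# Assumed numeric coding (not stated in the letter)
YES_NO = {"no": 0, "yes": 1}
PATTERN = {"A": 1, "B": 2, "C": 3, "D": 4}  # ordinal: low -> high probability of COVID-19


@dataclass
class Response:
    """One observer's questionnaire for one video clip (hypothetical structure)."""
    clip_id: int
    observer_id: int
    b_line_count: int
    pattern: str
    answers: dict = field(default_factory=dict)  # yes/no items, e.g. {"pleural_effusion": "yes"}


def encode(resp: Response) -> dict:
    """Map every questionnaire item of one response to a number."""
    row = {item: YES_NO[ans.lower()] for item, ans in resp.answers.items()}
    row["b_line_count"] = resp.b_line_count
    row["pattern"] = PATTERN[resp.pattern.upper()]
    return row


def ratings_matrix(responses: list, item: str) -> list:
    """Rows = clips, columns = observers, cells = numeric score for `item`.

    Raises KeyError if any clip/observer pair or item is missing, because an
    ICC(2,1) calculation assumes every observer rated every clip.
    """
    clips = sorted({r.clip_id for r in responses})
    observers = sorted({r.observer_id for r in responses})
    cells = {(r.clip_id, r.observer_id): encode(r)[item] for r in responses}
    return [[cells[(c, o)] for o in observers] for c in clips]


# Hypothetical responses: 2 clips x 2 observers
demo = [
    Response(1, 1, 3, "D", {"pleural_effusion": "yes"}),
    Response(1, 2, 2, "C", {"pleural_effusion": "no"}),
    Response(2, 1, 0, "A", {"pleural_effusion": "no"}),
    Response(2, 2, 1, "B", {"pleural_effusion": "no"}),
]
print(ratings_matrix(demo, "pattern"))  # [[4, 3], [1, 2]]
```

Scoring patterns A to D as equally spaced integers is itself an assumption about the ordinal grade.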

The results of our study are somewhat unexpected; the ICCs for the different LUS findings among observers (inter-rater reliability) are shown in Table 1. Comparing agreement between observers for each LUS finding, we found poor agreement for pleural sliding, thickening of the pleural line with pleural line irregularity, the presence of A-lines, lobar consolidation with dynamic air bronchogram, tissue-like consolidation without bronchogram, and peripheral lung consolidations. Inter-rater agreement was acceptable for the number of B-lines and for the presence of lung consolidation, while good concordance was found for detecting B-lines (B-pattern) and pleural effusion. Inter-rater reliability in the overall assessment of the likelihood of COVID-19 was also poor. Our analysis therefore suggests that overall agreement between observers is unsatisfactory, especially for certain LUS findings.

Consistency was poor for pleural findings, both pleural movement and pleural appearance. This can be explained by the fact that recognizing pleural sliding in patients on lung-protective ventilation with low tidal volumes requires experience and can be challenging, whereas assessing pleural thickening and irregularity is partly subjective. The acceptable agreement on the presence of lung consolidation, contrasted with poor agreement on its type, can be explained similarly: detecting pulmonary consolidation is less demanding and less subjective than deciding whether it represents obstructive atelectasis or lobar or peripheral consolidation. As expected, inter-observer agreement was good for the B-pattern and pleural effusion, as these are likely the easiest LUS signs to detect. Inter-observer reproducibility in quantifying B-lines was acceptable and approximately equal to that previously reported in a similarly designed study [4]. Finally, agreement between observers on the overall likelihood of COVID-19 estimated from the LUS findings was poor.

Table 1 Intraclass correlation coefficients (ICC) for different LUS findings among observers (inter-rater reliability)

The most important limitation of the study is that observers received no specific training in non-standard (COVID-19-specific) LUS findings beyond B-lines and pleural effusion. Our results suggest that using lung POCUS for this purpose probably requires dedicated training. Another important limitation is the lack of standardized machine settings during recording of the video clips, including gain (overall or time gain compensation, TGC), focal point, frequency, probe selection, and depth; these settings can substantially affect each observer's findings [6].

Overall, our study points to likely poor concordance of LUS findings among observers from different ICUs, and further studies are needed to define the possible implications of these results.