1 Introduction

SARS-CoV-2 is the virus that causes COVID-19 [1]. The virus is transmitted through direct or indirect contact with an infected person. The first cases of the disease were reported in the People's Republic of China. The most commonly reported symptoms of COVID-19 are fever, chills, dry cough and drowsiness; other symptoms, such as loss of appetite, breathlessness, persistent chest pain and loss of taste or smell, may indicate severe illness. A study has shown that about one-fourth of infections remain asymptomatic throughout their course [2]. Although governing bodies around the world are taking several measures to prevent the spread of the virus among communities, one effective measure to curb dire consequences is the isolation of infected individuals. Effective testing therefore becomes indispensable. Those suspected of being infected need to confirm the presence of the virus so that they can seek immediate medical attention and, through timely isolation, prevent further transmission to their close contacts.

Currently, RT-PCR testing is the standard method for detecting the disease. This laboratory method uses nasopharyngeal swabs, and reports are usually made available in at least 24 h. However, in many regions of the world, RT-PCR has been reported to produce a large number of false negatives [3]. This has been attributed to many underlying factors, including new variants of the virus that may have reduced the effectiveness of RT-PCR tests, low viral load in the nasal area, low levels of viral RNA at the time of testing, testing too early in the incubation period, use of poor-quality reagents, inappropriate conditions during sample transportation, or progression of the virus to deeper areas of the respiratory tract.

Recently, there has been an upsurge of cases that demonstrated acute symptoms of COVID-19 infection but produced negative RT-PCR results. Physicians recommended medical imaging for such patients to confirm the presence of the disease. Ai et al. [4] conducted a study on 1014 subjects who underwent both RT-PCR and chest CT to assess the performance of CT in COVID-19 diagnosis, using RT-PCR as the reference benchmark. They reported that of 413 patients with negative RT-PCR results, 308 had positive chest CT findings; of these 308 patients, 48% were considered highly likely cases. Such cases, when left untreated, may prove fatal to the patients and pose a serious threat to the community. Since COVID-19 is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), it involves the lungs at different stages of disease progression. On chest imaging, COVID-19 positive cases are known to show ground glass opacities, vascular enlargement and bilateral abnormalities [5], which provides room to employ imaging in diagnosis. The authors acknowledge that imaging methods have associated demerits relating to cost and radiation exposure. Thus, chest imaging can be recognized not as an alternative to RT-PCR testing but as an essential complement for specific cases. To this end, the authors propose leveraging neural network architectures for COVID-19 diagnosis to assist physicians in detecting COVID-19 in chest CT and X-ray images, allowing them to correlate or verify their diagnosis with an AI-based determination and thereby reduce the false negative rate.
In this study, the authors perform a comparative analysis of neural network models to assess how artificial intelligence can be used for rapid testing of suspected populations. The COVID-19 Radiography Database was used for X-ray images [6, 7], and a dataset maintained by iCTCF was used for CT scan images [8]. These databases were truncated, and the final dataset included 4000 CT images and 4000 X-ray images. Four different architectures were applied to these data: ResNet-50, EfficientNetB0, VGG-16 and a custom CNN with hyperparameter tuning.

This work achieves state-of-the-art accuracy on CT scan and X-ray images extracted from the two data sources. Moreover, the four network architectures employed in our study have not previously been comparatively analyzed on the mentioned databases. Thus, this research makes the following contributions:

  • Deploying four CNN architectures to diagnose the presence of COVID-19

  • Comparing the performance of these models on X-ray and CT images

  • Evaluating various model metrics and comparing performances

2 Literature review

Diverse research has been carried out on the diagnosis of COVID-19 using AI. With the assistance of deep learning, chest imaging, including CT scans and X-rays, has been tested to provide a basis for classifying positive and negative cases [9]. Zheng et al. [10] developed a weakly supervised deep learning-based software system for COVID-19 detection. They used 499 and 131 CT volumes, respectively, for training and testing their model, which obtained a ROC AUC of 0.959 and a PR AUC of 0.976. A study of the correlation between RT-PCR testing and chest CT was carried out by Ai et al. [11], who arrived at a sensitivity of 97% for chest CT findings in indicating COVID-19. Bernheim et al. [5] performed a retrospective study on 121 symptomatic patients infected by the coronavirus disease and found common CT observations of bilateral lung involvement and peripheral ground-glass and consolidative pulmonary opacities. They also reported frequent CT findings at progressive stages of the disease. Lassau et al. [12] administered a study on 1003 patients suffering from the coronavirus disease and constructed a deep learning-based model on CT images; used with 5 clinical variables, the model was reported to yield a 0.03 increase in AUC over the clinical factors alone. Yousefzadeh et al. [13] used 3 datasets to obtain 7184 scans, distinguished into 3 classes, in an EfficientNetB3-based architecture to develop ai-corona and reported improvements in the speed and accuracy of expert diagnosis with the assistance of their framework. Huang et al. [14] measured lung opacification in chest CT scans using deep learning and noted varying quantification among groups with different clinical severity. Wang et al. [15] collected CT images from 5372 patients and created a deep learning-based system that distinguished COVID-19 from viral pneumonia and other pneumonia with AUCs of 0.86 and 0.87, respectively. Their system also classified patients into two distinct risk groups: high and low. In their work, Jain et al. [16] used the PA view of chest X-ray images and obtained an accuracy of 97.97% in classifying infected patients from healthy individuals using the Xception model. Yoo et al. [17] created a COVID-19 diagnostic classifier using a deep learning model that consists of three decision trees, each performing its own function. In their research work, Basu et al. [18] introduced domain extension transfer learning (DETL), which was used with a pre-trained CNN on a dataset of X-ray images and obtained an accuracy of 90.13% ± 0.14. Sedik et al. [19] used a CNN and a convolutional long short-term memory model for coronavirus detection from chest imaging. Acar et al. [20] used deep learning methods to improve the efficiency of models in coronavirus detection from CT images.

3 Methods

For this research work, two datasets have been used: the COVID-19 Radiography Database for chest X-ray images (see Fig. 1) and HUST-19 for chest CT scan images (see Fig. 2). Originally, the two datasets contained over 15,000 images.

Fig. 1 X-ray images

Fig. 2 CT scan images

However, for the study, the datasets were truncated to include 4000 CT and 4000 X-ray images. The data were split in a 75:25 ratio for training and testing. Both datasets contain images belonging to two classes: COVID-19 positive and COVID-19 negative. The CT scan and X-ray training sets comprise 1500 positive and 1500 negative images each, and the corresponding test sets comprise 500 positive and 500 negative images each. The validation split for all models was fixed at 0.1. For uniformity, the images were resized to 224 × 224. Four neural network architectures were then used to classify the X-rays and CT scans of COVID-19 infected individuals from those of healthy individuals: a custom CNN, EfficientNetB0, ResNet-50 and VGG-16.
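Since no source code accompanies this section, the following is a minimal data-loading sketch consistent with the setup above, assuming a Keras/TensorFlow pipeline; the directory layout, batch size and omission of per-model preprocessing are illustrative assumptions rather than details from the study.

```python
# Minimal data-loading sketch (hypothetical directory layout with one
# sub-folder per class; per-architecture preprocessing is omitted for brevity).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)   # images resized to 224 x 224 for all four architectures
BATCH_SIZE = 32         # assumed; not reported in the study

# 10% of the training images are held out for validation, as described above.
train_datagen = ImageDataGenerator(validation_split=0.1)
test_datagen = ImageDataGenerator()

train_gen = train_datagen.flow_from_directory(
    "data/xray/train",  # hypothetical path: 1500 positive + 1500 negative images
    target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="binary", subset="training")

val_gen = train_datagen.flow_from_directory(
    "data/xray/train",
    target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="binary", subset="validation")

test_gen = test_datagen.flow_from_directory(
    "data/xray/test",   # hypothetical path: 500 positive + 500 negative images
    target_size=IMG_SIZE, batch_size=BATCH_SIZE,
    class_mode="binary", shuffle=False)
```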

The architecture of the custom CNN used on X-ray images is represented in Fig. 3.

Fig. 3 Custom CNN architecture

To optimize the filter size, kernel size and learning rate, random search hyperparameter tuning was used. The tuner returned the parameters that yielded the maximum validation accuracy after 10 trials. The resultant model (referred to as custom CNN 1 in the figure) was trained for 40 epochs using the Adam optimizer and had 2.2 million parameters. The same process was repeated to create a CNN architecture (referred to as custom CNN 2) for CT scan images; the final model contained 3.6 million parameters.
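A sketch of this tuning step is given below, assuming Keras Tuner and an illustrative search space (the exact ranges and layer counts of the custom CNN are not reported here); train_gen and val_gen refer to the generators from the data-loading sketch above.

```python
# Random-search hyperparameter tuning sketch (assumption: Keras Tuner;
# only filters, kernel size and learning rate are tuned, as stated above).
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(hp):
    filters = hp.Int("filters", min_value=32, max_value=128, step=32)
    kernel = hp.Choice("kernel_size", [3, 5])
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])

    model = models.Sequential([
        layers.Input(shape=(224, 224, 3)),
        layers.Conv2D(filters, kernel, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(filters * 2, kernel, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# 10 trials, keeping the configuration with the best validation accuracy.
tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                        max_trials=10, project_name="covid_cnn")
tuner.search(train_gen, validation_data=val_gen, epochs=10)
best_model = tuner.get_best_models(num_models=1)[0]
best_model.fit(train_gen, validation_data=val_gen, epochs=40)
```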

Further, the images were fed to an EfficientNetB0 model, which has 5.3 million parameters and had been pre-trained on ImageNet. EfficientNet uses a compound scaling method that uniformly scales network depth, width and input resolution with a fixed set of coefficients [21]. The top layers were frozen, and the output of the base model was flattened. A dense layer with the rectified linear unit (ReLU) activation function and a dropout of 0.5 was added, followed by a fully connected layer with sigmoid activation. The model was compiled using the Adam optimizer with a learning rate of 1e-3 and trained for 40 epochs.
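A minimal Keras sketch of this transfer-learning head is shown below (the dense-layer width is an assumption); the VGG-16 and ResNet-50 models described next follow the same head pattern with a different base network.

```python
# Transfer-learning head on a frozen EfficientNetB0 base (sketch).
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0

base = EfficientNetB0(weights="imagenet", include_top=False,
                      input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pre-trained layers

model = models.Sequential([
    base,
    layers.Flatten(),                       # flatten the base-model output
    layers.Dense(128, activation="relu"),   # 128 units is an assumption
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # COVID-19 positive vs. negative
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
# train_gen and val_gen come from the data-loading sketch above
model.fit(train_gen, validation_data=val_gen, epochs=40)
```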

The authors also applied the VGG-16 model [22] to the classification of chest CT and X-ray images into COVID-19 positive and negative. This model has 13 convolutional layers, 3 dense layers and 5 pooling layers. The top layers were made non-trainable, and the output of the base model was flattened to one dimension. A fully connected layer with ReLU activation and a dropout of 0.5 was added, followed by a final layer with sigmoid activation; the model was compiled using the Adam optimizer and trained for 40 epochs.

Finally, the authors employed ResNet-50, a 50-layer deep convolutional neural network with 25.6 million parameters [23], on the X-ray and CT images. ResNet helps improve accuracy through skip connections that allow activations to bypass the convolutional layers within a block. A model pre-trained on the ImageNet database was used to take advantage of its feature-rich representations; it was compiled using the Adam optimizer with a learning rate of 1e-3 and trained for 40 epochs on our data. A sigmoid activation function was then used to generate classification probabilities and derive the corresponding labels (Table 1).
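The step of turning sigmoid probabilities into class labels can be sketched as follows; the 0.5 threshold and the use of scikit-learn are assumptions, and model and test_gen refer to the earlier sketches (test_gen built with shuffle=False so predictions align with the true labels).

```python
# From sigmoid probabilities to labels and a confusion matrix (sketch).
import numpy as np
from sklearn.metrics import confusion_matrix

probs = model.predict(test_gen).ravel()   # sigmoid probabilities in [0, 1]
labels = (probs >= 0.5).astype(int)       # threshold at 0.5 (assumed)

cm = confusion_matrix(test_gen.classes, labels)
tn, fp, fn, tp = cm.ravel()
print(cm)
```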

Table 1 Parameters used in the neural network architectures employed

4 Results

We evaluated the models using recall, test accuracy, precision, specificity and F1 score. The confusion matrices corresponding to the models are also presented.
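All of the metrics reported below follow directly from the confusion-matrix counts; the short verification sketch below uses the custom CNN X-ray counts listed in the next bullets (this is a check of the arithmetic, not the authors' code).

```python
# Metric definitions illustrated with the custom CNN X-ray counts below.
tp, fn = 473, 27   # COVID-19 positive X-rays: correctly identified / missed
tn, fp = 433, 67   # COVID-19 negative X-rays: correctly identified / false alarms

accuracy    = (tp + tn) / (tp + tn + fp + fn)       # 906 / 1000 = 0.906
recall      = tp / (tp + fn)                        # 473 / 500  = 0.946
specificity = tn / (tn + fp)                        # 433 / 500  = 0.866
precision   = tp / (tp + fp)                        # 473 / 540  ~ 0.876
f1 = 2 * precision * recall / (precision + recall)  # ~ 0.910

print(f"acc={accuracy:.3f} rec={recall:.3f} spec={specificity:.3f} "
      f"prec={precision:.3f} f1={f1:.3f}")
```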

The custom CNN yielded the following metrics on X-ray images:

  • The confusion matrix generated for the custom CNN is presented in Fig. 4.

  • Out of the 1000 test images, 906 images were classified correctly, thus resulting in a testing accuracy of 90.6%.

  • Of the 500 X-ray images that were COVID-19 positive, 473 were correctly identified, whereas 27 were determined as false negatives. Thus, the recall is 94.6%.

  • Of the 500 COVID-19 negative X-ray images, 433 images were correctly classified as negative, whereas 67 were marked as false positives. Thus, the specificity of the model is 86.6%.

The custom CNN yielded the following metrics on CT scan images:

  • The confusion matrix is presented in Fig. 5.

  • Out of the 1000 test images, 970 images were classified correctly, thus resulting in a testing accuracy of 97%.

  • Of the 500 CT scan images that were COVID-19 positive, 479 were correctly identified, whereas 21 were determined as false negatives. Thus, the recall is 95.8%.

  • Of the 500 CT scan images that were COVID-19 negative, 491 images were correctly classified as negative, whereas 9 were marked as false positives. Thus, the specificity of the model is 98.2%.

The EfficientNetB0 model yielded the following metrics on X-ray images:

  • The confusion matrix generated for EfficientNetB0 is presented in Fig. 6.

  • Out of the 1000 test images, 967 images were classified correctly, thus resulting in a testing accuracy of 96.7%.

  • Of the 500 X-ray images that were COVID-19 positive, 491 were correctly identified, whereas 9 were determined as false negatives. Thus, the recall is 98.2%.

  • Of the 500 COVID-19 negative X-ray images, 476 images were correctly classified as negative, whereas 24 were marked as false positives. Thus, the specificity of the model is 95.2%.

The EfficientNetB0 model yielded the following metrics on CT scan images:

  • The confusion matrix is presented in Fig. 7.

  • Out of the 1000 test images, 962 images were classified correctly, thus resulting in a testing accuracy of 96.2%.

  • Of the 500 CT scan images that were COVID-19 positive, 475 were correctly identified, whereas 25 were determined as false negatives. Thus, the recall is 95.0%.

  • Of the 500 CT scan images that were COVID-19 negative, 487 images were correctly classified as negative, whereas 13 were marked as false positives. Thus, the specificity of the model is 97.4%.

The VGG-16 model yielded the following metrics on X-ray images:

  • The confusion matrix generated for VGG-16 is presented in Fig. 8.

  • Out of the 1000 test images, 958 images were classified correctly, thus resulting in a testing accuracy of 95.8%.

  • Of the 500 X-ray images that were COVID-19 positive, 467 were correctly identified, whereas 33 were determined as false negatives. Thus, the recall is 93.4%.

  • Of the 500 COVID-19 negative X-ray images, 491 images were correctly classified as negative, whereas 9 were marked as false positives. Thus, the specificity of the model is 98.2%.

The VGG-16 model yielded the following metrics on CT scan images:

  • The confusion matrix generated for the VGG-16 model is presented in Fig. 9.

  • Out of the 1000 test images, 938 images were classified correctly, thus resulting in a testing accuracy of 93.8%.

  • Of the 500 CT scan images that were COVID-19 positive, 473 were correctly identified, whereas 27 were determined as false negatives. Thus, the recall is 94.6%.

  • Of the 500 CT scan images that were COVID-19 negative, 465 images were correctly classified as negative, whereas 35 were marked as false positives. Thus, the specificity of the model is 93%.

The ResNet-50 model yielded the following metrics on X-ray images:

  • The confusion matrix is presented in Fig. 10.

  • Out of the 1000 test images, 987 images were classified correctly, thus resulting in a testing accuracy of 98.7%.

  • Of the 500 X-ray images that were COVID-19 positive, 488 were correctly identified, whereas 12 were determined as false negatives. Thus, the recall is 97.6%.

  • Of the 500 COVID-19 negative X-ray images, 499 images were correctly classified as negative, whereas 1 was marked as false positive. Thus, the specificity of the model is 99.8%.

The ResNet-50 model yielded the following metrics on CT scan images:

  • The confusion matrix is presented in Fig. 11.

  • Out of the 1000 test images, 989 images were classified correctly, thus resulting in a testing accuracy of 98.9%.

  • Of the 500 CT scan images that were COVID-19 positive, 493 were correctly identified, whereas 7 were determined as false negatives. Thus, the recall is 98.6%.

  • Of the 500 CT scan images that were COVID-19 negative, 496 images were correctly classified as negative, whereas 4 were marked as false positives. Thus, the specificity of the model is 99.2% (Table 2).

Fig. 4 Confusion matrix for the custom CNN on X-ray images

Fig. 5 Confusion matrix for the custom CNN on CT scan images

Fig. 6 Confusion matrix for EfficientNetB0 on X-ray images

Fig. 7 Confusion matrix for EfficientNetB0 on CT scan images

Fig. 8 Confusion matrix for VGG-16 on X-ray images

Fig. 9 Confusion matrix for VGG-16 on CT scan images

Fig. 10 Confusion matrix for ResNet-50 on X-ray images

Fig. 11 Confusion matrix for ResNet-50 on CT scan images

Table 2 Recall, precision, specificity and F1-score evaluated for the four models trained and tested on X-ray and CT scan images (some values have been rounded to one decimal place)

The maximum recall for X-ray images was achieved by EfficientNetB0, followed by ResNet-50, the custom CNN and VGG-16. However, ResNet-50 outperformed the other models on X-rays overall, yielding a precision of 99.8%, a specificity of 99.8% and an F1 score of 98.7%.

For CT scan images, ResNet-50 achieved the best performance across all metrics, with an F1 score of 98.9%, a recall of 98.6%, a specificity of 99.2% and a precision of 99.2%.

5 Discussion

Figures 12 and 13 present radar plots of the evaluated metrics for the neural network architectures employed on X-ray and CT-scan images, respectively. For CT-scan images, ResNet-50 has clearly surpassed the other three networks in terms of recall, precision, specificity and F1-score, whereas for X-ray images, EfficientNetB0 reports a slightly higher recall (by 0.006).

Fig. 12 Radar plot for X-ray images

Fig. 13 Radar plot for CT-scan images

Since medical chest imaging is suggested to address the issue of false-negative reports in RT-PCR, we evaluated the false positive rate for the four network architectures to understand their scope of application. Figures 14 and 15 indicate the false positive rate trends observed in the networks for X-ray and CT scan images, respectively. It can be clearly established that ResNet-50, pre-trained on ImageNet, reports the minimum false positive rate of 0.002 on X-ray images and 0.008 on CT-scan images.
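These rates follow directly from the ResNet-50 confusion-matrix counts reported above, as the short check below illustrates (a verification of the arithmetic, not the authors' code).

```python
# False positive rate: FPR = FP / (FP + TN) = 1 - specificity,
# computed from the ResNet-50 counts reported in the Results section.
fp_xray, tn_xray = 1, 499
fp_ct, tn_ct = 4, 496

fpr_xray = fp_xray / (fp_xray + tn_xray)   # 1 / 500 = 0.002
fpr_ct = fp_ct / (fp_ct + tn_ct)           # 4 / 500 = 0.008
print(fpr_xray, fpr_ct)
```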

Fig. 14 False positive rate of neural network architectures on X-ray images

Fig. 15 False positive rate of neural network architectures on CT-scan images

Thus, ResNet-50, pre-trained on ImageNet, has surpassed the other networks. Its performance could be attributed to its structural design, which stacks residual blocks with skip connections: the skip connections allow activations to be forwarded to deeper layers of the network, sustaining the learned signal, unlike plain architectures in which activations vanish as depth increases. The feature-rich knowledge acquired through transfer learning, applied via pre-training on ImageNet, also contributes to the exhibited performance in identifying areas of air-space consolidation or ground glass opacities in the images. These aspects of the ResNet model also make it suitable for extension to other disease-diagnosis tasks, such as breast cancer identification from histopathological images, as studied by Al-Haija et al. [24].
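To make the skip-connection idea concrete, a simplified residual block is sketched below in Keras; ResNet-50 itself uses 1×1-3×3-1×1 bottleneck blocks, so this is an illustration of the principle rather than the exact architecture.

```python
# Simplified residual (identity) block: the input bypasses the two
# convolutions via the skip connection and is added back before the final ReLU.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                    # skip connection
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                 # activations bypass the block
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(224, 224, 64))
outputs = residual_block(inputs, 64)
block = tf.keras.Model(inputs, outputs)
```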

Although all the models produced some misclassifications, i.e., false positives and false negatives, the results obtained from these models are still promising. Improving image quality and applying more advanced pre-processing could further increase classification accuracy, addressing this issue and making the approach more appropriate for use in clinical settings.

6 Conclusion

Deep learning has facilitated functional solutions to diverse problems in the domain of healthcare. Our work is one such instance, leveraging deep learning for COVID-19 diagnosis using the chest imaging methods CT-scan and X-ray. The metrics, especially the high accuracy and low false positive rate, encourage the prospect of applying deep learning, particularly ResNet-50, to diagnose COVID-19. This method could assist physicians in laboratory settings with clinical verification of their analysis, making it an essential supplement to the traditional benchmark method of RT-PCR in specific cases. However, it may also be noted that training, testing and evaluation have been completed by combining data from only two sources. Future work may involve extracting data from more varied sources and applying additional methods and architectures for better results. We seek to inspire the research community to contribute ideas that create a lasting impact and achieve solutions to real-world problems such as disease diagnosis. We also hope that our research work can serve as a reference for future studies that pursue deep learning-based solutions in healthcare.