1 Introduction

Tests detecting a previously unknown virus in patients' lung fluids led to the discovery of a new coronavirus (CoV). Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus, which belongs to the Coronaviridae family. Coronaviruses can cause respiratory, enteric, hepatic, and neurological diseases in both domestic animals and humans [52, 57]. SARS-CoV-2 is also phylogenetically related to the coronaviruses that cause Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS) [54, 57]. COVID-19, SARS, and MERS are zoonotic in origin, and their viruses were transmitted by bats, civets, and camels, respectively.

Due to the global impact of the COVID-19 pandemic, international efforts have been made to simplify researchers' access to viral data through repositories such as the 2019 Novel Coronavirus Resource (2019nCoVR) [53] and the National Center for Biotechnology Information (NCBI) [43]. The more accessible the information, the more likely it is that a set of medical countermeasures will be rapidly developed to control the disease worldwide, as has happened with other diseases on other occasions [37, 38, 45]. In this context, computed tomography (CT) has been adopted as an initial modality for screening patients, as it allows the visualization of abnormal anatomy [5, 33].

CT images show similarities between patients with COVID-19 and those with other types of viral pneumonia, such as SARS and MERS. Nevertheless, analysis of CT images involves a tedious slice-by-slice procedure. Radiologists, who sometimes analyze thousands of image slices per patient, face a high risk of human error. During the COVID-19 pandemic, the pressure for results and radiologists' time constraints make analyses even more error-prone.

The use of deep learning is not new in radiology CT imaging research [3, 21, 28, 50], so as soon as the pandemic was declared by the WHO in March 2020, researchers around the world started creating computer models that processed radiological images of patients with COVID-19 [3, 5, 33, 56]. Chest CT images have common features that may show a specific pattern for COVID-19; however, manual analysis is time-consuming for the radiologist. To speed up the analysis and reduce the probability of error, we assume that deep learning can be effective in analyzing the large volumes of data generated by CT imaging [4, 6, 18, 28, 32, 35, 41, 56].

In this paper we describe a new deep learning model for classifying CT images, called WCNN. The aim is to improve the differentiation of images from patients with COVID-19. Throughout this paper, images of patients diagnosed with COVID-19 comprise the COVID-19+ image base, while images of patients with other lung infections or inflammatory diseases, such as pneumonia, cardiomegaly, pleural effusion, atelectasis, and consolidation, make up the COVID-19- image base.

WCNN was created using CNN approaches well established in the literature [19, 48, 55]. It avoids pre-processing operations such as image resizing and data augmentation. Instead, we propose an additional layer for the convolutional network, which we call the wave layer. The proposed layer uses the wavelet transform to decompose the image and extract its characteristics, being responsible for the pre-processing and for generating the output image that is processed by the model's remaining layers.

The main contributions of our research fall into two categories: the wave layer and the overall WCNN model. Regarding the wave layer, the following contributions stand out:

  • development entirely based on TensorFlow, replacing the Keras input layer;

  • automated data entry standardization;

  • capability to apply noise-reduction filters to medical images;

  • capacity to incorporate various feature-extraction techniques;

  • use of crops to obtain the best region of the organ under study.

On the other hand, the most relevant contributions of the WCNN are the following ones:

  • testing, in the same research, of two image databases for model development, one public and one private;

  • use of an external database aimed to independently validate the model;

  • creation and application of objective criteria for the inclusion and exclusion of images from the image bases;

  • use of 16-bit images, preserving the information needed to characterize the disease;

  • no data augmentation or image resizing was required to accurately discriminate the disease; and, above all,

  • WCNN has great potential to reduce the clinical workload of radiologists, serving as a first or second analyst.

We organized this paper in sections: Section 2 (Related Works) presents work related to our research, and Section 3 describes basic topics on wavelet transforms. Sections 4 through 9 describe, respectively, the model's workflow, the image base creation process, the core foundations of the WCNN model, the ablation tests, the training configuration parameters, and the model evaluation metrics. After this, Section 10 presents the results of applying the WCNN model, Section 11 discusses the research findings and, finally, Section 12 concludes the paper with final considerations and ideas for future work.

2 Related works

As quickly as the pandemic spread, governments, supranational organizations, research institutes, universities and corporations mobilized unprecedented amounts of human and financial resources to end the crisis. In a flash, both SARS-CoV-2 and COVID-19 were raised to the height of interest of the world scientific research community in the most diverse areas, e.g., medicine, infectiology, biochemistry, information technology, applied mathematics, and artificial intelligence. Our research was conducted in this context, motivated by the intrinsic urgency of ending the pandemic; the same motivated the initiatives of other researchers, whose results are presented and discussed in this section.

The first work analyzed proposes an algorithm based on transforms and CNNs for CT image recognition [6]. The authors present a solution with two branches: a Trans-CNN model and a Transformer module. The Trans-CNN model combines the CNN's local feature extraction capability with the Transformer's global feature extraction capability. The study used 194,922 chest CT images of 3745 patients aged 0 to 93 years, extracted from the COVIDx-CT database. The images include i) healthy patients, ii) patients sick with COVID-19, and iii) patients with other lung diseases. The base was expanded by 15°, 45°, 90° and 180° rotations. The values obtained for accuracy, sensitivity and specificity were 0.9673, 0.9776 and 0.9602, respectively.

The COVID-CT-Mask-Net model also uses a CNN [41]. It operates in two steps: i) a Mask R-CNN network is trained to locate and detect regions of ground-glass opacity lesions on CT images; and ii) images of these lesions are merged to classify the input image. The experiment used 3000 chest CT images from the COVIDx-CT database, whose patients can be i) healthy, ii) sick with COVID-19, or iii) affected by other pulmonary infections. The accuracy, sensitivity and specificity were 0.9673, 0.9776 and 0.9602, respectively.

Another relevant work describes a hybrid model that combines SqueezeNet and ShuffleNet. It uses 1252 COVID-19+ CT images and 1230 COVID-19- images from the public SARS-COV-2 Ct-Scan database, which were collected from real patients in hospitals in São Paulo, Brazil [35]. The data were augmented by random operations of i) rotation of ±5°; ii) intensity changes of ±20; and iii) shear of ±20°. In addition to the random operations, i) blurring, ii) inversion, and iii) resizing to 224 × 224 spatial resolution were also performed on the images. The respective results for accuracy, sensitivity and specificity are 0.9781, 0.9615 and 0.9608.

Another example of research combining multiple network architectures is described in [32]. The model proposed by the authors combines the VGG-16, GoogleNet and ResNet-50 networks and aims to detect COVID-19 in its initial phase. It obtained an accuracy of 0.9827, a sensitivity of 0.9893 and a specificity of 0.9760. A total of 150 chest CT images belonging to the Società Italiana di Radiologia Medica e Interventistica were used. They gave rise to 3000 images, grouped into Subset-1, called "COVID-19", and Subset-2, labeled "No findings". The resolutions of the subset images are 16 × 16 and 32 × 32, respectively.

A study comparing the performance of various convolutional network architectures also stands out [18]. It involves the VGG16, DenseNet121, MobileNet, NASNet, Xception and EfficientNet networks. The study used chest CT images obtained from Kaggle, 1958 from COVID-19+ patients and 1915 from COVID-19-. The images were resized to 224 × 224 spatial resolution. The model was trained with 70% of the images, validated with 15% and evaluated with the remaining 15%. Of these architectures, VGG16 presented the best results, with an accuracy of 0.9768, sensitivity of 0.9579 and specificity of 0.9971.

Another related work describes the creation of an application for detecting pneumonia caused by COVID-19 through high-resolution CT analysis [3]. It was created by staff at Renmin Hospital of Wuhan University, China. The application's base model uses an architecture derived from UNet++. Application performance was measured using 46,096 anonymized images from 106 hospital patients, divided into two groups: the first with 51 COVID-19+ patients and the second, used as a control group, with 55 COVID-19- patients. In addition, the authors retrospectively used the images of twenty-seven patients seen before the start of the project to compare the effectiveness of the diagnosis made by experts with that obtained by the application. The application's accuracy, sensitivity and specificity were, respectively, 0.9524, 1.0000 and 0.9355. Considering the twenty-seven previous patients' images, the accuracy, sensitivity, and specificity reached 0.9885, 0.9434, and 0.9916, respectively. This demonstrated that the application's performance is comparable to the results obtained by medical experts.

The COVID-19-CNN model combines previously trained CNNs [4]. Training and performance testing of this model used images from 405 COVID-19+ patients and 397 COVID-19- patients: 612 images for training, ninety-nine for validation and ninety-one for testing. The database was not augmented, but the images were scaled to a spatial resolution of 224 × 224. The COVID-19-CNN model achieved an accuracy of 0.9670, sensitivity of 0.9780 and specificity of 0.9556.

Feature extraction is an important stage of the process; thus, several robust CNN architectures have been implemented, for example DenseNet, VGGNet, InceptionV3 and ResNet [15]. Regarding deep learning based feature extraction, recent studies have used different methods to deal with a variety of problems [14, 22, 23, 31]. In one of these studies, the goal was to segment objects from relational visual data [26]. For feature extraction, they used the convolution blocks of DeepLabV3 [9], which applies atrous convolution to extract dense feature maps with upsampled filters. ResNet50 [42] and VGG16 [46] were used to assess the influence of backbone feature extraction networks in deep models for visual tracking [25].

There are also studies on video object segmentation and feature association. These train a prototypical Siamese network to find the pixel or feature most closely associated with the first or segmented frame, as well as the reference frame, and then provide the corresponding labels [24, 27].

In light of the state-of-the-art research, the main contributions of our work include: i) systematization of the concomitant use of public and private image banks in the training and testing phases of the network, and creation of an external test base for independent validation of the model; ii) application of objective criteria for the inclusion and exclusion of images from these databases; iii) use of 16-bit images, which contain the information needed to characterize the disease; and iv) WCNN did not use data augmentation or image resizing to classify COVID-19+ images [4, 6, 18, 35].

3 Wavelet

In this section, we present the wavelet transform theory, inspired by small waves (wavelets) of varying frequency and limited duration. Related works have shown that the use of convolutional networks is common in image classification models. The use of the Discrete Wavelet Transform (DWT) is also not uncommon; however, the use we made of the DWT gave WCNN characteristics that positively impacted its performance. The DWT represents a continuous function as a sequence of coefficients. In addition to its efficient and intuitive structure for representing and storing multi-resolution images, the wavelet transform provides insight into the spatial and frequency characteristics of an image [7, 8, 11, 29]. For a signal f(t), the wavelet transform Wf(a, b) is defined in Eq. 1 as [7, 8, 11, 29]:

$$ Wf\left(a,b\right)={\int}_{-\infty}^{+\infty }f(t){\psi}_{a,b}(t) dt $$
(1)

For a discrete N-point signal, the DWT integral can take the summation form, as in Eq. 2:

$$ Wf\left(a,b\right)={\sum}_{t=0}^{N-1}f(t){\psi}_{a,b}(t) $$
(2)

The wavelet function ψa, b(t) is derived from the function ψ(t), through the transformation shown in Eq. 3:

$$ {\psi}_{a,b}(t)=\frac{1}{\sqrt{a}}\psi \left(\frac{t-b}{a}\right) $$
(3)

where a ∈ R+ is the scale parameter associated with the wavelet width, b ∈ R represents the wavelet translation, and \( \psi \left(\frac{t-b}{a}\right) \) forms the wavelet basis.

There are many possible choices for the function ψ(t), called the mother wavelet, including Daubechies, Symlets, and Coiflets. The scaled and shifted versions of the mother wavelet correspond to bandpass filters with different bandwidths and time durations. The wavelet transform performs a transformation step on each row, producing a matrix whose left side contains the down-sampled low-pass coefficients (L) of each row and whose right side contains the high-pass coefficients (H) (Fig. 1a). Then, a step is applied to each column (Fig. 1b), resulting in four types of coefficients, as shown in Fig. 1c [7, 8, 11, 29] and illustrated in the code sketch after the list below:

  • Coefficients that result from a convolution with high pass in both directions (HH) represent diagonal features of the image.

  • Coefficients resulting from a high-pass convolution on the columns after a lowpass convolution on the rows (HL) correspond to the horizontal characteristics of the image.

  • Coefficients originated from high-pass filters in the rows followed by lowpass filters in the columns (LH) correspond to the vertical characteristics of the image; and,

  • Coefficients from lowpass filters in both directions (LL) correspond to the approximate characteristics of the image.
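To make the decomposition concrete, the following minimal sketch uses Python's PyWavelets library, with a random array as a stand-in for a CT slice; the variable names follow the library's convention (approximation plus horizontal, vertical and diagonal detail sub-bands):

```python
import numpy as np
import pywt

image = np.random.rand(512, 512)  # stand-in for a CT slice

# Single-level 2-D DWT: cA is the approximation (LL) sub-band; cH, cV and
# cD are the horizontal, vertical and diagonal detail sub-bands. The
# 'periodization' mode halves the resolution exactly.
cA, (cH, cV, cD) = pywt.dwt2(image, 'coif5', mode='periodization')
print(cA.shape, cH.shape, cV.shape, cD.shape)  # all (256, 256)
```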

Fig. 1 a Transformation step on each row. b Transformation step on each column. c Diagram overview of the wavelet transform

4 Workflow and model

Classification models based on deep learning are a constituent part of workflows that aim to detect characteristic patterns in images from a database. The workflow that we used in this research has steps for creating the base, executing the model, consolidating the result, and calculating the metrics. Figure 2 illustrates the workflow of which the WCNN model execution is part.

Fig. 2 Workflow overview and the WCNN template

The workflow was divided into two major components/processes: Materials and WCNN. Materials notably contains the selection and distribution of the datasets and images needed by WCNN. On the other hand, WCNN encapsulates all processes regarding the wave layer, feature extraction, and the flatten and fully connected layers. In the subsequent sections, each workflow process and its respective steps are described, starting with dataset creation.

5 Image base creation

This section presents the image inclusion and exclusion criteria and the datasets used in the research.

5.1 Inclusion and exclusion criteria

The first step in creating the image base was defining the image inclusion and exclusion criteria for the COVID-19+ and COVID-19- bases. The inclusion criteria are: i) a tomographic reconstruction matrix of 512 × 512 pixels; ii) patients over 18 years of age; and iii) images of patients testing positive for COVID-19.

As mentioned earlier, images of patients diagnosed with COVID-19 make up the COVID-19+ database, and images of patients who tested negative are entered into the COVID-19- database. This is because, although these patients are not sick with COVID-19, they have other infectious or inflammatory lung diseases such as pneumonia, cardiomegaly, pleural effusion, atelectasis, and consolidation.

The exclusion criteria were applied with the help of a radiologist, who guided us to discard about 40% of the slices from each exam: the first 20% and the last 20%. This discard helps focus on the area of interest, the lung, since the initial and final slices do not highlight it sufficiently, as illustrated in Fig. 3.

Fig. 3 Selection of CT slices

The following subsections describe the original image repositories as well as the datasets created using the criteria.

5.2 Dataset I

Dataset I contains images from the Valencian Region Medical Image Bank (BIMCV) [49] public repository. They were generated between 02/26/2020 and 04/18/2020, and the dataset is divided into:

  • BIMCV-COVID-19+: images of patients testing positive for COVID-19, including CT radiographic findings and their respective reports, polymerase chain reaction (PCR) tests, and antibody diagnostic tests (immunoglobulin G-IgG and immunoglobulin M-IgM); and,

  • BIMCV-COVID19-: images of patients testing negative for COVID-19, including CT radiographic findings and their respective reports, covering pathologies such as pneumonia, cardiomegaly, pleural effusion, atelectasis, and consolidation.

Between 50 and 400 CT slices were used for each exam, whose slice thickness varies from 1 mm to 7 mm. CT radiographs were performed using the following equipment: KONICA MINOLTA 0862; GMM ACCORD DR; Philips Medical Systems DigitalDiagnost; Philips Medical Systems PCR Elevate; SIEMENS SOMATOM; TOSHIBA Aquilion; Philips DigitalDiagnost; Philips Brilliance 16; Philips Medical Systems Essenta DR.

As for the distribution of patients in Dataset I, 174 patients were selected: 87 BIMCV-COVID-19+ patients, constituting the COVID-19+ base, and 88 BIMCV-COVID19- patients, composing the COVID-19- base. Of the 87 patients in the COVID-19+ database, 70% were used in the training phase, 15% in the testing phase and the remaining 15% in the validation phase. The same distribution was used in the COVID-19- base, as shown in Table 1. The patient images used in the training phase were not reused in the testing and validation phases; that is, the patients were divided into disjoint sets.
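For illustration, a minimal sketch of such a patient-level 70/15/15 split follows; the helper name and fixed seed are assumptions, not the authors' code. Splitting by patient rather than by image guarantees that all slices from one patient land in exactly one partition.

```python
import random

def split_by_patient(patient_ids, seed=42):
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(0.70 * len(ids))
    n_test = int(0.15 * len(ids))
    train = ids[:n_train]                  # training patients
    test = ids[n_train:n_train + n_test]   # testing patients
    val = ids[n_train + n_test:]           # validation patients
    return train, test, val

# e.g., the 87 COVID-19+ patients of Dataset I
train, test, val = split_by_patient(range(87))
assert set(train).isdisjoint(test) and set(train).isdisjoint(val) \
    and set(test).isdisjoint(val)
```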

Table 1 Dataset I image distribution

5.3 Dataset II

Dataset II is composed of CT images from the private repository of Hospital São Lucas of the Pontifical Catholic University of Rio Grande do Sul (HSL-PUCRS), generated between 03/03/2020 and 07/30/2020. To use the images from this private repository, we submitted a request for use to the PUCRS evaluation committee, under number 30791720.5.0000.5336. The request followed the standard procedures and was approved.

Dataset II comprises patients who tested positive or negative for COVID-19, the latter having other lung diseases such as pneumonia, cardiomegaly, pleural effusion, atelectasis, and consolidation. CT scans were performed using the following equipment: Siemens, GE Medical Systems, Philips Medical Systems, and Toshiba. Each exam consists of 50 to 400 CT slices, and each slice has a thickness that varies from 1 mm to 5 mm.

Regarding the distribution of patients from HSL-PUCRS, sixty patients with a positive test for COVID-19 were selected from the base of Hospital São Lucas at PUCRS. Seventy percent were used in the training phase, 15% in the testing phase and the remaining 15% in the validation phase, as shown in Table 2. The patient images used in the training phase were not reused in the testing and validation phases; that is, the patients were divided into disjoint sets.

Table 2 Dataset II image distribution

5.4 Dataset III

Dataset III is composed of CT images obtained from the private repository of the Hospital de Clínicas of the Federal University of Uberlândia (HC-UFU), Brazil, generated between 04/08/2020 and 10/12/2020. Dataset III was used to validate the model with images different from those used in training and testing. Images were collected from twenty patients positive for COVID-19, totaling 2300 images, and twenty patients negative for COVID-19 but positive for viral pneumonia, also totaling 2300 images. The same inclusion and exclusion criteria were applied to the HC-UFU database, and the information was anonymized. The images were obtained with a Toshiba CT scanner. The scanning parameters were: lung window reconstruction matrix, 512 × 512; slice thickness, 1 mm–7 mm. Table 3 shows the tabulated data.

Table 3 Dataset III image distribution

6 WCNN model

Convolutional Neural Networks (CNNs) were proposed to process image data. The name comes from the convolution operator, a straightforward way of performing complex operations using a convolution kernel [36]. Many variations of the CNN have already been proposed, such as AlexNet [19], Clarifai [55], and GoogleNet [48]. WCNN is also a CNN variation and embodies the basic CNN architecture as well as a customized layer [20].

As illustrated in Fig. 4a, the conventional CNN architecture is composed of two modules: a feature extractor, which processes the raw input, and a trainable classifier, which generates the class scores (adapted from [20]). In turn, Fig. 4b represents the architecture of our customized CNN, which contains the same modules as the conventional CNN architecture plus the highlighted new layer we created.

Fig. 4 a Conventional CNN architecture and b architecture of our customized CNN highlighting the wave layer (adapted from [20])

WCNN is composed of four stages: wave layer, feature extraction, flatten layer and fully connected layer. In the CNN, the pooling and convolution layers act as the feature extraction stage, whereas the classification stage is made of one or more fully connected layers followed by a sigmoid function layer [51].

Figure 5 illustrates the WCNN classification scheme, and the next subsections detail its functioning and each of its elements.

Fig. 5 WCNN classification scheme

6.1 Wave layer

Creating the wave layer required selecting the wavelet function and its decomposition level, as well as analyzing the most relevant coefficients of the wavelet transform. The mother wavelet and the coefficients were chosen after analyzing the available options and selecting the best one. In this subsection we describe this analysis in detail, as well as the processing the wave layer performs when receiving the images. All analyses used the same dataset and WCNN parameter configuration described in Section 7, "Ablation tests".

6.1.1 Mother wavelet selection analysis

The decision to use the discrete wavelet transform of the Coiflets 5 family was made, partially, based on the work of [12, 13], in which the authors tested the wavelet transforms of the Daubechies, Symlets, Coiflets, Fejer-Korovkin and dMeyer families. Among them, the Coiflets 5 family showed the best noise reduction results in dense breast radiography images.

To ensure that Coiflets 5 would also be the most suitable family for the object of our research, we analyzed a set of discrete wavelet families, and the result corroborated the findings of [12, 13]. To perform this analysis, we selected six discrete wavelet families implemented in Python's PyWavelets library and considered the first and last tags of each one. The selected families were Biorthogonal, Coiflets, Daubechies, Discrete FIR approximation of Meyer, Reverse biorthogonal and Symlets. The consolidated results of the analysis are presented in Table 4.

Table 4 Accuracy of the initial and final tags of each analyzed wavelet family

As shown in Table 4, tag 5 of the Coiflets family obtained the best result; because of that, Coiflets 5 was chosen for our model.
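As a reproducibility aid, the candidate families can be enumerated directly from PyWavelets; the short-name prefixes below are the library's identifiers for the six families named above.

```python
import pywt

# Biorthogonal, Coiflets, Daubechies, discrete Meyer (FIR),
# reverse biorthogonal, and Symlets, respectively.
for family in ('bior', 'coif', 'db', 'dmey', 'rbio', 'sym'):
    print(family, pywt.wavelist(family))  # e.g. 'coif' -> ['coif1', ..., 'coif17']
```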

6.1.2 Decomposition level definition

The decomposition level was set to one to avoid losing information that might be necessary for image classification. Other levels were assessed but degraded the image.

6.1.3 Decomposition coefficients selection analysis

The decomposition coefficients selection analysis was performed with 1000 images, 500 each of COVID-19+ and COVID-19-. As the data are heterogeneous, independent, and non-parametric, we used the BioEstat statistical analysis software, version 5.3, to run the Friedman test, considering a significance level of α = 0.05.

Our intent with this test was i) to analyze the significance between the groups [10, 58] and ii) to verify whether they present statistically similar values in relation to the approximate, horizontal, vertical, and diagonal coefficients. The standard deviation of the coefficients was used as the attribute for the significance test between the COVID-19+ and COVID-19- bases, and the test evidenced significant statistical differences for the approximate, vertical, and diagonal coefficients.

Based on the test result, we partially analyzed the wavelet coefficients with the following configurations, keeping the approximate coefficient in all of them because it contains the most information about the image: i) approximate, horizontal, and vertical coefficients; ii) approximate, vertical, and diagonal coefficients; and iii) approximate, horizontal, and diagonal coefficients. The accuracies obtained in this test are shown in Table 5.

Table 5 Accuracy of each analyzed coefficient configuration

Considering the result of the statistical test and the partial analysis of the coefficients, the approximate, vertical, and diagonal coefficients were selected for the creation of the WCNN.

6.1.4 Wave layer processing

The wave layer receives a CT image with 512 × 512 spatial resolution. The image goes through the steps of this layer, which are described below.

The first step reduces the impact of the background: each image is cropped at the top and bottom, removing a total of 172 pixels in each dimension. We did not perform lung segmentation on the selected images to avoid removing lesion areas at the lung boundaries. Cropping results in a 340 × 340-pixel image, as shown in Fig. 14.

In the second step, the image is normalized to remove variations caused by different CT equipment: it is standardized to the standard normal distribution, with mean μ = 0 and variance σ² = 1. To do so, the mean μ and the variance σ² are calculated as in Eq. 4 and Eq. 5, respectively. The image I is formed by m rows and n columns, denoted I0,0, I0,1, ⋯, Im,n [1]. INormalized is calculated according to Eq. 6.

$$ \mu =\frac{\sum_1^m{\sum}_1^n{I}_{i,j}}{m\ast n} $$
(4)
$$ {\sigma}^2=\frac{\sum_1^m{\sum}_1^n{\left({I}_{i,j}-\mu \right)}^2}{m\ast n} $$
(5)
$$ {I}_{Normalized}=\frac{I_{i,j}-\mu }{\sigma } $$
(6)

In the third step, the image is decomposed through the wavelet transform, at a single decomposition level, using the Coiflets 5 mother wavelet. Of the four generated coefficients (approximate, horizontal, vertical, and diagonal), only three are used in this work to render the digital image. A digital image is composed of Red, Green and Blue (RGB) channels, so the R channel receives the approximate coefficient, the G channel receives the vertical coefficient, and the B channel receives the diagonal coefficient, forming the decomposition output that will be used by the following layers, as shown in Fig. 6. The image cropped in the first step is a region of interest of the lung with a spatial resolution of 340 × 340; thus, after the wavelet transform decomposition, the decomposition output is an image with a spatial resolution of 170 × 170, also shown in Fig. 6.
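Putting the three steps together, the following minimal sketch reproduces the wave-layer processing under stated assumptions: the crop is split evenly between the image borders (the exact split is not specified in the text), the function name is illustrative, and PyWavelets' 'periodization' mode is used so the resolution halves exactly from 340 to 170.

```python
import numpy as np
import pywt

def wave_layer(ct_slice: np.ndarray) -> np.ndarray:
    # Step 1: crop 172 pixels per dimension, 512x512 -> 340x340
    # (assumed 86 pixels per border).
    img = ct_slice[86:426, 86:426]
    # Step 2: standardize to zero mean and unit variance (Eqs. 4-6).
    img = (img - img.mean()) / img.std()
    # Step 3: single-level Coiflets 5 decomposition; map the approximate,
    # vertical and diagonal coefficients to the R, G and B channels
    # (the horizontal coefficient cH is discarded).
    cA, (cH, cV, cD) = pywt.dwt2(img, 'coif5', mode='periodization')
    return np.stack([cA, cV, cD], axis=-1)

out = wave_layer(np.random.rand(512, 512).astype(np.float32))
print(out.shape)  # (170, 170, 3)
```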

Fig. 6 WCNN wave layer

6.2 Feature extraction

The convolution operation was established for the convolutional layer, in which a kernel is used to map the activations from one layer into the next. The convolution operation places the kernel at each position of the image (or hidden layer) so that the kernel overlaps the entire image, and executes a dot product between the kernel parameters and its corresponding receptive field, to which the kernel is applied, in the image. The convolution operation is executed over all regions of the image to define the next layer, in which activations keep their spatial relations from the previous layer [1, 21, 34]. There may be more than one kernel in the convolutional layer. Every kernel uncovers a feature, such as an edge or a corner. During the forward pass, each kernel slides along the width and height of the image (or hidden layer), thus generating the feature map of the layer [1, 2, 21, 34].

The pooling layer is used to reduce the receptive field's spatial size, thus reducing the number of network parameters. The pooling layer selects a reduced sample of each convolutional layer feature map. Max pooling was the technique used in this work; it outputs the maximum value in the receptive field. The receptive field is 2 × 2; therefore, max pooling outputs the maximum of the four input values [51].

6.3 Flattening layer

After the convolution and pooling processes, the next step is flattening, which converts all feature maps into a one-dimensional matrix, creating an input vector for the fully connected layer [51].

6.4 Fully connected layer

In this layer, each neuron from the previous layer is connected to each neuron of the subsequent layer, and all values contribute to predicting how strongly a value correlates with a given class [51]. Fully connected layers can be stacked on top of each other to capture even more sophisticated combinations of features. The output of the last fully connected layer is fed into an activation function that generates the class scores. WCNN uses the sigmoid activation function, whose output value varies in the range [0, 1]. Inputs with an output value above 0.5 are classified as COVID-19, and those with output below 0.5 relate to other lung diseases [51].

WCNN uses Adaptive Moment Estimation (ADAM), an adaptive optimization technique that keeps an exponentially decaying average of past gradients mt (first moment) and of past squared gradients vt (second moment) [17, 51]. The mean mt and the uncentered variance vt are presented in Eq. 7 and Eq. 8, respectively:

$$ {m}_t={\beta}_1\ {m}_{t-1}+\left(1-{\beta}_1\right){g}_t $$
(7)
$$ {v}_t={\beta}_2{v}_{t-1}+\left(1-{\beta}_2\right){g}_t^2 $$
(8)

ADAM updates these exponential moving averages of the gradient and the squared gradient, where the hyperparameters β1, β2 ∈ [0, 1) control their decay rates; the bias-corrected estimates are given by Eq. 9 and Eq. 10:

$$ {\hat{m}}_t=\frac{m_t}{1-{\beta}_1^t} $$
(9)
$$ {\hat{v}}_t=\frac{v_t}{1-{\beta}_2^t} $$
(10)

The final equation for update is (Eq. 11):

$$ {w}_{t+1}={w}_t-\frac{\alpha .{\hat{m}}_t}{\sqrt{{\hat{v}}_t}+\epsilon } $$
(11)

where α is the learning rate and ϵ is a small constant added to the denominator to avoid division by zero [17, 51].
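For illustration, a minimal NumPy sketch of one ADAM update implementing Eqs. 7–11 follows; WCNN itself relies on TensorFlow's built-in optimizer, so the function below is only a didactic restatement with illustrative names.

```python
import numpy as np

def adam_step(w, g, m, v, t, alpha=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g               # Eq. 7: first-moment estimate
    v = b2 * v + (1 - b2) * g ** 2          # Eq. 8: second-moment estimate
    m_hat = m / (1 - b1 ** t)               # Eq. 9: bias correction
    v_hat = v / (1 - b2 ** t)               # Eq. 10: bias correction
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)  # Eq. 11: weight update
    return w, m, v
```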

WCNN uses the Dropout technique, the most popular technique to reduce overfitting. Dropout refers to dropping out neurons in a neural network during training. Dropping out a neuron means temporarily disconnecting it, as well as all its internal and external connections, from the network. Dropped-out neurons neither contribute to the forward pass nor do they contribute to the backward pass. By using the dropout technique, the network is forced to learn the most robust features as the network architecture changes with every input [2, 51].

The output of each convolutional layer is fed into an activation function. The activation function layer uses the feature map produced by the convolutional layer and generates the activation map as output. The activation function transforms a neuron's activation level into an output signal: it performs a mathematical operation and maps the activation level to a specific interval, for instance, 0 to 1 or −1 to 1 [51]. The functions used were the following:

  1. Sigmoid / Logistic activation function: the sigmoid function \( \sigma (x)=\frac{1}{1+{e}^{-x}} \) is an S-shaped curve [34].

  2. Rectified Linear Unit (ReLU): the activation function f(x) = max (0, x) [34], which generates a non-linear activation map.

The detailed WCNN architecture is depicted in Table 6. A rectified linear unit (ReLU) activation function is used after each convolutional layer (1st, 3rd, 5th, and 7th) and after the dense layers (9th, 10th, 11th, and 12th). To reduce the possibility of overfitting, a dropout rate of 20% was applied to the first four fully connected layers (9th, 10th, 11th, and 12th).

Table 6 WCNN architecture. The network contains the Wave Layer (W), Convolutional Layers (C), Max-Pooling Layers (M) and Fully Connected Network Layers (F)
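Since Table 6 lists the exact layers, the hedged TensorFlow/Keras sketch below only illustrates a WCNN-style assembly of the stages described above; the filter and unit counts are assumptions (the exact values are in Table 6), and the wave layer is taken as the pre-computed 170 × 170 × 3 input from Section 6.1.

```python
import tensorflow as tf

def build_wcnn(filters=(32, 64, 128, 256), dense_units=(512, 256, 128, 64)):
    inputs = tf.keras.Input(shape=(170, 170, 3))   # wave-layer output
    x = inputs
    for f in filters:                              # feature extraction stage
        x = tf.keras.layers.Conv2D(f, 3, padding='same', activation='relu')(x)
        x = tf.keras.layers.MaxPooling2D(2)(x)
    x = tf.keras.layers.Flatten()(x)               # flattening stage
    for u in dense_units:                          # fully connected stage
        x = tf.keras.layers.Dense(u, activation='relu')(x)
        x = tf.keras.layers.Dropout(0.2)(x)        # 20% dropout
    outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)
    return tf.keras.Model(inputs, outputs)

model = build_wcnn()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss='binary_crossentropy', metrics=['accuracy'])
```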

With the main components of the WCNN architecture presented, the next section describes the series of tests to which the model was submitted.

7 Ablation tests

In artificial intelligence (AI), particularly machine learning, ablation is the removal of a component from an AI system. An ablation study investigates the performance of the AI system by changing or removing certain components to understand its contribution to the totality of the system [30]. The term ablation is an analogy with biology, as it consists of altering or removing components from an organism to determine how the individual behaves [30].

The ablation tests were performed on the WCNN network, which was developed in Python with the TensorFlow library [49], running on a machine with an Intel i7-8750H 2.21 GHz processor, 16.0 GB of RAM and a GeForce GTX 1060 graphics card with Max-Q Design.

For these tests, the WCNN network was configured with the following parameters: i) weights randomly initialized; ii) initial learning rate α = 0.001, reduced by a factor of 10; iii) 200 epochs; iv) batch size of 32; and v) applied to Dataset I (BIMCV).

7.1 Optimization techniques tests

Gradient descent converges slowly, as it depends on parameters chosen at random; in the case of neural networks, this randomness falls on the initial choice of weights. Optimization methods can help an algorithm converge faster. The SGD, RMSprop and ADAM techniques were tested [17, 51], and the results are presented in Table 7.

Table 7 Test results using the different optimizing methods

Once the most suitable optimization method for our research had been determined, we conducted the pooling test, detailed in the next subsection.

7.2 Pooling test

As the pooling layer takes a reduced sample from the feature map of the convolutional layer, this test applied different pooling techniques to WCNN to identify which one would return the best accuracy on a specific set of images. The techniques considered were [1, 21, 34]: i) max pooling, which samples the maximum of each feature map; ii) min pooling, which samples the minimum of each feature map; and iii) avg pooling, which samples the average of each feature map.

The techniques were assessed considering the configuration of the WCNN standard architecture, presented in section 7, Ablation tests. The results obtained are shown in Table 8.

Table 8 Test results with various pooling techniques

The max pooling technique obtained an accuracy of 98% and avg pooling, 97%. The min pooling technique was not used, as it would result in an activation map at or close to zero: when min pooling is applied repeatedly, the activation values tend to zero, and the network cannot be trained once all useful information has been lost.
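For illustration, the sketch below contrasts the pooling variants on a toy feature map; Keras provides max and average pooling directly, while min pooling can be emulated by negating the input around a max pool (an illustrative trick, not the authors' implementation).

```python
import tensorflow as tf

x = tf.random.normal((1, 8, 8, 1))                  # toy feature map
max_pool = tf.keras.layers.MaxPooling2D(2)(x)       # maximum of each 2x2 field
avg_pool = tf.keras.layers.AveragePooling2D(2)(x)   # average of each 2x2 field
min_pool = -tf.keras.layers.MaxPooling2D(2)(-x)     # emulated min pooling
print(max_pool.shape)                               # (1, 4, 4, 1)
```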

8 Training configuration parameters

With the results of the ablation tests in hand, this section describes the training configuration parameters of our neural network: i) the weights were randomly initialized; ii) the optimizer was ADAM; iii) the standard parameters were set as β1 = 0.9 and β2 = 0.999 [17]; iv) the initial learning rate was α = 0.001; v) the reduction factor was 10; vi) the training consisted of 200 epochs; vii) the batch size was 32; viii) the pooling technique was max pooling with a 2 × 2 filter; and ix) the dropout rate was 20%. With the training parameters configured, the next section addresses the metrics used to evaluate the WCNN performance.
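A hedged sketch of this configuration in TensorFlow/Keras follows; the exact mechanism used to reduce the learning rate by a factor of 10 is not specified in the text, so the ReduceLROnPlateau callback is an assumption, and train_ds/val_ds are hypothetical dataset objects.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001,
                                     beta_1=0.9, beta_2=0.999)
# Assumed mechanism for the "reduced by a factor of 10" learning rate.
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                                 factor=0.1)

# model.compile(optimizer=optimizer, loss='binary_crossentropy',
#               metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=200, batch_size=32,
#           callbacks=[reduce_lr])
```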

9 WCNN evaluation metrics

The following set of metrics evaluates the WCNN model performance:

  1. Accuracy (ACC): rate of correct classifications over the total number of elements.

  2. Recall/Sensitivity (Sen): true positive rate.

  3. Specificity (Sp): true negative rate.

  4. F1-score: weighted average of precision and recall.

These metrics are commonly used to assess the performance of classification algorithms [16, 40, 47]. The standard, more visual way to show the numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) is the confusion matrix. From the confusion matrix, the metrics can be determined [16, 40, 47], as per Table 9.

Table 9 Model Evaluation Metrics
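For reference, the formulas of Table 9 can be restated as a small sketch; the helper name is illustrative.

```python
def metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # sensitivity (Sen)
    return {
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
        'sensitivity': recall,
        'specificity': tn / (tn + fp),           # true negative rate (Sp)
        'f1': 2 * precision * recall / (precision + recall),
    }
```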

It is also possible to generate the Receiver Operating Characteristic (ROC) curve when necessary. ROC analysis, often called the ROC accuracy ratio, is a common technique for judging the accuracy of default probability models [44].

With the WCNN datasets, architecture, workflow, tests and metrics described in the previous sections, Section 10 presents the results obtained by applying our neural network in several contexts.

10 Results of WCNN model

In this section we present the results of the WCNN classification model on both internal and external datasets. Two validation approaches were considered to evaluate the model's performance: internal and external. Internal validation assessed Dataset I (BIMCV) and Dataset II (HSL-PUCRS); external validation evaluated Dataset III (HC-UFU).

10.1 Dataset I result

The training with Dataset I consisted of two hundred epochs and generated the results shown in graphs a) Training Loss and b) Training Accuracy, which make up Fig. 7.

Fig. 7 WCNN training loss and training accuracy for Dataset I

The confusion matrix calculated by validating the internal Dataset I base, whose distribution is presented in Table 1, is shown in Fig. 8.

Fig. 8 Dataset I confusion matrix

In Fig. 8 we can see that i) true positives (TP) = 1029; ii) true negatives (TN) = 994; iii) false positives (FP) = 3; and iv) false negatives (FN) = 38. Using the parameters TP, TN, FP and FN, the metrics of accuracy, sensitivity and specificity were calculated and are presented in Table 10.
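As a worked check using the formulas of Table 9, the values below reproduce the reported metrics up to rounding:

```python
tp, tn, fp, fn = 1029, 994, 3, 38
acc = (tp + tn) / (tp + tn + fp + fn)   # 2023 / 2064 ≈ 0.9801
sen = tp / (tp + fn)                    # 1029 / 1067 ≈ 0.9644
sp = tn / (tn + fp)                     #  994 /  997 ≈ 0.9970
```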

Table 10 Metrics Results for BIMCV Dataset I

Using the values from Table 10, the ROC curve was calculated with (1 − Sp) = 0.0031 and Sen = 0.9643 as x and y, respectively. Based on the ROC analysis, the area under the curve (AUC) was calculated to be 0.98, as shown in Fig. 9.

Fig. 9 Dataset I ROC curve

10.2 Dataset II results

The Dataset II training, conducted over two hundred epochs, generated the results shown in graphs a) Training Loss and b) Training Accuracy, which make up Fig. 10.

Fig. 10 WCNN training loss and training accuracy for Dataset II

The confusion matrix calculated by validating the internal Dataset II base, whose distribution is presented in Table 2, is shown in Fig. 11.

Fig. 11 Dataset II confusion matrix

In Fig. 11 we can see that i) true positives (TP) = 733; ii) true negatives (TN) = 737; iii) false positives (FP) = 7; and iv) false negatives (FN) = 3. Using the parameters TP, TN, FP and FN, the metrics of accuracy, sensitivity and specificity were calculated and are presented in Table 11.

Table 11 Metrics Results for Dataset II

According to the values presented in Table 11, the ROC curve was calculated with (1 − Sp) = 0.0094 and Sen = 0.9959 as x and y, respectively. Based on the ROC analysis, the AUC was 0.993, as shown in Fig. 12.

Fig. 12 Dataset II ROC curve

10.3 Dataset III results

Dataset III images were submitted to the WCNN model in two scenarios: i) WCNN executed with the weights trained on Dataset I; and ii) WCNN run with the weights trained on Dataset II. The results are documented in Table 12.

Table 12 Metrics Results for HC-UFU Dataset III

10.4 Consolidated results

The results were consolidated by validation category: internal and external. In Fig. 13, we can see that validation on the internal datasets presented a higher average accuracy than that found in the external dataset validation.

Fig. 13 Average accuracy for the internal (Dataset I and Dataset II) and external (Dataset III) datasets

Since there is a discrepancy between the results of the internal and external datasets, as shown in the graph in Fig. 13 and the data in Table 13, we decided to use the average accuracy found across these datasets when comparing with state-of-the-art research values, to obtain a realistic scenario.

Table 13 Datasets Metrics Values Consolidated Result

Table 14 shows the data used to compare our model with state-of-the-art models.

Table 14 WCNN model versus state of art models

As per the data presented in Table 14, previous works have inherent limitations that do not exist in our research, such as:

  1. All previous works distribute the dataset images across the training, testing, and validation phases by image, not by patient. However, "authors have to ensure that images from the same patient were not included in the different dataset partitions" [39], e.g., training and testing.

  2. Most works use data augmentation techniques, and more than 50% of them use resizing to force the images to fit the input size defined by the networks. The big problem with such techniques is that they can cause the loss of relevant image information, which helps in the classification of medical images [39].

11 Discussion

The development of this work consisted of using the WCNN model on three bases of chest CT images: two internal image bases, Dataset I (public) and Dataset II (private), and the external database, Dataset III. As expected in these cases, the databases are heterogeneous, have variable numbers of patients and images, were obtained with diverse types of CT equipment and originally contain patients of different profiles.

So that the model's performance results could be safely compared, we created image exclusion criteria based on [39], with the help of a radiologist. The images that we used have a size of 512 × 512 pixels (the original size provided by the institutions), and all images of patients under 18 years of age were discarded, i.e., only adult patients were considered in this work. This exclusion was based on [12].

The distribution of data for training, testing and validation was done by patient and not by image, which eliminates the risk of using the same image in both training and testing, for example. The training, testing and validation phases received, respectively, 70%, 15% and 15% of the data. We emphasize that our work uses neither data augmentation nor resizing, unlike works in the literature (see Table 14). Our approach avoids the risk of information loss from artificially transforming the images.

In addition, a wave layer was created to standardize the images. The wave layer normalizes the images, computes the decomposition output by means of the wavelet transform, replaces the original RGB channels with the approximate, vertical, and diagonal coefficient channels and, finally, composes a new digital image that is passed on to the following layers. This process helps reduce the differences among images acquired by different equipment. Associated with this benefit, the wave layer processes the image through wavelet transform decomposition, at a single decomposition level, using the Coiflets 5 mother wavelet. In view of this, we consider that using WCNN in the wavelet domain can accelerate training.

This speed-up occurs, firstly, because the image generated by the wavelet transform has half the spatial resolution of the original image (Fig. 14): it goes from a spatial resolution of 340 × 340 to 170 × 170, so the spatial size of the output feature map is also halved.

Fig. 14 Region of Interest (ROI) selection

Furthermore, the use of wavelet coefficients stimulates activation sparsity in the hidden layers and in the output layer. The wavelet coefficients are sparser, and it is therefore easier for the network to learn sparse maps rather than dense maps. The histograms in Fig. 15 illustrate the sparse distribution of the vertical, diagonal and approximation coefficients. This considerable level of sparsity further reduces the training time required for the network to locate the global minimum [8].

Fig. 15 Histogram sample from the original image and the corresponding diagonal, vertical, and approximation coefficients

After that, we calculated the average of the metric values of the internal and external datasets, obtaining an average accuracy of 0.9819, sensitivity of 0.9783 and specificity of 0.9867. When comparing the result of our model with the state of the art, we found that WCNN was among the top three.

12 Conclusion

In view of the above, we can conclude that the WCNN model has the following advantages over previous related works: 1) inclusion and exclusion criteria were adopted to form the public and private databases with the help of a medical specialist, thus eliminating duplicate images, pediatric patients, patient images with different spatial resolutions, and allocations other than 16 bits; 2) our study did not use data augmentation or image resizing, thus avoiding loss of relevant information [21]; and 3) the WCNN model is based on a deep neural network that uses the wavelet transform to extract features and classify images of patients with COVID-19, who already present lung changes.

Furthermore, the new Wave input layer, which replaces the Input layer from the Keras library, selects the region of interest, normalizes the region through its mean and standard deviation, and forms a new image through the wavelet transform decomposition, using the Coiflets 5 family. Selecting the region of interest eliminates the image background; the normalization eliminates the variations caused by different equipment; and the wavelet transform decomposition results in an image with a spatial resolution of 170 × 170, which retains essential information for the classification of the disease, in addition to accelerating the network training process. The WCNN model is limited to an input image size of 512 × 512 pixels, which precludes other spatial resolutions but motivates future work.

The results obtained indicate that the investment of time and of human, financial and computational resources in the creation of the WCNN was worthwhile: the model is a promising approach to assist professionals in the prognosis of the new coronavirus through chest computed tomography images.