Article

Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks

by Javad Hassannataj Joloudari 1, Abdolreza Marefat 2, Mohammad Ali Nematollahi 3, Solomon Sunday Oyelere 4,* and Sadiq Hussain 5
1 Department of Computer Engineering, Faculty of Engineering, University of Birjand, Birjand 9717434765, Iran
2 Department of Artificial Intelligence, Technical and Engineering Faculty, South Tehran Branch, Islamic Azad University, Tehran 1477893780, Iran
3 Department of Computer Sciences, Fasa University, Fasa 7461686131, Iran
4 Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, 93187 Skellefteå, Sweden
5 Examination Branch, Dibrugarh University, Dibrugarh 786004, Assam, India
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(6), 4006; https://doi.org/10.3390/app13064006
Submission received: 15 February 2023 / Revised: 10 March 2023 / Accepted: 18 March 2023 / Published: 21 March 2023

Abstract

Imbalanced Data (ID) is a problem that prevents Machine Learning (ML) models from achieving satisfactory results. ID arises when the number of samples belonging to one class outnumbers that of the other by a wide margin, biasing the learning process of such models towards the majority class. In recent years, several solutions have been put forward to address this issue, which either synthetically generate new data for the minority class or reduce the number of majority class samples to balance the data. In this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) mixed with a variety of well-known imbalanced data solutions, namely oversampling and undersampling. Then, we propose a CNN-based model in combination with SMOTE to effectively handle imbalanced data. To evaluate our methods, we have used KEEL, breast cancer, and Z-Alizadeh Sani datasets. In order to achieve reliable results, we conducted our experiments 100 times with randomly shuffled data distributions. The classification results demonstrate that the mixed Synthetic Minority Oversampling Technique (SMOTE)-Normalization-CNN outperforms different methodologies, achieving 99.08% accuracy on the 24 imbalanced datasets. Therefore, the proposed mixed model can be applied to imbalanced binary classification problems on other real datasets.

1. Introduction

Learning a classifier from an imbalanced dataset is an important and still challenging problem in supervised learning. Class imbalance is a long-standing challenge in classification problems [1,2,3,4,5], in which one class (the majority class) contains a disproportionately larger number of samples than the other. Imbalanced datasets appear in many real-world research areas, such as life sciences [6], facial age estimation [7], anomaly detection [8], detection of counterfeit credit card transactions [9], medical imaging [10], DNA sequence identification [11], and so forth. For an imbalanced binary classification problem, samples are typically characterized by two classes, namely majority and minority.

1.1. Context of the Study

In general terms, the minority class often contains samples of higher importance and interest than the majority class. Nevertheless, the majority class usually has a significantly larger number of samples than the minority class, and in some cases the imbalance is extreme.
Different situations can occur in confronting the imbalanced datasets, and four common cases are depicted in Figure 1, where the blue-filled circles represent the samples of the majority class; in contrast, the red circles denote the minority class [1]. It has been shown that the type of data complexity is the principal determining factor of classification performance reduction [2].
Most classical classification methods, such as decision trees [2,3,4], KNN [5,6], and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) [7,8], usually train models that maximize overall accuracy, sometimes ignoring the minority class [9,10,11]. Hence, several techniques have been designed and implemented to handle imbalanced binary classification problems. Among these techniques, oversampling and undersampling are well-known [12,13,14,15]. However, the common undersampling and oversampling algorithms modify the initial class distribution of the dataset by excluding majority class samples or expanding the minority class samples.
Cost-sensitive learning algorithms are among the solutions for the above-mentioned issues of imbalanced data [16,17,18]. Such algorithms assign misclassification costs to the classes, typically lower costs to the samples of the majority class and higher costs to those of the minority class. In addition, Bagging [19] and Boosting [20] methods, which are based on ensemble learning, are among the other commonly used methods to handle imbalanced class problems [7,8,21,22].
In this paper, we use several undersampling and oversampling methods in implementing our methodology, which are briefly introduced below:
  • RUS: Among undersampling methods, random undersampling (RUS) is the simplest one, in which the samples of the majority class are randomly removed until suitable balanced data are obtained [23].
  • Tomek Links: Some of the undersampling techniques focus on overlap elimination. For example, the Tomek Links [24] method, which is a modification of the Condensed Nearest Neighbor rule, is one of these methods.
  • One-Sided Selection: As a development of the Tomek Links algorithm, one can refer to the One-Sided Selection or briefly OSS method [25] that merges Tomek Links and the Condensed Nearest Neighbor algorithms.
  • Near Miss: Another popular undersampling method, Near Miss selects which majority class samples to remove based on their distances to minority class samples rather than at random. When two samples belonging to different classes are very close to each other, it removes the sample belonging to the larger class [5].
  • ROS: Among the oversampling algorithms, random oversampling (ROS) is the simplest one that merely selects and copies the samples from the minority class randomly, leading to more balanced data [26].
  • SMOTE: The best-known oversampling method is the Synthetic Minority Oversampling Technique (SMOTE) [14,27,28], which leverages the kNN algorithm to identify the neighbors of minority class samples and generates each new sample by interpolating between a minority class sample and one of its k nearest neighbors selected at random [29].

1.2. Research Gap

It is worth noting that the methods mentioned above may cause some unexpected issues. For example, undersampling techniques may discard valuable data that could be vital for training a classifier. In contrast, oversampling algorithms may cause overfitting. Furthermore, for cost-sensitive learning techniques, it is not straightforward to determine the exact misclassification cost, and different misclassification costs might lead to different outcomes. Moreover, Bagging and Boosting algorithms may exclude some valuable data because they apply sampling in every single iteration, and they may face an overfitting problem. Consequently, the classification results obtained by these methods are not stable.

1.3. Motivation and Contribution

In order to address these problems, this paper proposes two Deep Learning (DL)-based methods mixed with different resampling methods to better tackle the issue of an imbalanced dataset. The existing DL-based methods, especially CNN architectures, have been employed in a wide variety of challenges, and they have proven to be extremely powerful in terms of learning balanced datasets. However, their efficacy has not been satisfactorily investigated when tackling imbalanced datasets [30].
CNNs are architectures that contain convolutional blocks and can provide an end-to-end classification algorithm. These blocks are stacks of different layers, namely convolutional layers, pooling layers, and activation functions. The most significant attributes of such models are their learning capacity with fewer parameters and their translational invariance with respect to the input data. In CNNs, the input data are fed to multiple convolutional blocks, which are collectively referred to as the backbone, followed by a sequence of fully connected layers that perform the classification.
The training procedure is performed using Focal Loss (FL), which optimizes the abstraction learned by the models to handle complex samples better.
In particular, the main contributions of this paper are threefold. First, 24 popular imbalanced datasets from KEEL Dataset Repository, breast cancer dataset from KDD Cup, and Z-Alizadeh Sani are chosen. The proposed pipeline is trained and validated 100 times to achieve more reliable results. Second, this paper is the first research work that extensively investigates the most efficient mix of Deep Neural Networks (DNN) with the famous resampling method, SMOTE, for imbalanced data. Lastly, the mixed SMOTE-Normalization-CNN methodology has proved to produce superior results in comparison with related research works in terms of accuracy, precision, recall, Geometric mean (G-Mean), specificity, Area Under the Receiver Operating Characteristic Curve (AUC-ROC), and Kappa.
The rest of the paper is outlined as follows: Section 2 provides a brief review of the existing works on imbalanced datasets; Section 3 presents the details of our proposed methodology; Section 4 includes the implementation setup and the evaluation process and then focuses on the results obtained by the proposed methods; Section 5 presents a discussion on comparing our method to the others, and the last section draws a reasoned conclusion and future work.

2. Related Work

To better handle imbalanced data, Li et al. [31] proposed an improved AdaBoost algorithm called Adaboost-A, which is based on the AUC evaluation metric. By considering the impact of misclassification probabilities and AUC, this algorithm improves the computational performance of the Adaboost algorithm. Furthermore, this study proposed an ensemble learning algorithm named PSOPD-AdaBoost-A, by which the multipliers of Adaboost weak classifiers were optimized.
In [32], the authors provide a detailed exploratory comparison between the problem of handling class overlapping and class imbalances using a full range of class overlaps along with a large scale of class imbalance degrees. The rest of this study contains a thorough review of the current methods and solutions for handling the imbalanced data classification problem, characterized by two categories: distribution-based and class overlap-based algorithms.
The literature [33] proposed an undersampling technique that mainly uses the well-known Naïve Bayes classifier. Based on a random primary selection, this classifier is leveraged to select the most informational samples among the existing training dataset. At first, the model is trained on a small training set, and after that, by an iterative teaching method over the current samples, the base model is taught. The practical outcomes showed that the proposed undersampling technique is comparable to other resampling techniques.
In [34], Dablain et al. proposed a novel deep oversampling method called DeepSMOTE, which contains three principal parts and mainly uses the properties of the effective SMOTE method. Despite being simple, the method is efficient and powerful. The three parts of this method are an encoder/decoder structure, SMOTE-based oversampling, and an improved loss function. The results of this study show the advantages of the proposed method, especially in comparison with GAN-based oversampling.
The research study [35] investigates the effects of resampling on the performances of multi-class Artificial Neural Networks. Different resampling techniques (both undersampling and oversampling) were examined on several cybersecurity datasets. Furthermore, to determine the results of the proposed methods, various evaluation metrics were used. Finally, four observed patterns were reported that compare the impacts of resampling on the evaluation metrics and model training duration.
In another study [36], the authors dealt with the issue of the probable bias of a learning classifier toward the majority class samples of an imbalanced dataset. This work implemented an innovative three-player framework that includes a discriminator, a generator, and a classifier, together with decision boundary regularization. The remarkable aspect of the proposed method is training a generator in association with a classifier. The reported results show better performance of the technique than the existing methods.
To improve the efficiency and functioning of undersampling methods for imbalanced data, Xie et al. [37] proposed a new undersampling technique that leverages consecutive density peaks to gradually take out samples from the majority class of imbalanced data. In order to determine the importance of the samples of the majority class, two factors were considered, which generate a sequence of samples for learning classifiers. The study compared the implemented algorithm to six well-known undersampling methods over 40 public benchmarks, and the results verify the outperformance of the proposed technique.
In order to design learning classifiers that provide stable performances on imbalanced datasets, the study [38] proposed three different methods. These methods are mainly based on genetic algorithms which automatically specify the ratios of samples for oversampling, undersampling, and hybrid sampling techniques. The implemented algorithms were examined on 14 imbalanced datasets, and the results show that they achieved the best AUC compared to random sampling methods.
To decrease the domination of the majority class samples, the study [39] implemented a novel hybrid method called CDSMOTE, which uses class decomposition and oversampling on the minority class samples. Contrary to general undersampling algorithms, this proposed method keeps the majority class samples, leading to more balanced data. The algorithm was examined on 60 imbalanced public datasets, and the results show comparable performance compared to the existing algorithms.
By introducing a new algorithm called SMOTE-LOF, Maulidevi and Surendro [40] attempted to refine SMOTE. This method distinguishes the noise that arises when dealing with imbalanced datasets by adding the Local Outlier Factor (LOF). By examining the proposed algorithm over different imbalanced datasets, the results were compared to SMOTE. Unlike small data samples, for a large-scale dataset with a small imbalance ratio, SMOTE-LOF outperforms SMOTE.
In order to annihilate the overlap between the majority class and the minority class in an imbalanced dataset and obtain a balanced and normalized class distribution, the study [29] implemented two innovative density-based methods. These methods were density-based undersampling (DB_US) and density-based hybrid sampling (DB_HS). The first method applies merely an undersampling algorithm, while the second implements both undersampling and oversampling approaches. In addition, the balanced datasets were modeled employing Random Forest (RF) and Support Vector Machine (SVM) classifiers. As a result, the two proposed methods eliminated high-density samples from the majority class and omitted the noises of both classes. The performance of these methods was examined on 16 imbalanced datasets.
In the literature [41], the authors proposed a novel classification method called the Bagging Supervised Autoencoder Classifier (BSAC) to model credit scoring problems. This algorithm essentially leverages the superlative implementation of a supervised autoencoder based on the axioms of multi-task learning. Furthermore, BSAC tackles the issue of imbalanced datasets by engaging a variation of the Bagging procedure based on undersampling techniques. The examinations of benchmark and real-world credit scoring datasets show the robustness and efficiency of BSAC.
To improve the performance of the basic antlion optimization (ALO), in [42], a novel modified antlion optimization method (MALO) was introduced. This algorithm adds an extra variable that depends on the step size of the ants as revising the antlion position. Furthermore, MALO is modified to the issues of sample reduction to achieve better performance due to various metrics. MALO was examined on several benchmarks and balanced and imbalanced datasets. The results show the outperformance of MALO against the primary ALO method and some other comparable algorithms.
In [43], Yang et al. implemented a sampling-level technique called the gravitational balanced multiple kernel learning (GBMKL) algorithm, which incorporates the gravity approach to produce the gravitation-balanced midpoint samples (GBMS) placed on the classification boundary. Moreover, to improve the generalization performance, the classification boundary was modified according to the nearest neighbors of the boundary (NNB) samples. Finally, two regularization terms that correspond to GBMS and NNB were formulated to prevent overfitting. The resulting method was examined on 54 artificial and real-life imbalanced datasets, and the outcomes show the dominance of the implemented method.
Tanimoto et al. in [44] studied the near-miss positive samples in the class of imbalanced datasets. They showed that if the true positive samples are severely limited, the accuracy of the proposed model could be increased by obtaining modified label-like side information positivity to identify near-miss samples from true negatives. Furthermore, the proposed method follows learning using privileged information that leverages side information for training the desired model devoid of predicting the side information itself. The results of the experiments show the outperformance of the method in contrast to the existing algorithms.
The research study [45] proposed a new development of SMOTE by merging it with the Kalman filter. After applying SMOTE to the given dataset, the implemented algorithm, called Kalman-SMOTE (KSMOTE), excludes the noisy samples in the resultant dataset, which contains both the initial data and the synthetically added samples. The method was examined on a broad range of datasets, and the results show that the implemented algorithm outperforms the existing methods.
Since oversampling techniques cannot usually achieve high performance in the presence of noise, the study [46] implemented an innovative oversampling algorithm called IR-SMOTE that handles this issue. By sorting the majority class samples and the k-means clustering algorithm, the noise in minority class clusters is eliminated. After that, using the kernel density estimation method, the amount of synthetic samples is compatibly designated to each cluster. Finally, regarding random-SMOTE, the desired algorithm was improved to add new samples with ensured diversity.
The literature [30] studied the performance of convolutional neural networks (CNNs) in the presence of imbalanced data for classification problems. In order to explore this probable impact, the research used MNIST, CIFAR-10, and ImageNet as benchmarks, alongside undersampling, oversampling, two-phase training, and thresholding. The results show that imbalanced data have a detrimental effect on the performance of the proposed method. Furthermore, one should implement oversampling to the level that removes the imbalance, while the extent of the imbalance determines the ideal undersampling ratio. In addition, oversampling does not lead to the overfitting of CNNs.
Fault diagnosis of complex equipment, which plays an important role in industries, is a crucial technology, and CNN is a general tool for this purpose. In this case, faults are not common, which leads to imbalanced data, and therefore, one cannot apply CNN methods directly. In order to address this problem, a hierarchical training CNN is implemented in [47]. At first, the method uses a number-resampling technique to balance data. Then, a magnet-loss pretraining algorithm is provided to handle the overlap between diverse faults. The proposed method was examined on the public dataset CWRU, achieving an accuracy of 94.28%.

3. Methodology

In this paper, we have used our methods applied to various datasets collected from benchmark repositories such as the KEEL (https://sci2s.ugr.es/keel/imbalanced.php, accessed on 17 March 2023), breast cancer (https://www.kdd.org/kdd-cup, accessed on 17 March 2023), and Z-Alizadeh Sani (https://archive.ics.uci.edu/ml/datasets/Z-Alizadeh+Sani, accessed on 17 March 2023) datasets in order to address the class imbalance problem. Figure 2 demonstrates an overview of our proposed methodology, whose details are included in this section.
Based on Figure 2, the main steps in our methodology include preprocessing, classification, and analysis of models.

3.1. Dataset Preprocessing

As stated before, the most acute problem in classifying imbalanced data is that classifiers become biased toward the majority class. There are several methods to overcome this issue which are generally called resampling techniques. By adding minority class samples or removing samples from the majority class, resampling turns the data into a more balanced one. In this regard, there are two principal methods: oversampling and undersampling.
Oversampling algorithms generate new samples, duplicated or synthetic, that belong to the minority class. In contrast, the undersampling techniques delete samples that belong to the majority class to afford balance to the dataset [23].
As a preprocessing step in our methodology, we have utilized various well-known oversampling and undersampling techniques for balancing the dataset. Normalization and dataset splitting are the next steps in data preprocessing. These are elaborated on in the following.

3.1.1. Oversampling Techniques

Random Over-Sampling (ROS)

The first and simplest method in this field is random oversampling (ROS), which aims to balance the class distribution of a dataset by increasing the number of samples in the minority class until the class distributions tend to balance. This approach is non-heuristic, meaning that it selects the samples to duplicate without any informed criterion. Random oversampling is usually applied to the level that removes the imbalance. By merely replicating samples from the minority class, ROS achieves a balance in the training data. However, duplicating similar samples may lead to the problem of overfitting, particularly for the samples belonging to the minority classes [26,48]. Figure 3 shows an illustration of the oversampling technique.
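The duplication step of ROS can be sketched in a few lines. This is a minimal illustration, assuming a 1:1 target ratio and list-based data; the function name `random_oversample` and the toy data are hypothetical and do not come from the paper's implementation.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest one (assumed 1:1 target)."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority_count = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        # Pool of original samples for this class; copies are drawn from it.
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(majority_count - count):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy data: four majority samples (label 0) and one minority sample (label 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Because every added sample is an exact copy of an existing one, the sketch also makes the overfitting risk mentioned above easy to see: the minority class grows in size but not in diversity.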

Synthetic Minority Oversampling Technique

Synthetic Minority Oversampling Technique (SMOTE) [14,49,50,51] is another resampling technique that aims to increase the number of minority class samples by creating synthetic samples in the minority class and is applied for balancing datasets with a highly unbalanced ratio. In order to avoid the issue of overfitting, SMOTE generates new synthetic samples rather than simply duplicating existing ones.
The main idea behind SMOTE is to generate new samples of data in the minority class by interpolation between samples of this class that are in close vicinity of each other [4,52]. Thus, SMOTE increases the number of minority class examples within an imbalanced dataset and consequently enables the classifier to achieve better generalizability. The formal procedure for SMOTE can be explained as follows. Firstly, N, which is the desired amount of oversampling, should be set to an integer number. It can be chosen such that the dataset becomes balanced with a ratio of 1:1 between the classes. Then, three main steps are taken iteratively: (1) a sample that belongs to the minority class is selected at random; (2) the K (default 5) nearest neighbors of this sample are identified; (3) N of these K neighbors are selected randomly, and new samples are generated by interpolating between the selected sample and the chosen neighbors [53]. An intuition of how SMOTE works is shown in Figure 4.
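The interpolation idea can be illustrated with a small sketch. It is a simplified variant, assuming Euclidean distance and one randomly chosen neighbor per synthetic sample rather than N neighbors per selected sample; the function name `smote` and its parameters are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbors (simplified sketch, Euclidean distance assumed)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbors of x, excluding x itself.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda m: math.dist(x, m))[:k]
        neighbor = rng.choice(neighbors)
        # A random point on the line segment between x and the neighbor.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for point in smote(minority, n_new=3, k=2):
    print(point)
```

Each synthetic point lies on a segment between two existing minority samples, which is why SMOTE adds variety that plain duplication cannot.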

3.1.2. Undersampling Techniques

Random Under-Sampling

The simplest technique among undersampling methods is Random Under-Sampling (RUS), which is a data-level approach. Here, the algorithm reduces the number of majority class samples to balance the data. In RUS, we randomly select samples within the majority class and delete them, which makes the distribution of a highly imbalanced dataset more balanced. RUS is a non-heuristic approach that makes no attempt to choose the removed samples intelligently. Its main drawback is the high probability of losing valuable information within a dataset [4]. More precisely, the principal issue with this method is that there is no control over what information about the majority class is being thrown away. As a result, the samples that contain information and details about the decision boundary may be removed, and that valuable information is lost [23]. An overview of RUS is shown in Figure 5.
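A minimal sketch of RUS follows, assuming the goal is to shrink every class to the size of the smallest one; the function name `random_undersample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that all classes end up with
    as many samples as the smallest class (assumed 1:1 target)."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority_count = min(counts.values())
    keep = []
    for label in counts:
        indices = [i for i, lab in enumerate(y) if lab == label]
        # Sampling without replacement; the discarded indices are simply lost.
        keep.extend(rng.sample(indices, minority_count))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_undersample(X, y)
print(X_bal, y_bal)
```

Nothing in the selection looks at where a sample sits relative to the decision boundary, which is the lack of control described above.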

One-Sided Selection (OSS)

One-Sided Selection (OSS) [25] is an undersampling technique whose main idea is to combine Tomek Links (TL) and the Condensed Nearest Neighbor Rule. In order to address the issue of imbalanced datasets, this approach leaves the minority class samples completely intact. It filters out the redundant samples in the majority class through a modification of the condensed nearest-neighbor rule [55].
In OSS, $\delta(x, y)$ denotes the distance between a pair of samples that meets the requirements for being a TL, where x is chosen from the majority class and y is selected from the minority class. Two scenarios can then occur: (1) the TL lies on the class boundary, when both x and y are in their correct class regions; (2) the TL lies inside one of the class regions, when either x or y lies in the wrong region. OSS was introduced to decrease the number of majority class samples by omitting the data points which are borderline or noisy [56]. Figure 7 illustrates a diagram of the OSS technique.
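The Tomek-link step used by OSS can be sketched as follows. This covers only the link detection and the removal of the majority member of each link, not the condensed nearest-neighbor step; it assumes Euclidean distance and the standard definition of a Tomek link as a pair of mutual nearest neighbors with different labels. All function names are hypothetical.

```python
import math


def nearest_index(X, i):
    """Index of the sample closest to X[i], excluding X[i] itself."""
    candidates = (j for j in range(len(X)) if j != i)
    return min(candidates, key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j) with i < j that are mutual nearest neighbors and
    carry different class labels."""
    links = []
    for i in range(len(X)):
        j = nearest_index(X, i)
        if i < j and y[i] != y[j] and nearest_index(X, j) == i:
            links.append((i, j))
    return links


def drop_majority_in_links(X, y, majority_label):
    """Remove the majority class member of every Tomek link; minority
    samples are left intact."""
    drop = {idx for pair in tomek_links(X, y) for idx in pair
            if y[idx] == majority_label}
    kept = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in kept], [y[i] for i in kept]


X = [[0.0], [0.1], [0.5], [0.55], [1.0]]
y = [0, 0, 0, 1, 1]
print(tomek_links(X, y))
print(drop_majority_in_links(X, y, majority_label=0))
```

In the toy data, the samples at 0.5 and 0.55 are each other's nearest neighbors and belong to different classes, so they form the only link and the majority sample at 0.5 is removed.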

NearMiss

The last undersampling technique used in this study and introduced here is NearMiss [5]. This method is based on the K-nearest neighbors algorithm and comes in three variants: NearMiss-1, NearMiss-2, and NearMiss-3. The main idea behind NearMiss is to consider the mean distances of samples from the majority class to the samples from the minority class.
Contrary to randomly removing samples from the majority class, these methods choose the samples in an informed way. NearMiss-1 retains the majority class samples whose mean distances to the three nearest samples of the minority class are minimal and discards the rest. On the other hand, NearMiss-2 retains the samples from the majority class with minimal average distances to the three farthest minority samples. Finally, NearMiss-3 selects a certain number of the closest samples of the majority class for every minority class sample [57].
As claimed in [5], the results of experiments showed that NearMiss-2 has a better performance than NearMiss-1 and NearMiss-3. Furthermore, it outperforms the RUS technique [23].
It is worth noting that NearMiss can be tuned in two respects: the variant, which can be chosen from 1, 2, and 3, and the number of neighbors considered when calculating the mean distances, which is three by default. An outline of the NearMiss algorithms is shown in Figure 8.
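A minimal sketch of a NearMiss-1 style selection is given here. It assumes Euclidean distance and follows the common formulation in which the majority class samples with the smallest mean distance to their k nearest minority class samples are the ones kept, with all other majority samples discarded. The function name `near_miss_1`, the default of keeping as many majority samples as there are minority samples, and the toy data are assumptions for illustration.

```python
import math


def near_miss_1(majority, minority, n_keep=None, k=3):
    """Keep the n_keep majority samples with the smallest mean distance to
    their k nearest minority samples (NearMiss-1 style selection)."""
    if n_keep is None:
        n_keep = len(minority)  # assumed target: a 1:1 class ratio
    k = min(k, len(minority))

    def mean_distance(x):
        nearest = sorted(math.dist(x, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)

    return sorted(majority, key=mean_distance)[:n_keep]


majority = [[0.0], [1.0], [2.0], [10.0]]
minority = [[1.5], [1.6]]
print(near_miss_1(majority, minority))
```

Sorting the k largest distances instead of the k smallest inside `mean_distance` would give a NearMiss-2 style criterion under the same assumptions.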

3.1.3. Normalization

Normalization is one of the most crucial preprocessing steps in machine learning. It can be performed by scaling or transforming the original data to balance the contributions of different features in data samples. In this study, we have normalized the input data so that the feature values lie between zero and one.
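One common way to map features into the range from zero to one is min-max scaling. The sketch below assumes that this is the kind of normalization intended, which the text does not state explicitly; the function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(X):
    """Scale every feature (column) of X to the range [0, 1] using
    x' = (x - min) / (max - min); constant columns are mapped to 0."""
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in X
    ]


X = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
print(min_max_normalize(X))
```

In practice, the minimum and maximum are often computed on the training split only and then reused for the test split, so that no information from the test data leaks into preprocessing.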

3.1.4. Split Dataset

Further, because the low number of samples in the datasets makes the classification results extremely unstable, we have trained and evaluated our models for 100 runs. In each run, we first randomly shuffle and split the data into training and testing sets, then train the model on the training set for 2000 epochs, and finally evaluate it on the testing set.
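The repeated shuffle-and-split protocol can be sketched as follows. The test fraction of 0.2, the callback `train_and_score`, and the trivial majority-class baseline used in the usage example are all assumptions for illustration; the text does not specify the split ratio, and the real pipeline trains a neural network instead of the baseline.

```python
import random
import statistics


def shuffled_split(n_samples, test_fraction, rng):
    """Return (train_indices, test_indices) from a random permutation."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def repeated_evaluation(X, y, train_and_score, n_runs=100,
                        test_fraction=0.2, seed=0):
    """Shuffle, split, train, and score n_runs times; report mean and
    standard deviation of the scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        train_idx, test_idx = shuffled_split(len(X), test_fraction, rng)
        scores.append(train_and_score(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return statistics.mean(scores), statistics.pstdev(scores)


def majority_baseline(X_train, y_train, X_test, y_test):
    """Stand-in for model training: always predict the most frequent label."""
    guess = max(set(y_train), key=y_train.count)
    return sum(1 for t in y_test if t == guess) / len(y_test)


X = [[i] for i in range(50)]
y = [0] * 40 + [1] * 10
print(repeated_evaluation(X, y, majority_baseline))
```

Reporting the spread of the scores alongside the mean shows how much a single split could mislead on a small dataset.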

3.2. Models

In this section, we introduce our two proposed Artificial Neural Network (ANN)-based models, including Deep Neural Network and Convolutional Neural Network.

3.2.1. Proposed Deep Neural Network

Deep Neural Networks (DNNs) have recently become among the favorite approaches in various fields in the domain of Artificial Intelligence [58]. These networks, which are famously called models, are characterized by several layers that contain a huge number of computational units. These units, which are interconnected, meaning that the output of one unit is the input of the other, are conceived as the imitation of the physiological brain’s structure. In mathematical terms, they are a set of parametrized linear and non-linear transformations capable of being adjusted in order to output abstractions of the input data [59]. This capability comes from the amalgamation of multiple layers full of perceptrons. Although a single perceptron cannot handle data that are not linearly separable, they are the basis of Multi-Layer Perceptrons (MLPs), whose ability to transform highly non-linear data makes them a powerful and efficient tool in machine learning [60].
Furthermore, the first proposed method in this paper is a DNN-based model. The architecture of this model is demonstrated in Figure 9.
As is observed in Figure 9, our DNN model comprises different layers, including a fully connected one followed by an activation function and a batch normalization layer. Then, a fully connected layer, an activation function, and a batch normalization followed by a dropout and a single neuron fully connected layer come after.

3.2.2. Proposed Convolutional Neural Network

Convolutional operations are the main components in Convolutional Neural Network (CNN)-based models. These operations enable CNNs to extract and learn the salient features present in the input data [61]. A CNN comprises different layers that output feature maps, obtained by sliding different kernels over the input and applying activation functions [62]. The major advantage of CNNs over DNNs is their capability to reduce the computational cost in each layer. The convolved features extracted by these models are compact representations of the input data, which can be further used in downstream tasks such as classification [63].
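The sliding-kernel operation can be made concrete with a small sketch of a single 1-dimensional convolution followed by ReLU. It assumes stride 1 and no padding, and it computes the cross-correlation form that deep learning libraries conventionally call convolution; the kernel values are hand-picked for illustration and are not learned weights from the paper.

```python
def conv1d(signal, kernel, bias=0.0):
    """Slide the kernel over the signal (stride 1, no padding) and return
    the resulting feature map."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


features = [1.0, 2.0, 3.0, 4.0]   # one tabular sample treated as a 1D signal
edge_kernel = [-1.0, 1.0]         # responds to increases between neighbors
print(relu(conv1d(features, edge_kernel)))
```

The same two kernel weights are reused at every position, which is the source of the parameter savings relative to a fully connected layer of the same output size.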
In this paper, the second method proposed for the binary classification of the input data is a CNN-based model. The architecture of this model is demonstrated in Figure 10.
As is seen in Figure 10, our proposed model consists of 4 layers (two 1-dimensional convolutional layers and two fully connected layers). After each hidden layer, a non-linear activation function (ReLU) is applied to the output. To make our training process more efficient, we experimented with several loss functions, among which Focal Loss (FL) [64] provided better supervision of the network. FL was designed to address the issue of class imbalance. It belongs to the cost-sensitive methods and was originally introduced for object detection, where imbalance between the background and salient objects is frequent. FL modifies the cross-entropy loss so that, during training, the neural network incurs a higher cost for wrongly predicting hard training samples.
More precisely, the cross-entropy loss function, which originates from information theory, is among the most common loss functions in deep learning. It is essentially identical to the negative log-likelihood loss function, and for binary classification problems, the binary cross-entropy loss function, denoted by $l_{BCE}$, is as follows:
$$l_{BCE}(y, \hat{y}) = -\left( y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right) \tag{2}$$
Here $y \in \{0, 1\}^N$ and $\hat{y} \in [0, 1]^N$, where $N$ is the number of samples, $\hat{y}$ is the predicted value, and $y$ denotes the ground truth label.
The problem with the cross-entropy loss function is that in the case of imbalanced classification, the larger class overwhelms the loss by dominating the gradient [65]. Hence, to obtain the focal loss function, one can simplify and rewrite Equation (2) in the following way, where $p = \hat{y}$ is the predicted probability of the positive class:
$$CE(p, y) = \begin{cases} -\log(p), & \text{if } y = 1 \\ -\log(1 - p), & \text{if } y = 0 \end{cases} \tag{3}$$
Let $p_t$ denote the predicted probability of the ground truth class, defined as:
$$p_t = \begin{cases} p, & \text{if } y = 1 \\ 1 - p, & \text{if } y = 0 \end{cases} \tag{4}$$
Therefore, $l_{BCE}$ can be rewritten and simplified as:
$$l_{BCE}(p, y) = CE(p_t) = -\log(p_t) \tag{5}$$
Finally, FL augments the binary cross-entropy loss function with a weighting factor $\alpha_t$ and a modulating factor $(1 - p_t)^{\gamma}$, where $\gamma > 0$ is a tunable focusing parameter, which yields the following equation:
$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t) \tag{6}$$
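The effect of the modulating factor can be checked numerically. The sketch below evaluates the cross-entropy and the focal loss for a single sample, using α = 0.25 and γ = 2 as the default values. It is a minimal illustration rather than the training code: the function names are hypothetical, and it assumes the common convention that α_t equals α for the positive class and 1 − α for the negative class.

```python
import math


def binary_cross_entropy(p, y):
    """Cross-entropy for one sample: -log(p_t), with p the predicted probability of class 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t).

    Assumption: alpha_t = alpha for y = 1 and 1 - alpha for y = 0.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)  # positive sample that is already well classified
hard = focal_loss(0.1, 1)  # positive sample that is badly misclassified

ce_ratio = binary_cross_entropy(0.1, 1) / binary_cross_entropy(0.9, 1)
fl_ratio = hard / easy
print(f"hard/easy loss ratio, cross-entropy: {ce_ratio:.1f}")
print(f"hard/easy loss ratio, focal loss:    {fl_ratio:.1f}")
```

With γ = 0 and α_t = 1 the focal loss reduces to the cross-entropy, while for γ > 0 the hard sample contributes far more to the loss relative to the easy one than it does under plain cross-entropy.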

4. Experimental Results

This section comprises the simulation setup, dataset description, dataset split, evaluation metrics, and classification results.

4.1. Simulation Setup

This section includes the implementation details of our proposed methods. The tools used in this paper are listed in Table 1.
Moreover, in our implementation, we used the Adam algorithm to optimize the models’ parameters with a learning rate of 0.001. For the loss function, FL is used with the alpha parameter set to 0.25 and the gamma parameter set to 2. Further, the hyperparameters of the DNN and CNN models and the parameters of the oversampling techniques are described in detail in Table 2, Table 3 and Table 4.

4.2. Dataset Description

To examine our proposed methods, we used the KEEL [66] dataset repository and the breast cancer and Z-Alizadeh Sani datasets. As shown in Table 5, these comprise different imbalanced datasets for classification tasks.
In Table 5, the first column indicates the number of attributes of each dataset. The second column gives the total number of samples, calculated as the sum of positive and negative samples. The third column reports the imbalance ratio between the minority (positive) and majority (negative) classes, which is obtained by dividing the number of negative samples by the number of positive samples. As described in Section 3.1.3 (split dataset), each dataset was randomly shuffled and split into training and testing sets, and the model was trained on it for 2000 epochs. The generated models were trained and evaluated for 100 runs.
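The imbalance ratio described above can be computed directly from a label vector. The following sketch assumes that labels are encoded as 1 for the positive (minority) class and 0 for the negative (majority) class; the function name and the example data are hypothetical.

```python
def imbalance_ratio(labels, positive=1):
    """Return (number of negative samples) / (number of positive samples)."""
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive (minority) samples in labels")
    return n_neg / n_pos


# Hypothetical dataset: 90 majority samples and 10 minority samples.
labels = [0] * 90 + [1] * 10
print(imbalance_ratio(labels))
```

A ratio of 1 corresponds to a perfectly balanced dataset, and larger values indicate stronger imbalance.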

4.3. Evaluation Metrics

This section elaborates on the metrics used to evaluate the performance of our proposed models. A fundamental tool for classification metrics is the confusion matrix, which shows the number of samples predicted correctly and incorrectly by a classifier. It is usually a table that compares the actual and predicted states of the samples. Figure 11 depicts such a matrix for a binary classifier. This matrix includes four items, namely True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). The formal description of these four items is as follows:
  • True Positive (TP): The number of samples that belong to the positive class and are correctly predicted as positive by the classifier
  • True Negative (TN): The number of samples that belong to the negative class and are correctly predicted as negative by the classifier
  • False Positive (FP): The number of samples that belong to the negative class but are predicted as positive by the classifier
  • False Negative (FN): The number of samples that belong to the positive class but are predicted as negative by the classifier
Figure 11. Confusion matrix.
Based on Figure 11, the minority and majority classes are treated as the positive and negative classes, respectively. The confusion matrix is then used to obtain performance metrics for the models on the imbalanced datasets. We utilized eight metrics, namely accuracy, precision, recall, F1-score, G-Mean, specificity, AUC-ROC, and Kappa, to evaluate the DNN and CNN models [29,42,67,68,69,70,71].
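The four entries of the confusion matrix can be counted in a single pass over the true and predicted labels. The sketch below assumes labels encoded as 1 for the positive (minority) class and 0 for the negative (majority) class; the function name and the example vectors are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classifier."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # positive sample predicted as positive
            else:
                fp += 1  # negative sample predicted as positive
        else:
            if truth == positive:
                fn += 1  # positive sample predicted as negative
            else:
                tn += 1  # negative sample predicted as negative
    return tp, tn, fp, fn


# Hypothetical example: 3 minority samples and 7 majority samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
```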

4.3.1. Accuracy

Accuracy is the ratio of the number of samples that are predicted correctly to the total of the input samples, as formulated in (7).
$$\text{Accuracy} = \frac{TP + TN}{FP + FN + TP + TN} \tag{7}$$

4.3.2. Specificity

The specificity is the proportion of true-negative samples to the overall number of true-negative and false-positive samples. The specificity or True Negative Rate (TNR) of a classifier is calculated using Equation (8).
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{8}$$

4.3.3. Recall

Recall shows the ratio of correctly predicted positive samples to all relevant samples, that is, the samples that are actually positive. Recall, also called sensitivity, is a significant metric for imbalanced datasets because it reflects the learning accuracy on the positive class. It is calculated by Equation (9).
$$\text{Recall} = \text{Sensitivity} = \frac{TP}{TP + FN} \tag{9}$$

4.3.4. G-Mean

The G-mean is exploited as an accuracy metric as it can gauge the accuracy rates of majority and minority classes. It is achieved by Equation (10).
$$\text{G-Mean} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \tag{10}$$
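A small numerical example shows why the G-mean is informative when the classes are imbalanced. The sketch below computes accuracy, recall (sensitivity), specificity, and G-mean from confusion-matrix counts. The counts are hypothetical and describe a classifier that labels almost every sample as negative; the function names are illustrative and non-zero denominators are assumed.

```python
import math


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp, fn):
    """Sensitivity: share of actual positives that are predicted as positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: share of actual negatives that are predicted as negative."""
    return tn / (tn + fp)


def g_mean(tp, tn, fp, fn):
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(recall(tp, fn) * specificity(tn, fp))


# Hypothetical counts: 10 positive and 100 negative samples,
# with only 1 of the 10 positives detected.
tp, tn, fp, fn = 1, 99, 1, 9
print(f"accuracy:    {accuracy(tp, tn, fp, fn):.3f}")
print(f"recall:      {recall(tp, fn):.3f}")
print(f"specificity: {specificity(tn, fp):.3f}")
print(f"G-mean:      {g_mean(tp, tn, fp, fn):.3f}")
```

In this example the accuracy stays above 0.9 because the majority class dominates, whereas the G-mean drops sharply because the minority class is rarely recognized.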

4.3.5. Precision

Precision shows how well a classifier performs in terms of predicting positive samples. As Equation (11) shows, it is calculated by dividing the number of true positives by the total number of samples predicted as positive.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{11}$$

4.3.6. F1-Score

F1-Score, which is also called F-score or F-measure, is the harmonic mean of recall and precision and indicates the balance between them for a classifier. The closer it is to one, the better both precision and recall are. F1-Score can be obtained by Equation (12).
$$\text{F1-Score} = \frac{2\,TP}{2\,TP + FP + FN} \tag{12}$$
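The count-based formula for the F1-score is equivalent to the harmonic mean of precision and recall. The sketch below verifies this on hypothetical counts; the function names are illustrative and non-zero denominators are assumed.

```python
def precision(tp, fp):
    return tp / (tp + fp)


def recall(tp, fn):
    return tp / (tp + fn)


def f1_from_counts(tp, fp, fn):
    """F1-score written directly in terms of confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)


def f1_harmonic(tp, fp, fn):
    """F1-score as the harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


# Hypothetical counts.
tp, fp, fn = 8, 2, 4
print(f"precision: {precision(tp, fp):.4f}")
print(f"recall:    {recall(tp, fn):.4f}")
print(f"F1:        {f1_from_counts(tp, fp, fn):.4f}")
```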

4.3.7. Kappa

The kappa metric compares the obtained classification accuracy with the accuracy of a random classification model. It is an important metric that indicates whether the accuracy of the classifier is reliable. Kappa values range from −1 to 1. Three reliability levels of Kappa are used to assess the accuracy as follows:
  • Kappa ≥ 0.75: Robust consistency, highly reliable accuracy.
  • 0.4 ≤ Kappa < 0.75: The accuracy is generally reliable.
  • Kappa < 0.4: The accuracy is unreliable.
The kappa formula is specified in Equation (13), where random denotes the accuracy of the random classification model.
$$\text{Kappa} = \frac{\text{Accuracy} - \text{random}}{1 - \text{random}} \tag{13}$$
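The text does not spell out how the random accuracy is computed. The sketch below assumes the standard Cohen's kappa definition, in which the random (chance) accuracy is derived from the row and column totals of the confusion matrix, and maps the result to the three reliability levels listed above. The function names and the example counts are hypothetical.

```python
def cohen_kappa(tp, tn, fp, fn):
    """Kappa = (accuracy - random) / (1 - random).

    Assumption: `random` is the chance agreement of Cohen's kappa,
    computed from the marginal totals of the confusion matrix.
    """
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    random_acc = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - random_acc) / (1.0 - random_acc)


def reliability_level(kappa):
    """Map a kappa value to the three reliability levels."""
    if kappa >= 0.75:
        return "highly reliable"
    if kappa >= 0.4:
        return "generally reliable"
    return "unreliable"


# Hypothetical counts: a classifier that detects only 1 of 10 positives.
kappa = cohen_kappa(tp=1, tn=99, fp=1, fn=9)
print(f"kappa = {kappa:.3f} ({reliability_level(kappa)})")
```

Although this hypothetical classifier reaches an accuracy above 0.9, its kappa falls in the unreliable range because most of that accuracy would also be obtained by chance.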

4.3.8. AUC-ROC

The AUC-ROC is a crucial measurement for evaluating the performance of the generated classification models. A ROC plot represents the trade-off between the true positive rate (recall) and the false positive rate (1 − specificity). Furthermore, the AUC quantifies the separability power of the classifier and ranges from 0 to 1. The higher the AUC, the better the model is at distinguishing between the minority and majority classes.
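The AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes the AUC through this pairwise comparison, which is equivalent to the area under the ROC curve. It is an illustration on hypothetical scores and is quadratic in the number of samples, so it is not intended for large datasets.

```python
def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores for 2 minority and 4 majority samples.
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]
print(roc_auc(y_true, scores))
```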

4.4. Classification Results

In this section, we present our experimental results based on the evaluation metrics, namely accuracy, precision, recall, F1-score, G-Mean, specificity, AUC, and Kappa. The results have been obtained by averaging each metric on the imbalanced datasets, including the KEEL repository, breast cancer, and Z-Alizadeh Sani datasets, for classification tasks.
The results are given in Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11 for six models, namely SMOTE + Normalization (NORM.) + CNN/DNN, TL + NORM. + CNN/DNN, OSS + NORM. + CNN/DNN, NearMiss + NORM. + CNN/DNN, ROS + NORM. + CNN/DNN, and RUS + NORM. + CNN/DNN, respectively. We marked the best results in boldface.
According to the results obtained, the proposed SMOTE + NORM. + CNN model outperforms the other models on average in terms of the eight metrics on the datasets. This result demonstrates the impact of using SMOTE with our CNN model, which enhances the overall performance.
Moreover, the ROC plots based on the best AUC scores obtained by the models on the datasets are shown in Figure 12a–z. According to Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11 and the ROC plots, the mixed SMOTE-NORM.-CNN model has the best AUC value.
Table 6. SMOTE + NORM. + CNN/DNN.
| Dataset | Acc (CNN / DNN) | Pre (CNN / DNN) | Rec (CNN / DNN) | F1 (CNN / DNN) | G-Mean (CNN / DNN) | Spe (CNN / DNN) | AUC (CNN / DNN) | Kap (CNN / DNN) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ecoli1 | 99.11 / 99.38 | 99.13 / 99.40 | 99.11 / 99.38 | 99.11 / 99.38 | 99.11 / 99.38 | 98.83 / 99.17 | 99.00 / 99.25 | 98.21 / 98.77 |
| ecoli2 | 99.60 / 99.76 | 99.60 / 99.77 | 99.60 / 99.76 | 99.60 / 99.76 | 99.60 / 99.76 | 99.44 / 99.54 | 99.03 / 99.43 | 99.19 / 99.53 |
| ecoli3 | 99.53 / 99.64 | 99.54 / 99.65 | 99.53 / 99.64 | 99.53 / 99.64 | 99.53 / 99.64 | 99.21 / 99.39 | 99.20 / 99.31 | 99.06 / 99.29 |
| ecoli-0_vs_1 | 99.91 / 99.83 | 99.92 / 99.84 | 99.91 / 99.83 | 99.91 / 99.83 | 99.91 / 99.83 | 99.93 / 99.79 | 99.27 / 99.10 | 99.83 / 99.66 |
| glass0 | 99.26 / 99.19 | 99.27 / 99.26 | 99.26 / 99.19 | 99.26 / 99.19 | 99.26 / 99.19 | 99.27 / 99.00 | 99.31 / 99.53 | 99.27 / 98.38 |
| glass1 | 99.31 / 99.23 | 99.32 / 99.25 | 99.31 / 99.23 | 99.31 / 99.23 | 99.31 / 99.23 | 99.32 / 99.64 | 99.25 / 99.34 | 99.30 / 98.46 |
| glass6 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 |
| haberman | 95.60 / 95.44 | 95.61 / 96.30 | 95.60 / 95.77 | 95.60 / 96.03 | 95.60 / 96.10 | 95.61 / 96.45 | 97.03 / 98.05 | 95.00 / 96.01 |
| iris0 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 99.99 / 99.99 | 100.00 / 100.00 |
| new-thyroid1 | 99.95 / 99.96 | 99.96 / 99.96 | 99.95 / 99.96 | 99.60 / 99.96 | 99.96 / 99.96 | 99.95 / 99.92 | 99.90 / 99.95 | 99.96 / 99.92 |
| new-thyroid2 | 99.95 / 99.97 | 99.96 / 99.97 | 99.95 / 99.97 | 99.95 / 99.97 | 99.95 / 99.97 | 99.96 / 99.94 | 99.38 / 99.61 | 99.94 / 99.94 |
| page-blocks0 | 99.42 / 99.39 | 99.42 / 99.39 | 99.42 / 99.39 | 99.42 / 99.39 | 99.42 / 99.39 | 99.38 / 99.25 | 99.27 / 99.28 | 99.23 / 98.97 |
| pima | 99.35 / 99.15 | 99.36 / 99.15 | 99.35 / 99.14 | 99.35 / 99.15 | 99.35 / 99.21 | 99.30 / 99.37 | 99.81 / 99.25 | 99.46 / 99.03 |
| segment0 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.67 / 99.31 | 99.99 / 99.99 |
| vehicle0 | 99.95 / 99.97 | 99.95 / 99.97 | 99.95 / 99.97 | 99.95 / 99.97 | 99.95 / 99.97 | 99.89 / 99.95 | 99.83 / 99.60 | 99.88 / 99.95 |
| vehicle1 | 99.81 / 99.76 | 99.81 / 99.76 | 99.81 / 99.76 | 99.81 / 99.76 | 99.81 / 99.76 | 99.87 / 99.67 | 99.90 / 99.15 | 99.90 / 99.52 |
| vehicle2 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.84 / 99.57 | 99.97 / 99.98 |
| vehicle3 | 99.85 / 99.74 | 99.86 / 99.75 | 99.85 / 99.74 | 99.85 / 99.74 | 99.85 / 99.74 | 99.82 / 99.62 | 99.80 / 99.20 | 99.91 / 99.48 |
| wisconsin | 99.80 / 99.84 | 99.81 / 99.85 | 99.80 / 99.84 | 99.80 / 99.84 | 99.80 / 99.84 | 99.80 / 99.75 | 99.46 / 99.51 | 99.64 / 99.60 |
| yeast1 | 98.98 / 98.71 | 98.98 / 98.72 | 98.98 / 98.71 | 98.98 / 98.71 | 98.98 / 98.71 | 98.98 / 99.81 | 98.80 / 98.62 | 98.67 / 98.35 |
| yeast3 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 99.90 / 99.94 |
| yeast-2_vs_4 | 99.95 / 99.91 | 99.96 / 99.92 | 99.95 / 99.91 | 99.95 / 99.91 | 99.95 / 99.91 | 99.94 / 99.92 | 99.80 / 99.45 | 99.73 / 99.67 |
| penbased | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.88 / 99.81 | 99.98 / 99.96 | 99.99 / 99.99 |
| nursery | 88.71 / 87.42 | 88.72 / 87.43 | 88.71 / 87.42 | 88.71 / 87.42 | 88.71 / 87.42 | 88.62 / 87.30 | 90.00 / 89.92 | 88.40 / 87.30 |
| breast cancer | 99.67 / 98.84 | 99.68 / 98.85 | 99.67 / 98.84 | 99.67 / 98.84 | 99.67 / 98.84 | 99.46 / 98.67 | 99.42 / 99.14 | 99.48 / 98.62 |
| Z-Alizadeh Sani | 98.57 / 97.91 | 98.58 / 97.92 | 98.57 / 97.91 | 98.57 / 97.91 | 98.57 / 97.85 | 98.42 / 97.84 | 99.14 / 99.02 | 98.21 / 98.04 |
| Average | 99.08 / 98.96 | 99.09 / 99.00 | 99.08 / 98.97 | 99.09 / 98.98 | 99.08 / 98.98 | 99.03 / 98.99 | 99.08 / 99.02 | 98.92 / 98.78 |

NORM.: Normalization; Acc: Accuracy; Pre: Precision; Rec: Recall; F1: F1-Score; Spe: Specificity; Kap: Kappa. Bold font indicates the best result obtained for each metric of the models.
Table 7. TL + NORM. + CNN/DNN.
| Dataset | Acc (CNN / DNN) | Pre (CNN / DNN) | Rec (CNN / DNN) | F1 (CNN / DNN) | G-Mean (CNN / DNN) | Spe (CNN / DNN) | AUC (CNN / DNN) | Kap (CNN / DNN) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ecoli1 | 99.05 / 98.98 | 99.06 / 98.99 | 99.05 / 98.98 | 99.05 / 98.98 | 99.05 / 98.99 | 99.00 / 98.98 | 99.02 / 98.80 | 99.10 / 98.96 |
| ecoli2 | 99.51 / 99.76 | 99.52 / 99.77 | 99.51 / 99.76 | 99.51 / 99.76 | 99.51 / 99.76 | 99.45 / 99.54 | 99.03 / 99.70 | 99.19 / 99.53 |
| ecoli3 | 99.23 / 99.74 | 99.24 / 99.75 | 99.23 / 99.74 | 99.23 / 99.74 | 99.23 / 99.74 | 99.20 / 99.41 | 99.05 / 99.52 | 99.00 / 99.30 |
| ecoli-0_vs_1 | 99.81 / 99.75 | 99.82 / 99.76 | 99.81 / 99.75 | 99.81 / 99.75 | 99.81 / 99.75 | 99.82 / 99.63 | 99.60 / 99.17 | 99.80 / 99.61 |
| glass0 | 99.05 / 98.98 | 99.06 / 98.99 | 99.05 / 98.98 | 99.05 / 98.98 | 99.05 / 98.99 | 99.02 / 99.00 | 99.00 / 98.90 | 99.02 / 98.38 |
| glass1 | 98.83 / 98.32 | 98.84 / 98.33 | 98.83 / 98.32 | 98.83 / 98.32 | 98.83 / 98.32 | 98.83 / 98.42 | 99.15 / 99.32 | 98.80 / 98.32 |
| glass6 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 |
| haberman | 94.20 / 95.24 | 94.21 / 95.25 | 94.20 / 95.25 | 94.20 / 95.25 | 94.20 / 95.24 | 94.25 / 95.25 | 97.23 / 97.23 | 95.00 / 96.35 |
| iris0 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 |
| new-thyroid1 | 99.75 / 99.78 | 99.76 / 99.79 | 99.75 / 99.78 | 99.75 / 99.78 | 99.75 / 99.78 | 99.76 / 99.74 | 99.80 / 99.60 | 99.54 / 99.61 |
| new-thyroid2 | 99.89 / 99.85 | 99.90 / 99.86 | 99.89 / 99.85 | 99.89 / 99.85 | 99.89 / 99.85 | 99.88 / 99.82 | 99.51 / 99.74 | 99.80 / 99.74 |
| page-blocks0 | 98.62 / 99.39 | 98.63 / 99.39 | 98.62 / 99.39 | 98.62 / 99.39 | 98.62 / 99.39 | 98.60 / 99.25 | 99.37 / 99.28 | 99.21 / 98.97 |
| pima | 99.27 / 99.10 | 99.28 / 99.11 | 99.27 / 99.10 | 99.27 / 99.10 | 99.27 / 99.10 | 99.24 / 99.10 | 99.12 / 99.10 | 99.05 / 99.04 |
| segment0 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.67 / 99.31 | 99.99 / 99.99 |
| vehicle0 | 99.90 / 99.97 | 99.91 / 99.97 | 99.90 / 99.97 | 99.90 / 99.97 | 99.90 / 99.97 | 99.90 / 99.95 | 99.85 / 99.60 | 99.87 / 99.95 |
| vehicle1 | 99.70 / 99.75 | 99.71 / 99.76 | 99.70 / 99.75 | 99.70 / 99.75 | 99.70 / 99.75 | 99.77 / 99.65 | 99.78 / 99.66 | 99.68 / 99.50 |
| vehicle2 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.83 / 99.63 | 99.97 / 99.98 |
| vehicle3 | 99.80 / 99.76 | 99.81 / 99.75 | 99.80 / 99.74 | 99.81 / 99.74 | 99.80 / 99.74 | 99.80 / 99.62 | 99.81 / 99.20 | 99.91 / 99.56 |
| wisconsin | 99.50 / 99.84 | 99.51 / 99.85 | 99.50 / 99.84 | 99.50 / 99.84 | 99.50 / 99.84 | 99.50 / 99.75 | 99.51 / 99.51 | 99.40 / 99.60 |
| yeast1 | 98.98 / 98.71 | 98.98 / 98.72 | 98.98 / 98.71 | 98.98 / 98.71 | 98.98 / 98.71 | 98.98 / 99.81 | 98.80 / 98.62 | 98.67 / 98.35 |
| yeast3 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 99.92 / 99.93 |
| yeast-2_vs_4 | 99.85 / 99.89 | 99.86 / 99.90 | 99.85 / 99.89 | 99.85 / 99.89 | 99.85 / 99.89 | 99.84 / 99.88 | 99.70 / 99.62 | 99.64 / 99.60 |
| penbased | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.81 / 99.77 | 99.90 / 99.85 | 99.99 / 99.99 |
| nursery | 89.84 / 88.21 | 89.85 / 88.22 | 89.84 / 88.21 | 89.84 / 88.21 | 89.84 / 88.21 | 89.70 / 88.11 | 91.05 / 90.95 | 89.20 / 88.00 |
| breast cancer | 99.05 / 98.70 | 99.06 / 98.71 | 99.05 / 98.70 | 99.05 / 98.70 | 99.05 / 98.70 | 98.90 / 98.55 | 99.26 / 99.07 | 98.89 / 98.57 |
| Z-Alizadeh Sani | 98.34 / 97.79 | 98.35 / 97.80 | 98.34 / 97.79 | 98.34 / 97.79 | 98.34 / 97.79 | 98.29 / 97.62 | 98.80 / 98.34 | 98.11 / 97.88 |
| Average | 98.92 / 98.95 | 98.93 / 98.90 | 98.92 / 98.90 | 98.92 / 98.90 | 98.92 / 98.90 | 98.90 / 98.87 | 99.07 / 98.98 | 98.87 / 98.79 |

NORM.: Normalization; Acc: Accuracy; Pre: Precision; Rec: Recall; F1: F1-Score; Spe: Specificity; Kap: Kappa. Bold font indicates the best result obtained for each metric of the models.
Table 8. OSS + NORM. + CNN/DNN.
| Dataset | Acc (CNN / DNN) | Pre (CNN / DNN) | Rec (CNN / DNN) | F1 (CNN / DNN) | G-Mean (CNN / DNN) | Spe (CNN / DNN) | AUC (CNN / DNN) | Kap (CNN / DNN) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ecoli1 | 98.70 / 98.55 | 98.71 / 98.56 | 98.70 / 98.55 | 98.70 / 98.55 | 98.70 / 98.55 | 98.65 / 98.50 | 99.12 / 99.24 | 98.75 / 98.45 |
| ecoli2 | 98.41 / 98.62 | 98.42 / 98.63 | 98.41 / 98.62 | 98.41 / 98.62 | 98.41 / 98.62 | 98.36 / 98.60 | 99.13 / 99.50 | 98.25 / 98.54 |
| ecoli3 | 99.20 / 99.63 | 99.21 / 99.64 | 99.20 / 99.63 | 99.20 / 99.63 | 99.20 / 99.63 | 99.15 / 99.52 | 99.26 / 99.60 | 99.03 / 99.31 |
| ecoli-0_vs_1 | 99.59 / 99.80 | 99.60 / 99.81 | 99.59 / 99.80 | 99.59 / 99.80 | 99.59 / 99.80 | 99.48 / 99.74 | 99.40 / 99.41 | 99.58 / 99.21 |
| glass0 | 99.15 / 99.10 | 99.16 / 99.11 | 99.15 / 99.10 | 99.15 / 99.10 | 99.15 / 99.10 | 99.10 / 99.10 | 99.01 / 99.18 | 98.99 / 99.00 |
| glass1 | 98.78 / 98.40 | 98.79 / 98.41 | 98.78 / 98.40 | 98.78 / 98.40 | 98.78 / 98.40 | 98.62 / 98.38 | 99.05 / 98.99 | 98.98 / 98.84 |
| glass6 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 |
| haberman | 94.50 / 95.14 | 94.51 / 95.15 | 94.50 / 95.15 | 94.50 / 95.15 | 94.50 / 95.14 | 94.46 / 95.30 | 98.00 / 97.20 | 94.89 / 96.30 |
| iris0 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 |
| new-thyroid1 | 99.70 / 99.63 | 99.71 / 99.64 | 99.70 / 99.63 | 99.70 / 99.63 | 99.70 / 99.63 | 99.62 / 99.60 | 99.77 / 99.05 | 99.80 / 99.61 |
| new-thyroid2 | 99.80 / 99.80 | 99.81 / 99.81 | 99.80 / 99.80 | 99.80 / 99.80 | 99.80 / 99.80 | 99.74 / 99.74 | 99.20 / 99.55 | 99.22 / 99.74 |
| page-blocks0 | 98.43 / 99.21 | 98.44 / 99.22 | 98.43 / 99.21 | 98.43 / 99.21 | 98.43 / 99.21 | 98.21 / 99.15 | 98.50 / 99.40 | 99.20 / 98.84 |
| pima | 99.50 / 99.11 | 99.51 / 99.12 | 99.50 / 99.11 | 99.50 / 99.11 | 99.50 / 99.11 | 99.48 / 99.09 | 99.32 / 99.05 | 99.14 / 99.09 |
| segment0 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.38 / 99.20 | 99.99 / 99.99 |
| vehicle0 | 99.70 / 99.80 | 99.71 / 99.81 | 99.70 / 99.80 | 99.70 / 99.80 | 99.70 / 99.80 | 99.70 / 99.74 | 99.63 / 99.62 | 99.87 / 99.70 |
| vehicle1 | 99.30 / 99.48 | 99.31 / 99.49 | 99.30 / 99.48 | 99.30 / 99.48 | 99.30 / 99.48 | 99.26 / 99.40 | 99.12 / 99.20 | 99.40 / 99.20 |
| vehicle2 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.60 / 99.40 | 99.95 / 99.95 |
| vehicle3 | 99.60 / 99.70 | 99.61 / 99.71 | 99.60 / 99.70 | 99.60 / 99.70 | 99.60 / 99.70 | 99.58 / 99.56 | 99.74 / 99.22 | 99.54 / 99.47 |
| wisconsin | 99.50 / 99.75 | 99.51 / 99.76 | 99.50 / 99.75 | 99.50 / 99.75 | 99.50 / 99.75 | 99.50 / 99.70 | 99.51 / 99.60 | 99.40 / 99.51 |
| yeast1 | 98.85 / 98.70 | 98.86 / 98.71 | 98.85 / 98.71 | 98.85 / 98.71 | 98.85 / 98.71 | 98.81 / 99.69 | 98.40 / 98.31 | 98.52 / 98.22 |
| yeast3 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 99.81 / 99.70 |
| yeast-2_vs_4 | 99.85 / 99.89 | 99.86 / 99.90 | 99.85 / 99.89 | 99.85 / 99.89 | 99.85 / 99.89 | 99.84 / 99.88 | 99.70 / 99.62 | 99.64 / 99.60 |
| penbased | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.76 / 99.69 | 99.84 / 99.73 | 99.99 / 99.99 |
| nursery | 89.27 / 88.88 | 89.28 / 88.89 | 89.27 / 88.88 | 89.27 / 88.88 | 89.27 / 88.88 | 89.15 / 88.65 | 90.31 / 91.23 | 89.06 / 88.70 |
| breast cancer | 99.15 / 98.81 | 99.16 / 98.82 | 99.15 / 98.81 | 99.15 / 98.81 | 99.15 / 98.81 | 99.01 / 98.69 | 99.19 / 98.99 | 99.03 / 98.73 |
| Z-Alizadeh Sani | 97.56 / 97.43 | 97.57 / 97.44 | 97.56 / 97.43 | 97.56 / 97.43 | 97.56 / 97.43 | 97.27 / 97.21 | 98.24 / 98.20 | 97.20 / 97.10 |
| Average | 98.78 / 98.82 | 98.79 / 98.83 | 98.78 / 98.82 | 98.78 / 98.82 | 98.78 / 98.82 | 98.72 / 98.80 | 98.93 / 98.94 | 98.73 / 98.72 |

NORM.: Normalization; Acc: Accuracy; Pre: Precision; Rec: Recall; F1: F1-Score; Spe: Specificity; Kap: Kappa. Bold font indicates the best result obtained for each metric of the models.
Table 9. NearMiss + NORM. + CNN/DNN.
| Dataset | Acc (CNN / DNN) | Pre (CNN / DNN) | Rec (CNN / DNN) | F1 (CNN / DNN) | G-Mean (CNN / DNN) | Spe (CNN / DNN) | AUC (CNN / DNN) | Kap (CNN / DNN) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ecoli1 | 99.16 / 99.20 | 99.17 / 99.21 | 99.16 / 99.20 | 99.16 / 99.20 | 99.16 / 99.20 | 99.10 / 99.15 | 99.41 / 99.31 | 99.11 / 99.11 |
| ecoli2 | 99.60 / 99.88 | 99.61 / 99.89 | 99.60 / 99.88 | 99.60 / 99.88 | 99.60 / 99.88 | 99.52 / 99.80 | 99.23 / 99.55 | 99.33 / 99.88 |
| ecoli3 | 99.33 / 99.74 | 99.34 / 99.75 | 99.33 / 99.74 | 99.33 / 99.74 | 99.33 / 99.74 | 99.29 / 99.41 | 99.15 / 99.52 | 99.02 / 99.30 |
| ecoli-0_vs_1 | 99.60 / 99.78 | 99.61 / 99.79 | 99.60 / 99.78 | 99.60 / 99.78 | 99.60 / 99.78 | 99.54 / 99.74 | 99.35 / 99.37 | 99.57 / 99.54 |
| glass0 | 99.20 / 99.25 | 99.21 / 99.26 | 99.20 / 99.25 | 99.20 / 99.25 | 99.20 / 99.25 | 99.13 / 99.18 | 99.11 / 99.18 | 99.18 / 99.06 |
| glass1 | 99.15 / 99.20 | 99.16 / 99.21 | 99.15 / 99.20 | 99.15 / 99.20 | 99.15 / 99.20 | 99.10 / 99.17 | 99.18 / 99.25 | 99.06 / 99.14 |
| glass6 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 |
| haberman | 94.80 / 95.36 | 94.81 / 95.37 | 94.80 / 95.37 | 94.80 / 95.37 | 94.80 / 95.37 | 94.62 / 95.32 | 98.00 / 97.25 | 95.00 / 95.35 |
| iris0 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 |
| new-thyroid1 | 99.63 / 99.65 | 99.64 / 99.66 | 99.63 / 99.65 | 99.63 / 99.65 | 99.63 / 99.64 | 99.59 / 99.61 | 99.74 / 99.80 | 99.50 / 99.55 |
| new-thyroid2 | 99.90 / 99.87 | 99.91 / 99.88 | 99.90 / 99.87 | 99.90 / 99.87 | 99.90 / 99.87 | 99.84 / 99.87 | 99.50 / 99.70 | 99.87 / 99.78 |
| page-blocks0 | 99.20 / 99.43 | 99.21 / 99.44 | 99.20 / 99.43 | 99.20 / 99.43 | 99.20 / 99.43 | 99.15 / 99.39 | 99.35 / 99.24 | 99.11 / 99.24 |
| pima | 99.30 / 99.15 | 99.31 / 99.16 | 99.30 / 99.15 | 99.30 / 99.15 | 99.30 / 99.15 | 99.25 / 99.11 | 99.25 / 99.11 | 99.15 / 99.08 |
| segment0 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.70 / 99.51 | 99.99 / 99.99 |
| vehicle0 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.78 / 99.62 | 99.99 / 99.99 |
| vehicle1 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.84 / 99.80 | 99.99 / 99.99 |
| vehicle2 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.77 / 99.64 | 99.99 / 99.99 |
| vehicle3 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.40 / 99.50 | 99.99 / 99.99 |
| wisconsin | 99.42 / 99.50 | 99.43 / 99.51 | 99.42 / 99.50 | 99.42 / 99.50 | 99.42 / 99.50 | 99.35 / 99.42 | 99.20 / 99.55 | 99.32 / 99.41 |
| yeast1 | 99.30 / 99.25 | 99.31 / 99.26 | 99.30 / 99.25 | 99.30 / 99.25 | 99.30 / 99.25 | 99.24 / 99.15 | 99.58 / 98.37 | 99.24 / 98.35 |
| yeast3 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 99.81 / 99.70 |
| yeast-2_vs_4 | 99.90 / 99.91 | 99.91 / 99.92 | 99.90 / 99.91 | 99.90 / 99.91 | 99.90 / 99.91 | 99.87 / 99.92 | 99.74 / 99.85 | 99.62 / 99.67 |
| penbased | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.81 / 99.76 | 99.92 / 99.88 | 99.99 / 99.99 |
| nursery | 88.37 / 88.18 | 88.38 / 88.19 | 88.37 / 88.18 | 88.37 / 88.18 | 88.37 / 88.18 | 88.21 / 88.02 | 90.37 / 89.86 | 88.02 / 88.00 |
| breast cancer | 99.23 / 98.92 | 99.24 / 98.93 | 99.23 / 98.92 | 99.23 / 98.92 | 99.23 / 98.92 | 99.11 / 98.75 | 99.05 / 99.02 | 99.09 / 99.02 |
| Z-Alizadeh Sani | 98.50 / 98.24 | 98.51 / 98.25 | 98.50 / 98.24 | 98.50 / 98.24 | 98.50 / 98.10 | 98.39 / 98.05 | 98.27 / 98.10 | 98.21 / 98.13 |
| Average | 98.98 / 99.01 | 98.98 / 99.02 | 98.98 / 99.01 | 98.98 / 99.01 | 98.98 / 99.01 | 98.92 / 98.95 | 99.07 / 98.99 | 98.89 / 98.89 |

NORM.: Normalization; Acc: Accuracy; Pre: Precision; Rec: Recall; F1: F1-Score; Spe: Specificity; Kap: Kappa. Bold font indicates the best result obtained for each metric of the models.
Table 10. ROS + NORM. + CNN/DNN.
| Dataset | Acc (CNN / DNN) | Pre (CNN / DNN) | Rec (CNN / DNN) | F1 (CNN / DNN) | G-Mean (CNN / DNN) | Spe (CNN / DNN) | AUC (CNN / DNN) | Kap (CNN / DNN) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ecoli1 | 98.52 / 98.49 | 98.53 / 98.50 | 98.52 / 98.49 | 98.52 / 98.49 | 98.52 / 98.49 | 98.34 / 98.28 | 99.08 / 98.57 | 98.87 / 98.64 |
| ecoli2 | 98.81 / 98.70 | 98.82 / 98.71 | 98.81 / 98.70 | 98.81 / 98.70 | 98.81 / 98.70 | 98.72 / 98.61 | 99.38 / 99.33 | 98.76 / 98.50 |
| ecoli3 | 98.90 / 98.64 | 98.91 / 98.65 | 98.90 / 98.64 | 98.90 / 98.64 | 98.90 / 98.64 | 98.78 / 98.56 | 99.16 / 99.22 | 99.12 / 99.21 |
| ecoli-0_vs_1 | 98.81 / 98.64 | 98.82 / 98.65 | 98.81 / 98.64 | 98.81 / 98.64 | 98.81 / 98.64 | 98.70 / 98.53 | 99.06 / 99.27 | 98.54 / 98.40 |
| glass0 | 98.30 / 98.17 | 98.31 / 98.18 | 98.30 / 98.17 | 98.30 / 98.17 | 98.30 / 98.17 | 98.19 / 98.08 | 99.25 / 99.15 | 98.11 / 99.16 |
| glass1 | 98.27 / 98.16 | 98.28 / 98.17 | 98.27 / 98.16 | 98.27 / 98.16 | 98.27 / 98.16 | 98.20 / 98.10 | 99.08 / 99.19 | 98.15 / 98.02 |
| glass6 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 |
| haberman | 95.45 / 95.50 | 95.46 / 95.51 | 95.45 / 95.50 | 95.45 / 95.50 | 95.45 / 95.50 | 95.30 / 95.40 | 98.00 / 98.26 | 95.24 / 95.40 |
| iris0 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 |
| new-thyroid1 | 98.59 / 98.63 | 98.60 / 98.64 | 98.59 / 98.63 | 98.59 / 98.63 | 98.59 / 98.63 | 98.46 / 98.60 | 99.01 / 99.14 | 98.68 / 99.85 |
| new-thyroid2 | 98.87 / 98.79 | 98.88 / 98.80 | 98.87 / 98.79 | 98.87 / 98.79 | 98.87 / 98.79 | 98.54 / 98.50 | 99.08 / 99.01 | 98.80 / 98.72 |
| page-blocks0 | 98.40 / 98.16 | 98.41 / 98.17 | 98.40 / 98.16 | 98.40 / 98.16 | 98.40 / 98.16 | 98.26 / 98.09 | 99.12 / 99.06 | 98.60 / 98.47 |
| pima | 98.32 / 99.02 | 98.33 / 99.03 | 98.32 / 99.02 | 98.32 / 99.02 | 98.32 / 99.02 | 98.24 / 98.95 | 99.20 / 99.03 | 99.23 / 99.01 |
| segment0 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.35 / 99.29 | 99.99 / 99.99 |
| vehicle0 | 98.98 / 98.90 | 98.99 / 98.91 | 98.98 / 98.90 | 98.98 / 98.90 | 98.98 / 98.90 | 98.86 / 98.80 | 98.60 / 98.57 | 98.86 / 98.80 |
| vehicle1 | 98.58 / 98.44 | 98.59 / 98.45 | 98.58 / 98.44 | 98.58 / 98.44 | 98.58 / 98.44 | 98.50 / 98.38 | 99.16 / 99.10 | 98.42 / 98.37 |
| vehicle2 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.55 / 99.34 | 99.95 / 99.95 |
| vehicle3 | 99.31 / 99.15 | 99.32 / 99.16 | 99.31 / 99.15 | 99.31 / 99.15 | 99.31 / 99.15 | 99.23 / 99.10 | 99.21 / 99.11 | 99.20 / 99.05 |
| wisconsin | 99.72 / 99.70 | 99.73 / 99.71 | 99.72 / 99.70 | 99.72 / 99.70 | 99.72 / 99.70 | 99.60 / 99.55 | 99.23 / 99.19 | 99.50 / 99.45 |
| yeast1 | 98.94 / 98.60 | 98.95 / 98.61 | 98.94 / 98.60 | 98.94 / 98.60 | 98.94 / 98.60 | 98.80 / 98.73 | 99.03 / 99.00 | 98.90 / 98.72 |
| yeast3 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 99.91 / 99.96 |
| yeast-2_vs_4 | 98.93 / 99.00 | 98.94 / 99.00 | 98.93 / 99.00 | 98.93 / 99.00 | 98.93 / 99.00 | 98.81 / 98.90 | 98.86 / 99.10 | 98.99 / 98.92 |
| penbased | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.73 / 99.60 | 99.89 / 99.87 | 99.99 / 99.99 |
| nursery | 88.19 / 88.03 | 88.20 / 88.04 | 88.19 / 88.03 | 88.19 / 88.03 | 88.19 / 88.03 | 88.02 / 87.93 | 90.19 / 89.95 | 87.94 / 87.83 |
| breast cancer | 99.00 / 98.72 | 99.01 / 98.73 | 99.00 / 98.72 | 99.00 / 98.72 | 99.00 / 98.72 | 98.81 / 98.61 | 98.73 / 98.52 | 98.67 / 98.43 |
| Z-Alizadeh Sani | 96.87 / 96.60 | 96.88 / 96.61 | 96.87 / 96.60 | 96.87 / 96.60 | 96.87 / 96.60 | 96.72 / 96.50 | 97.19 / 97.00 | 96.72 / 96.49 |
| Average | 98.45 / 98.38 | 98.45 / 98.39 | 98.45 / 98.38 | 98.45 / 98.38 | 98.45 / 98.38 | 98.33 / 98.29 | 98.78 / 98.74 | 98.42 / 98.43 |

NORM.: Normalization; Acc: Accuracy; Pre: Precision; Rec: Recall; F1: F1-Score; Spe: Specificity; Kap: Kappa. Bold font indicates the best result obtained for each metric of the models.
Table 11. RUS + NORM. + CNN/DNN.
| Dataset | Acc (CNN / DNN) | Pre (CNN / DNN) | Rec (CNN / DNN) | F1 (CNN / DNN) | G-Mean (CNN / DNN) | Spe (CNN / DNN) | AUC (CNN / DNN) | Kap (CNN / DNN) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ecoli1 | 98.26 / 98.17 | 98.27 / 98.18 | 98.26 / 98.17 | 98.26 / 98.17 | 98.26 / 98.17 | 98.19 / 98.07 | 99.02 / 99.00 | 98.64 / 98.60 |
| ecoli2 | 98.51 / 98.43 | 98.52 / 98.44 | 98.51 / 98.43 | 98.51 / 98.43 | 98.51 / 98.43 | 98.39 / 98.37 | 99.13 / 99.06 | 98.71 / 98.60 |
| ecoli3 | 98.84 / 98.70 | 98.85 / 98.71 | 98.84 / 98.70 | 98.84 / 98.70 | 98.84 / 98.70 | 98.76 / 98.63 | 99.19 / 99.16 | 98.80 / 98.62 |
| ecoli-0_vs_1 | 98.77 / 98.60 | 98.78 / 98.61 | 98.77 / 98.60 | 98.77 / 98.60 | 98.77 / 98.60 | 98.70 / 98.51 | 99.03 / 99.01 | 98.65 / 98.57 |
| glass0 | 98.36 / 98.13 | 98.37 / 98.14 | 98.36 / 98.13 | 98.36 / 98.13 | 98.36 / 98.13 | 98.24 / 98.03 | 99.28 / 99.06 | 98.17 / 98.11 |
| glass1 | 98.49 / 98.53 | 98.50 / 98.54 | 98.49 / 98.53 | 98.49 / 98.53 | 98.49 / 98.53 | 98.33 / 98.44 | 99.16 / 98.39 | 98.27 / 98.45 |
| glass6 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 |
| haberman | 94.73 / 94.61 | 94.74 / 94.62 | 94.73 / 94.61 | 94.73 / 94.61 | 94.73 / 94.61 | 94.65 / 94.52 | 99.01 / 98.89 | 94.29 / 94.20 |
| iris0 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.84 / 99.70 | 99.96 / 99.95 |
| new-thyroid1 | 98.29 / 98.43 | 98.30 / 98.44 | 98.30 / 98.43 | 98.30 / 98.43 | 98.30 / 98.43 | 98.05 / 98.27 | 99.32 / 99.43 | 99.27 / 98.43 |
| new-thyroid2 | 98.80 / 98.30 | 98.81 / 98.31 | 98.80 / 98.30 | 98.80 / 98.30 | 98.80 / 98.30 | 98.69 / 98.15 | 98.99 / 98.52 | 99.01 / 98.78 |
| page-blocks0 | 98.31 / 98.14 | 98.32 / 98.15 | 98.31 / 98.14 | 98.31 / 98.14 | 98.31 / 98.14 | 98.15 / 98.04 | 98.87 / 98.45 | 98.32 / 98.14 |
| pima | 98.19 / 98.05 | 98.20 / 98.06 | 98.19 / 98.05 | 98.19 / 98.05 | 98.19 / 98.05 | 98.06 / 97.94 | 98.85 / 98.56 | 98.00 / 97.94 |
| segment0 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.29 / 99.21 | 99.99 / 99.99 |
| vehicle0 | 98.84 / 98.73 | 98.85 / 98.74 | 98.84 / 98.73 | 98.84 / 98.73 | 98.84 / 98.73 | 98.63 / 98.73 | 99.15 / 98.99 | 98.70 / 98.52 |
| vehicle1 | 98.46 / 98.32 | 98.47 / 98.33 | 98.46 / 98.32 | 98.46 / 98.32 | 98.46 / 98.32 | 98.32 / 98.24 | 98.21 / 98.10 | 98.24 / 98.10 |
| vehicle2 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.33 / 99.05 | 99.82 / 99.70 |
| vehicle3 | 99.28 / 99.13 | 99.29 / 99.14 | 99.28 / 99.13 | 99.28 / 99.13 | 99.28 / 99.13 | 99.18 / 99.05 | 99.08 / 98.99 | 99.11 / 98.94 |
| wisconsin | 99.64 / 99.48 | 99.65 / 99.49 | 99.64 / 99.48 | 99.64 / 99.48 | 99.64 / 99.48 | 99.61 / 99.34 | 99.64 / 99.27 | 99.29 / 99.16 |
| yeast1 | 98.99 / 98.70 | 99.00 / 98.71 | 98.99 / 98.70 | 98.99 / 98.70 | 98.99 / 98.70 | 98.79 / 98.61 | 99.02 / 98.89 | 98.75 / 98.52 |
| yeast3 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 100.00 / 100.00 | 99.84 / 99.62 |
| yeast-2_vs_4 | 98.82 / 98.70 | 98.83 / 98.71 | 98.82 / 98.70 | 98.82 / 98.70 | 98.82 / 98.62 | 98.70 / 98.70 | 98.82 / 98.50 | 98.70 / 98.63 |
| penbased | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.99 / 99.99 | 99.92 / 99.87 | 99.90 / 99.87 | 99.99 / 99.99 |
| nursery | 88.06 / 88.00 | 88.07 / 88.01 | 88.06 / 88.00 | 88.06 / 88.00 | 88.06 / 88.00 | 87.96 / 87.80 | 89.96 / 89.85 | 87.84 / 87.95 |
| breast cancer | 98.91 / 98.70 | 98.92 / 98.71 | 98.91 / 98.70 | 98.91 / 98.70 | 98.91 / 98.70 | 98.70 / 98.60 | 98.50 / 98.43 | 98.36 / 98.27 |
| Z-Alizadeh Sani | 96.30 / 96.15 | 96.31 / 96.16 | 96.30 / 96.15 | 96.30 / 96.15 | 96.30 / 96.15 | 96.16 / 96.02 | 96.98 / 97.00 | 96.12 / 96.01 |
| Average | 98.33 / 98.22 | 98.34 / 98.23 | 98.33 / 98.22 | 98.33 / 98.22 | 98.33 / 98.22 | 98.23 / 98.15 | 98.75 / 98.59 | 98.26 / 98.14 |

NORM.: Normalization; Acc: Accuracy; Pre: Precision; Rec: Recall; F1: F1-Score; Spe: Specificity; Kap: Kappa. Bold font indicates the best result obtained for each metric of the models.
Figure 12. ROC plots for the models from (a–z).

5. Discussion

Classification performance usually drops in the presence of imbalanced datasets, yet such datasets arise in a broad range of real-life research. In imbalanced binary classification problems, samples are categorized into two classes, majority and minority. The minority class often contains the more significant samples of interest, whereas the majority class has a far larger number of samples, and in some cases the imbalance may be severe. Therefore, handling these problems efficiently has become a crucial topic in machine and deep learning.
To overcome these challenges, we proposed two methods that are based on DNN and CNN algorithms. First, several classical and well-known undersampling and oversampling methods, namely RUS, Tomek Links, OSS, NearMiss, ROS, and SMOTE, were used in the data preprocessing procedure. Furthermore, to achieve better performance, we normalized the datasets. Then, to train the models, we used the focal loss function, which is widely implemented in neural network frameworks for class imbalance problems. Moreover, because the limited number of samples in the datasets causes unstable classification results, we trained and evaluated our models over 100 runs, each with 2000 epochs. Finally, we analyzed our proposed models with respect to accuracy, precision, recall, F1-score, G-Mean, specificity, AUC, and Kappa as evaluation metrics. Based on 24 imbalanced datasets, the average score of each evaluation metric for the executed models is given in Table 12.
According to Table 12, the mixed SMOTE + NORM. + CNN model has the best performance, with 99.08% accuracy, 99.09% precision, 99.08% sensitivity, 99.09% F1-score, 99.08% G-Mean, 99.03% specificity, 99.08% AUC, and 98.92% kappa. Furthermore, the comparison of performance metrics between our study and study [29] on the same imbalanced datasets is shown in Table 13.
According to Table 13, the results show the efficiency of our proposed model on 16 imbalanced datasets, where the SMOTE + NORM. + CNN model attained 99.00% recall, 99.00% G-Mean, and 98.98% F1-score. In addition, the results of the proposed SMOTE + NORM. + CNN model are compared with the related works on the Z-Alizadeh Sani dataset, as presented in Table 14.
Based on Table 14, the outcomes show that the proposed SMOTE + NORM. + CNN model performs better than the other studies. The hybrid SMOTE + NORM. + CNN model achieves the best performance with 98.57% accuracy, 98.58% recall, 98.57% F1-score, 98.58% precision, 98.42% specificity, and 99.14% AUC. In addition, the F1-score is an informative metric for the imbalanced Z-Alizadeh Sani dataset because it reflects the balance between recall and precision for classifiers.

6. Conclusions and Future Work

An imbalanced dataset, in which samples are not evenly distributed between the majority and minority classes, is a challenging issue. In particular, imbalanced datasets cause deep learning-based models to obtain biased results for binary classification. To address this issue, we applied oversampling and undersampling techniques, namely SMOTE, TL, OSS, NearMiss, ROS, and RUS. Among these techniques, SMOTE is the most common and robust one. It increases the number of minority class samples by generating synthetic samples and is employed for balancing datasets with an extremely unbalanced ratio. In this study, six deep learning-based models were used to classify the majority and minority classes. We investigated SMOTE + NORM. + CNN/DNN, TL + NORM. + CNN/DNN, OSS + NORM. + CNN/DNN, NearMiss + NORM. + CNN/DNN, ROS + NORM. + CNN/DNN, and RUS + NORM. + CNN/DNN. To evaluate these models, we utilized the KEEL, breast cancer, and Z-Alizadeh Sani datasets. The results show that the mixed SMOTE-NORM-CNN model outperforms the other models, achieving 99.08% accuracy, 99.09% precision, 99.08% sensitivity, 99.09% F1-score, 99.08% G-Mean, 99.03% specificity, 99.08% AUC, and 98.92% kappa on 24 imbalanced datasets. Furthermore, the proposed model was compared with the study in [29], and the mixed model proved suitable for the same datasets. We also compared it with the related methodologies on the Z-Alizadeh Sani dataset, and the results indicate that our proposed methodology is more robust. In the future, we plan to extend our technique to deal with multi-class problems. Moreover, we shall try to exploit different versions of SMOTE (e.g., SMOTE-Cov) with generative DL models, including Variational Autoencoder (VAE) and Generative Adversarial Network (GAN), to explore their effectiveness in comparison to our model on other real datasets.
Our model’s major trade-off/limitation is that it has a longer training time and higher computational cost than traditional ML techniques.

Author Contributions

J.H.J. designed the study. J.H.J., A.M. and M.A.N. performed conceptualization and methodology. The software and analysis of data have been performed by A.M. Furthermore, J.H.J., A.M., M.A.N., S.S.O. and S.H. wrote the original draft and visualized the figures. The final manuscript has been reviewed and edited by S.H. and S.S.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets are available at the following links: https://sci2s.ugr.es/keel/imbalanced.php (accessed on 17 March 2023), https://www.kdd.org/kdd-cup (accessed on 17 March 2023), https://archive.ics.uci.edu/ml/datasets/Z-Alizadeh+Sani (accessed on 17 March 2023).

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Li, Y.; Zhang, J.; Zhang, S.; Xiao, W.; Zhang, Z. Multi-objective optimization-based adaptive class-specific cost extreme learning machine for imbalanced classification. Neurocomputing 2022, 496, 107–120. [Google Scholar] [CrossRef]
  2. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar]
  3. Weiss, G.M. Mining with rarity: A unifying framework. ACM Sigkdd Explor. Newsl. 2004, 6, 7–19. [Google Scholar] [CrossRef]
  4. Batista, G.E.A.P.A.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29. [Google Scholar] [CrossRef]
  5. Mani, I.; Zhang, J. kNN approach to unbalanced data distributions: A case study involving information extraction. In Proceedings of the Workshop on Learning from Imbalanced Datasets (ICML 2003), Washington, DC, USA, 21 August 2003; pp. 1–7. [Google Scholar]
  6. Liu, W.; Chawla, S. Class confidence weighted knn algorithms for imbalanced data sets. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Shenzhen, China, 24–27 May 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 345–356. [Google Scholar]
  7. Chawla, N.V.; Lazarevic, A.; Hall, L.O.; Bowyer, K.W. SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, Cavtat-Dubrovnik, Croatia, 22–26 September 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 107–119. [Google Scholar]
  8. Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. RUSBoost: A Hybrid Approach to Alleviating Class Imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2009, 40, 185–197. [Google Scholar] [CrossRef]
  9. Provost, F. Machine learning from imbalanced data sets 101. In Proceedings of the AAAI’2000 Workshop on Imbalanced Data Sets, Austin, TX, USA, 31 July 2000; AAAI Press: Palo Alto, CA, USA, 2000; pp. 1–3. [Google Scholar]
  10. Sun, Y.; Kamel, M.S.; Wong, A.K.; Wang, Y. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognit. 2007, 40, 3358–3378. [Google Scholar] [CrossRef]
  11. Liu, X.-Y.; Wu, J.; Zhou, Z.-H. Exploratory Undersampling for Class-Imbalance Learning. IEEE Trans. Syst. Man Cybern. Part B 2008, 39, 539–550. [Google Scholar] [CrossRef]
  12. Barandela, R.; Sánchez, J.S.; García, V.; Rangel, E. Strategies for learning in class imbalance problems. Pattern Recognit. 2003, 36, 849–851. [Google Scholar] [CrossRef]
  13. Tahir, M.A.; Kittler, J.; Yan, F. Inverse random under sampling for class imbalance problem and its application to multi-label classification. Pattern Recognit. 2012, 45, 3738–3750. [Google Scholar] [CrossRef]
  14. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  15. García, S.; Herrera, F. Evolutionary Undersampling for Classification with Imbalanced Datasets: Proposals and Taxonomy. Evol. Comput. 2009, 17, 275–306. [Google Scholar] [CrossRef] [PubMed]
  16. Weiss, G.M.; McCarthy, K.; Zabar, B. Cost-sensitive learning vs. sampling: Which is best for handling unbalanced classes with unequal error costs? Dmin 2007, 7, 24. [Google Scholar]
  17. Zhou, Z.-H.; Liu, X.-Y. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans. Knowl. Data Eng. 2005, 18, 63–77. [Google Scholar] [CrossRef]
  18. Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. A Comparative Study of Data Sampling and Cost Sensitive Learning. In Proceedings of the IEEE International Conference on Data Mining Workshops, Pisa, Italy, 5–19 December 2008; pp. 46–52. [Google Scholar] [CrossRef]
  19. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef] [Green Version]
  20. Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. Icml 1996, 96, 148–156. [Google Scholar]
  21. Guo, H.; Viktor, H.L. Learning from imbalanced data sets with boosting and data generation: The databoost-im approach. ACM Sigkdd Explor. Newsl. 2004, 6, 30–39. [Google Scholar] [CrossRef]
  22. Hido, S.; Kashima, H.; Takahashi, Y. Roughly balanced bagging for imbalanced data. Stat. Anal. Data Min. ASA Data Sci. J. 2009, 2, 412–426. [Google Scholar] [CrossRef]
  23. Durahim, A.O. Comparison of sampling techniques for imbalanced learning. Yönet. Bilişim Sist. Derg. 2016, 2, 181–191. [Google Scholar]
  24. Tomek, I. Two Modifications of CNN. IEEE Trans. Syst. Man Cybern. 1976, SMC-6, 769–772. [Google Scholar] [CrossRef] [Green Version]
  25. Kubat, M.; Matwin, S. Addressing the curse of imbalanced training sets: One-sided selection. Icml 1997, 97, 179. [Google Scholar]
  26. Azadbakht, M.; Fraser, C.S.; Khoshelham, K. Synergy of sampling techniques and ensemble classifiers for classification of urban environments using full-waveform LiDAR data. Int. J. Appl. Earth Obs. Geoinf. 2018, 73, 277–291. [Google Scholar] [CrossRef]
  27. Czarnowski, I. Weighted Ensemble with one-class Classification and Over-sampling and Instance selection (WECOI): An approach for learning from imbalanced data streams. J. Comput. Sci. 2022, 61, 101614. [Google Scholar] [CrossRef]
  28. Chen, Q.; Zhang, Z.-L.; Huang, W.-P.; Wu, J.; Luo, X.-G. PF-SMOTE: A novel parameter-free SMOTE for imbalanced datasets. Neurocomputing 2022, 498, 75–88. [Google Scholar] [CrossRef]
  29. Mayabadi, S.; Saadatfar, H. Two density-based sampling approaches for imbalanced and overlapping data. Knowl.-Based Syst. 2022, 241, 108217. [Google Scholar] [CrossRef]
  30. Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259. [Google Scholar] [CrossRef] [Green Version]
  31. Li, K.; Zhou, G.; Zhai, J.; Li, F.; Shao, M. Improved PSO_AdaBoost Ensemble Algorithm for Imbalanced Data. Sensors 2019, 19, 1476. [Google Scholar] [CrossRef] [Green Version]
  32. Vuttipittayamongkol, P.; Elyan, E.; Petrovski, A. On the class overlap problem in imbalanced data classification. Knowl.-Based Syst. 2020, 212, 106631. [Google Scholar] [CrossRef]
  33. Aridas, C.K.; Karlos, S.; Kanas, V.G.; Fazakis, N.; Kotsiantis, S.B. Uncertainty Based Under-Sampling for Learning Naive Bayes Classifiers Under Imbalanced Data Sets. IEEE Access 2019, 8, 2122–2133. [Google Scholar] [CrossRef]
  34. Dablain, D.; Krawczyk, B.; Chawla, N.V. DeepSMOTE: Fusing Deep Learning and SMOTE for Imbalanced Data. IEEE Trans. Neural Networks Learn. Syst. 2022, 1–15. [Google Scholar] [CrossRef]
  35. Bagui, S.; Li, K. Resampling imbalanced data for network intrusion detection datasets. J. Big Data 2021, 8, 6. [Google Scholar] [CrossRef]
  36. Choi, H.-S.; Jung, D.; Kim, S.; Yoon, S. Imbalanced Data Classification via Cooperative Interaction Between Classifier and Generator. IEEE Trans. Neural Networks Learn. Syst. 2021, 33, 3343–3356. [Google Scholar] [CrossRef] [PubMed]
  37. Xie, X.; Liu, H.; Zeng, S.; Lin, L.; Li, W. A novel progressively undersampling method based on the density peaks sequence for imbalanced data. Knowl.-Based Syst. 2020, 213, 106689. [Google Scholar] [CrossRef]
  38. Zheng, M.; Li, T.; Sun, L.; Wang, T.; Jie, B.; Yang, W.; Tang, M.; Lv, C. An automatic sampling ratio detection method based on genetic algorithm for imbalanced data classification. Knowl.-Based Syst. 2021, 216, 106800. [Google Scholar] [CrossRef]
  39. Elyan, E.; Moreno-Garcia, C.F.; Jayne, C. CDSMOTE: Class decomposition and synthetic minority class oversampling technique for imbalanced-data classification. Neural Comput. Appl. 2020, 33, 2839–2851. [Google Scholar] [CrossRef]
  40. Asniar; Maulidevi, N.U.; Surendro, K. SMOTE-LOF for noise identification in imbalanced data classification. J. King Saud Univ. Comput. Inf. Sci. 2021, 34, 3413–3423. [Google Scholar] [CrossRef]
  41. Abdoli, M.; Akbari, M.; Shahrabi, J. Bagging Supervised Autoencoder Classifier for credit scoring. Expert Syst. Appl. 2023, 213, 118991. [Google Scholar] [CrossRef]
  42. El Bakrawy, L.M.; Cifci, M.A.; Kausar, S.; Hussain, S.; Islam, A.; Alatas, B.; Desuky, A.S. A Modified Ant Lion Optimization Method and Its Application for Instance Reduction Problem in Balanced and Imbalanced Data. Axioms 2022, 11, 95. [Google Scholar] [CrossRef]
  43. Yang, M.; Wang, Z.; Li, Y.; Zhou, Y.; Li, D.; Du, W. Gravitation balanced multiple kernel learning for imbalanced classification. Neural Comput. Appl. 2022, 34, 13807–13823. [Google Scholar] [CrossRef]
  44. Tanimoto, A.; Yamada, S.; Takenouchi, T.; Sugiyama, M.; Kashima, H. Improving imbalanced classification using near-miss instances. Expert Syst. Appl. 2022, 201, 117130. [Google Scholar] [CrossRef]
  45. Thejas, G.S.; Hariprasad, Y.; Iyengar, S.; Sunitha, N.; Badrinath, P.; Chennupati, S. An extension of Synthetic Minority Oversampling Technique based on Kalman filter for imbalanced datasets. Mach. Learn. Appl. 2022, 8, 100267. [Google Scholar] [CrossRef]
  46. Wei, G.; Mu, W.; Song, Y.; Dou, J. An improved and random synthetic minority oversampling technique for imbalanced data. Knowl.-Based Syst. 2022, 248, 108839. [Google Scholar] [CrossRef]
  47. Gao, Y.; Gao, L.; Li, X.; Cao, S. A Hierarchical Training-Convolutional Neural Network for Imbalanced Fault Diagnosis in Complex Equipment. IEEE Trans. Ind. Inform. 2022, 18, 8138–8145. [Google Scholar] [CrossRef]
  48. Mohammed, R.; Rawashdeh, J.; Abdullah, M. Machine Learning with Oversampling and Undersampling Techniques: Overview Study and Experimental Results. In Proceedings of the 11th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 7–9 April 2020; pp. 243–248. [Google Scholar] [CrossRef]
  49. Li, W.; Chen, J.; Cao, J.; Ma, C.; Wang, J.; Cui, X.; Chen, P. EID-GAN: Generative Adversarial Nets for Extremely Imbalanced Data Augmentation. IEEE Trans. Ind. Inform. 2022, 19, 3208–3218. [Google Scholar] [CrossRef]
  50. Zieba, M.; Tomczak, J.M. Boosted SVM with active learning strategy for imbalanced data. Soft Comput. 2014, 19, 3357–3368. [Google Scholar] [CrossRef] [Green Version]
  51. He, H.; Zhang, W.; Zhang, S. A novel ensemble method for credit scoring: Adaption of different imbalance ratios. Expert Syst. Appl. 2018, 98, 105–117. [Google Scholar] [CrossRef]
  52. Li, D.-C.; Wang, S.-Y.; Huang, K.-C.; Tsai, T.-I. Learning class-imbalanced data with region-impurity synthetic minority oversampling technique. Inf. Sci. 2022, 607, 1391–1407. [Google Scholar] [CrossRef]
  53. Fernandez, A.; Garcia, S.; Herrera, F.; Chawla, N.V. SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary. J. Artif. Intell. Res. 2018, 61, 863–905. [Google Scholar] [CrossRef]
  54. Pereira, R.M.; Costa, Y.M.; Silla, C.N., Jr. MLTL: A multi-label approach for the Tomek Link undersampling algorithm. Neurocomputing 2019, 383, 95–105. [Google Scholar] [CrossRef]
  55. Hernandez, J.; Carrasco-Ochoa, J.A.; Martínez-Trinidad, J.F. An Empirical Study of Oversampling and Undersampling for Instance Selection Methods on Imbalance Datasets. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Proceedings of the 18th Iberoamerican Congress, CIARP 2013, Havana, Cuba, 20–23 November 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 262–269. [Google Scholar] [CrossRef] [Green Version]
  56. Kamei, Y.; Monden, A.; Matsumoto, S.; Kakimoto, T.; Matsumoto, K.-I. The effects of over and under sampling on fault-prone module detection. In Proceedings of the First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), Madrid, Spain, 20–21 September 2007; pp. 196–204. [Google Scholar]
  57. More, A. Survey of resampling techniques for improving classification performance in unbalanced datasets. arXiv 2016, arXiv:1608.06048. [Google Scholar]
  58. Liu, W.; Wang, Z.; Liu, X.; Zeng, N.; Liu, Y.; Alsaadi, F.E. A survey of deep neural network architectures and their applications. Neurocomputing 2017, 234, 11–26. [Google Scholar] [CrossRef]
  59. Caterini, A.L.; Chang, D.E. Deep Neural Networks in a Mathematical Framework; Springer International Publishing: Berlin/Heidelberg, Germany, 2018. [Google Scholar] [CrossRef]
  60. Pal, S.K.; Mitra, S. Multilayer Perceptron, Fuzzy Sets, Classification. IEEE Trans. Neural Netw. 1992, 3, 683–697. [Google Scholar] [CrossRef] [PubMed]
  61. Guo, Y.; Du, G.-Q.; Shen, W.-Q.; Du, C.; He, P.-N.; Siuly, S. Automatic myocardial infarction detection in contrast echocardiography based on polar residual network. Comput. Methods Programs Biomed. 2020, 198, 105791. [Google Scholar] [CrossRef]
  62. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–6. [Google Scholar]
  63. O’Shea, K.; Nash, R. An introduction to convolutional neural networks. arXiv 2015, arXiv:1511.08458. [Google Scholar]
  64. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  65. Mulyanto, M.; Faisal, M.; Prakosa, S.W.; Leu, J.-S. Effectiveness of Focal Loss for Minority Classification in Network Intrusion Detection Systems. Symmetry 2020, 13, 4. [Google Scholar] [CrossRef]
  66. Alcalá-Fdez, J.; Fernández, A.; Luengo, J.; Derrac, J.; García, S.; Sánchez, L.; Herrera, F. Keel data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Log. Soft Comput. 2011, 17, 1–36. [Google Scholar]
  67. Joloudari, J.H.; Azizi, F.; Nematollahi, M.A.; Alizadehsani, R.; Hassannatajjeloudari, E.; Nodehi, I.; Mosavi, A. GSVMA: A Genetic Support Vector Machine ANOVA Method for CAD Diagnosis. Front. Cardiovasc. Med. 2022, 8, 2178. [Google Scholar] [CrossRef]
  68. Li, J.; Fong, S.; Zhuang, Y. Optimizing SMOTE by metaheuristics with neural network and decision tree. In Proceedings of the 3rd International Symposium on Computational and Business Intelligence (ISCBI), Bali, Indonesia, 7–9 December 2015; pp. 26–32. [Google Scholar]
  69. Chowdary, M.K.; Nguyen, T.N.; Hemanth, D.J. Deep learning-based facial emotion recognition for human–computer interaction applications. Neural Comput. Appl. 2021, 1–18. [Google Scholar] [CrossRef]
  70. Narkhede, S. Understanding auc roc curve. Towards Data Sci. 2018, 26, 220–227. [Google Scholar]
  71. Zhang, S.; Yuan, Y.; Yao, Z.; Wang, X.; Lei, Z. Improvement of the Performance of Models for Predicting Coronary Artery Disease Based on XGBoost Algorithm and Feature Processing Technology. Electronics 2022, 11, 315. [Google Scholar] [CrossRef]
  72. Alizadehsani, R.; Hosseini, M.J.; Sani, Z.A.; Ghandeharioun, A.; Boghrati, R. Diagnosis of coronary artery disease using cost-sensitive algorithms. In Proceedings of the 12th International Conference on Data Mining Workshops, Brussels, Belgium, 10 December 2012; pp. 9–16. [Google Scholar]
  73. Alizadehsani, R.; Habibi, J.; Hosseini, M.J.; Boghrati, R.; Ghandeharioun, A.; Bahadorian, B.; Sani, Z.A. Diagnosis of coronary artery disease using data mining techniques based on symptoms and ecg features. Eur. J. Sci. Res. 2012, 82, 542–553. [Google Scholar]
  74. Alizadehsani, R.; Habibi, J.; Hosseini, M.J.; Mashayekhi, H.; Boghrati, R.; Ghandeharioun, A.; Bahadorian, B.; Sani, Z.A. A data mining approach for diagnosis of coronary artery disease. Comput. Methods Programs Biomed. 2013, 111, 52–61. [Google Scholar] [CrossRef]
  75. Babič, F.; Olejár, J.; Vantová, Z.; Paralič, J. Predictive and descriptive analysis for heart disease diagnosis. In Proceedings of the 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), Prague, Czech Republic, 3–6 September 2017; pp. 155–163. [Google Scholar]
  76. Arabasadi, Z.; Alizadehsani, R.; Roshanzamir, M.; Moosaei, H.; Yarifard, A.A. Computer aided decision making for heart disease detection using hybrid neural network-Genetic algorithm. Comput. Methods Programs Biomed. 2017, 141, 19–26. [Google Scholar] [CrossRef]
  77. Li, H.; Wang, X.; Li, Y.; Qin, C.; Liu, C. Comparison between medical knowledge based and computer automated feature selection for detection of coronary artery disease using imbalanced data. In Proceedings of the BIBE 2018, International Conference on Biological Information and Biomedical Engineering, Shanghai, China, 6–8 July 2018; pp. 1–4. [Google Scholar]
  78. Abdar, M.; Acharya, U.R.; Sarrafzadegan, N.; Makarenkov, V. NE-nu-SVC: A New Nested Ensemble Clinical Decision Support System for Effective Diagnosis of Coronary Artery Disease. IEEE Access 2019, 7, 167605–167620. [Google Scholar] [CrossRef]
  79. Abdar, M.; Książek, W.; Acharya, U.R.; Tan, R.-S.; Makarenkov, V.; Pławiak, P. A new machine learning technique for an accurate diagnosis of coronary artery disease. Comput. Methods Programs Biomed. 2019, 179, 104992. [Google Scholar] [CrossRef]
  80. Khan, Y.; Qamar, U.; Asad, M.; Zeb, B. Applying Feature Selection and Weight Optimization Techniques to Enhance Artificial Neural Network for Heart Disease Diagnosis. In Intelligent Systems and Applications, Proceedings of the 2019 Intelligent Systems Conference (IntelliSys), London, UK, 5–6 September 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 340–351. [Google Scholar] [CrossRef]
  81. Kolukısa, B.; Hacılar, H.; Kuş, M.; Bakır-Güngör, B.; Aral, A.; Güngör, V.Ç. Diagnosis of coronary heart disease via classification algorithms and a new feature selection methodology. Int. J. Data Min. Sci. 2019, 1, 8–15. [Google Scholar]
  82. Nasarian, E.; Abdar, M.; Fahami, M.A.; Alizadehsani, R.; Hussain, S.; Basiri, M.E.; Zomorodi-Moghadam, M.; Zhou, X.; Pławiak, P.; Acharya, U.R.; et al. Association between work-related features and coronary artery disease: A heterogeneous hybrid feature selection integrated with balancing approach. Pattern Recognit. Lett. 2020, 133, 33–40. [Google Scholar] [CrossRef]
  83. Shahid, A.H.; Singh, M. A Novel Approach for Coronary Artery Disease Diagnosis using Hybrid Particle Swarm Optimization based Emotional Neural Network. Biocybern. Biomed. Eng. 2020, 40, 1568–1585. [Google Scholar] [CrossRef]
  84. Ghiasi, M.M.; Zendehboudi, S.; Mohsenipour, A.A. Decision tree-based diagnosis of coronary artery disease: CART model. Comput. Methods Programs Biomed. 2020, 192, 105400. [Google Scholar] [CrossRef]
  85. Joloudari, J.H.; Joloudari, E.H.; Saadatfar, H.; Ghasemigol, M.; Razavi, S.M.; Mosavi, A.; Nabipour, N.; Shamshirband, S.; Nadai, L. Coronary Artery Disease Diagnosis; Ranking the Significant Features Using a Random Trees Model. Int. J. Environ. Res. Public Health 2020, 17, 731. [Google Scholar] [CrossRef] [Green Version]
  86. Zomorodi-Moghadam, M.; Abdar, M.; Davarzani, Z.; Zhou, X.; Pławiak, P.; Acharya, U. Hybrid particle swarm optimization for rule discovery in the diagnosis of coronary artery disease. Expert Syst. 2019, 38, e12485. [Google Scholar] [CrossRef]
  87. Ashish, L.; Kumar, S.; Yeligeti, S. Ischemic heart disease detection using support vector Machine and extreme gradient boosting method. Mater. Today Proc. 2021. [Google Scholar] [CrossRef]
  88. Gupta, A.; Kumar, R.; Arora, H.S.; Raman, B. C-CADZ: Computational intelligence system for coronary artery disease detection using Z-Alizadeh Sani dataset. Appl. Intell. 2021, 52, 2436–2464. [Google Scholar] [CrossRef]
Figure 1. The schematic diagram of several data distributions in various scenarios with two-dimensional binary-class-imbalanced data. (a) Imbalanced Ratio, (b) Class Surrounding, (c) Class Overlapping, (d) Small Disjuncts.
Figure 2. The framework overview of our proposed methodology in this study.
Figure 3. Illustration of Random Over-Sampling technique. This algorithm randomly selects instances from the minority class, with replacement, and adds them to the training dataset.
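The procedure described in the Figure 3 caption can be sketched in a few lines. This is only a minimal illustration using the Python standard library, not the implementation used in the study; the function name `random_oversample`, the `seed` argument, and the toy data are assumptions introduced here.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class (with replacement)
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)          # hypothetical fixed seed for repeatability
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):    # zero iterations for the largest class
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


# Toy data (illustrative only): four majority rows, one minority row.
X = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [0.9, 0.8], [0.5, 0.5]]
y = [0, 0, 0, 1, 0]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Because the added rows are exact copies, this approach adds no new information about the minority class, which is the usual motivation for interpolation-based methods such as SMOTE.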
Figure 4. Illustration of SMOTE Oversampling for Imbalanced Classification. SMOTE generates new minority-class samples by interpolating between minority-class samples that are close to one another.
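The interpolation step named in the Figure 4 caption can be illustrated as follows. This is a simplified sketch of the basic SMOTE idea under stated assumptions, not the code used in the study: the function name `smote`, the `seed` argument, and the toy points are hypothetical, and the neighbor count `k` is left as a parameter (Table 4 lists 10 for the experiments; the toy example uses a smaller value because it has only four points).

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbors of the base point (Euclidean distance)
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()             # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class (illustrative only).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6, k=3)
print(new_points)
```

Every synthetic point lies on a line segment between two existing minority samples, so the new points stay inside the region already occupied by the minority class.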
Figure 5. Illustration of Random Under-Sampling technique. This method selects examples from the majority class at random and removes them from the training dataset.
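A minimal sketch of the random removal described in the Figure 5 caption is given below. It is an illustration only; the function name `random_undersample`, the `seed` argument, and the choice to shrink every class to the size of the smallest one are assumptions made here.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class ends up with
    as many rows as the smallest class."""
    rng = random.Random(seed)          # hypothetical fixed seed for repeatability
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))   # sampling without replacement
    keep.sort()                        # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (illustrative only): four majority rows, two minority rows.
X = [[i, i] for i in range(6)]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = random_undersample(X, y)
print(Counter(y_res))
```

The discarded majority rows are chosen blindly, so informative samples may be lost; the distance-based methods in Figures 6 to 8 try to choose which rows to remove more carefully.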
Figure 6. Illustration of Tomek Links. The distance between the two classes widens as the samples of the majority class of the pairs forming TLs are eliminated, resulting in a more evenly distributed dataset. (a) Original Distribution of Data, (b) Providing the Tomek Links, (c) Removing the links.
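The Tomek link idea from the Figure 6 caption can be sketched as below, using the usual definition that two samples of different classes form a Tomek link when each is the other's nearest neighbor. This is a brute-force illustration under that assumption, not the study's code; the function name `tomek_link_undersample` and the toy data are hypothetical.

```python
import math


def tomek_link_undersample(X, y, majority_label):
    """Remove the majority-class member of every Tomek link."""
    def nearest(i):
        # index of the nearest other sample (Euclidean distance)
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        mutual = nn[j] == i            # i and j are each other's nearest neighbor
        if mutual and y[i] != y[j] and y[i] == majority_label:
            drop.add(i)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (illustrative only): the majority point at 4.8 sits next to the
# minority point at 5.0, so the two form a Tomek link.
X = [[0.0, 0.0], [1.1, 0.0], [2.0, 0.0], [4.8, 0.0], [5.0, 0.0], [6.0, 0.0]]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = tomek_link_undersample(X, y, majority_label=0)
print(X_res, y_res)
```

Only the borderline majority sample is removed, which widens the gap between the classes without discarding samples far from the boundary.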
Figure 7. Illustration of One-Sided Selection. Through a modification of the condensed nearest neighbor rule, this method preserves all of the minority class samples while eliminating the redundant majority class samples.
Figure 8. Illustration of NearMiss. This method refers to a collection of undersampling algorithms that select instances based on the distance of majority-class samples to minority-class ones.
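The Figure 8 caption describes NearMiss as a family of distance-based rules. The sketch below shows one member of that family, commonly called NearMiss-1, which keeps the majority samples whose average distance to their k nearest minority samples is smallest. Which variant the study used is not stated here, so this choice, the function name `nearmiss1`, the default `k`, and the toy data are all assumptions.

```python
import math


def nearmiss1(X, y, majority_label, k=3):
    """Keep as many majority rows as there are minority rows, choosing those
    with the smallest mean distance to their k nearest minority rows."""
    min_idx = [i for i, lab in enumerate(y) if lab != majority_label]
    maj_idx = [i for i, lab in enumerate(y) if lab == majority_label]
    k = min(k, len(min_idx))

    def score(i):
        dists = sorted(math.dist(X[i], X[j]) for j in min_idx)[:k]
        return sum(dists) / k

    kept_major = sorted(maj_idx, key=score)[:len(min_idx)]
    keep = sorted(kept_major + min_idx)    # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (illustrative only): four majority rows, two minority rows.
X = [[0, 0], [1, 1], [4, 4], [9, 9], [5, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1]
X_res, y_res = nearmiss1(X, y, majority_label=0, k=2)
print(X_res, y_res)
```

In this toy case the two majority rows closest to the minority cluster are retained and the two distant ones are dropped.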
Figure 9. The architecture of the proposed DNN-based classifier.
Figure 10. The architecture of the proposed CNN-based classifier.
Table 1. The implementation details.

| Component | Specification |
|---|---|
| Programming Language | Python 3.9 |
| Deep Learning Library | PyTorch 1.9 |
| CPU | Intel® Core™ i7-10700 CPU @ 2.90 GHz × 16 |
| GPU | NVIDIA Corporation GP104 [GeForce GTX 1070] |
| RAM | 64 GB |
Table 2. The list of hyperparameters of the DNN model.

| Layer | Layer Type | Input Features | Out Features |
|---|---|---|---|
| 1 | Linear (Dense) | Input shape | 64 |
| 2 | ReLU | N/A | N/A |
| 3 | Batch normalization | 64 | - |
| 4 | Linear (Dense) | 64 | 64 |
| 5 | ReLU | N/A | N/A |
| 6 | Batch normalization | 64 | - |
| 7 | Dropout | rate: 0.3 | - |
| 8 | Linear (Dense) | 64 | 1 |

N/A: Not applicable.
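As a rough worked example, the layer sizes in Table 2 imply a trainable-parameter count that depends only on the input width. The sketch below assumes the standard conventions that a dense layer has one weight per input-output pair plus one bias per output, and that each batch normalization layer has a learnable scale and shift per feature; these conventions, the function name `dnn_parameter_count`, and the example input widths are assumptions, not figures reported in the study.

```python
def dnn_parameter_count(n_features, hidden=64):
    """Count trainable parameters for the layer stack listed in Table 2."""
    linear1 = n_features * hidden + hidden   # layer 1: weights + biases
    bn1 = 2 * hidden                         # layer 3: scale + shift (assumed affine)
    linear2 = hidden * hidden + hidden       # layer 4
    bn2 = 2 * hidden                         # layer 6
    output = hidden * 1 + 1                  # layer 8: single output unit
    # ReLU and dropout layers have no trainable parameters.
    return linear1 + bn1 + linear2 + bn2 + output


# Example input widths (illustrative): 55 and 9 raw attributes.
print(dnn_parameter_count(55))
print(dnn_parameter_count(9))
```

With 55 input features the count is 8065, and with 9 it is 5121; the actual input width may differ if attributes are encoded before training.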
Table 3. The list of hyperparameters of the CNN-based model.

| Layer | Layer Type | Input Channels | Output Channels | Kernel Size | Stride |
|---|---|---|---|---|---|
| 1 | Convolution | 1 | 16 | 3 | 1 |
| 2 | ReLU | N/A | N/A | N/A | N/A |
| 3 | Convolution | 16 | 4 | 2 | 1 |
| 4 | ReLU | N/A | N/A | N/A | N/A |
| 5 | Linear (Dense) | 16 | 50 | N/A | N/A |
| 6 | ReLU | N/A | N/A | N/A | N/A |
| 7 | Linear (Dense) | 16 | 1 | N/A | N/A |

N/A: Not applicable.
Table 4. The description of the parameters for oversampling techniques.

| Name of Parameter | Description | Choice |
|---|---|---|
| Sampling strategy | Which of the classes to resample | All |
| Random state | Set a fixed state to reproduce the same distribution of the data | None |
| K Neighbors | The number of neighbors to use for synthesizing new data | 10 |
| Number of Jobs | The number of Central Processing Unit (CPU) cores to use | None |
| Number of features | The number of input features | Based on the number of features existent in each dataset |
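Table 5. Datasets description in detail.

| No. | Name | Attribute | All Samples | Imbalanced Ratio |
|---|---|---|---|---|
| 1 | Wisconsin | 9 | 683 | 1.86 |
| 2 | Pima | 8 | 768 | 1.87 |
| 3 | iris0 | 4 | 150 | 2.00 |
| 4 | glass0 | 9 | 214 | 2.06 |
| 5 | glass1 | 9 | 214 | 1.82 |
| 6 | glass6 | 9 | 214 | 6.38 |
| 7 | yeast1 | 8 | 1484 | 2.46 |
| 8 | Haberman | 3 | 306 | 2.78 |
| 9 | vehicle1 | 18 | 846 | 2.90 |
| 10 | vehicle2 | 18 | 846 | 2.88 |
| 11 | vehicle3 | 18 | 846 | 2.99 |
| 12 | ecoli1 | 7 | 336 | 3.36 |
| 13 | ecoli2 | 7 | 336 | 5.46 |
| 14 | ecoli3 | 7 | 336 | 8.6 |
| 15 | new-thyroid1 | 5 | 215 | 5.14 |
| 16 | new-thyroid2 | 5 | 215 | 5.14 |
| 17 | segment0 | 19 | 2308 | 6.02 |
| 18 | yeast3 | 8 | 1484 | 8.10 |
| 19 | page-blocks0 | 10 | 5472 | 8.79 |
| 20 | yeast-2_vs_4 | 8 | 514 | 9.08 |
| 21 | penbased | 10 | 10,992 | 9.41 |
| 22 | Nursery | 5 | 12,690 | 2.2 |
| 23 | breast cancer | 117 | 102,294 | 163.2 |
| 24 | Z-Alizadeh Sani | 55 | 303 | 2.48 |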
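The imbalanced ratio column in Table 5 can be reproduced from class counts, assuming the usual definition of the ratio as the number of majority-class samples divided by the number of minority-class samples. In the sketch below, the function name `imbalance_ratio` and the class counts of 216 and 87 are illustrative assumptions, chosen so that they sum to the 303 samples listed for the Z-Alizadeh Sani dataset.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Assumed class counts (illustrative): 216 positive and 87 negative samples.
labels = [1] * 216 + [0] * 87
print(round(imbalance_ratio(labels), 2))  # 2.48
```

A ratio of 1.0 corresponds to a perfectly balanced binary dataset, while the breast cancer dataset in row 23 has more than 160 majority samples per minority sample.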
Table 12. The average performance score of the evaluation metrics of the executed models.

| Models | Acc | Pre | Sen | F1 | G-Mean | Spe | AUC | Kap |
|---|---|---|---|---|---|---|---|---|
| TL + NORM. + CNN | 98.92 | 98.93 | 98.92 | 98.92 | 98.92 | 98.90 | 99.07 | 98.87 |
| TL + NORM. + DNN | 98.95 | 98.90 | 98.90 | 98.90 | 98.90 | 98.87 | 98.98 | 98.79 |
| OSS + NORM. + CNN | 98.78 | 98.79 | 98.78 | 98.78 | 98.78 | 98.72 | 98.93 | 98.73 |
| OSS + NORM. + DNN | 98.82 | 98.83 | 98.82 | 98.82 | 98.82 | 98.80 | 98.94 | 98.72 |
| NearMiss + NORM. + CNN | 98.98 | 98.98 | 98.98 | 98.98 | 98.98 | 98.92 | 99.07 | 98.89 |
| NearMiss + NORM. + DNN | 99.01 | 99.02 | 99.01 | 99.01 | 99.01 | 98.95 | 98.99 | 98.89 |
| RUS + NORM. + CNN | 98.33 | 98.34 | 98.33 | 98.33 | 98.33 | 98.23 | 98.75 | 98.26 |
| RUS + NORM. + DNN | 98.22 | 98.23 | 98.22 | 98.22 | 98.22 | 98.15 | 98.59 | 98.14 |
| **SMOTE + NORM. + CNN** | **99.08** | **99.09** | **99.08** | **99.09** | **99.08** | **99.03** | **99.08** | **98.92** |
| SMOTE + NORM. + DNN | 98.96 | 99.00 | 98.97 | 98.98 | 98.98 | 98.99 | 99.02 | 98.78 |

Boldface specifies that SMOTE + NORM. + CNN is the most robust model.
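Most of the metrics reported in Table 12 can be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions of accuracy, precision, sensitivity, specificity, F1-score, G-Mean, and Cohen's kappa; it is an illustration under those assumptions rather than the evaluation code of the study. The function name `binary_metrics` and the example counts are hypothetical, AUC is omitted because it requires predicted scores rather than counts, and zero denominators are not handled.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Return common binary-classification metrics as fractions in [0, 1]."""
    n = tp + fn + fp + tn
    acc = (tp + tn) / n
    pre = tp / (tp + fp)
    sen = tp / (tp + fn)               # sensitivity (recall)
    spe = tn / (tn + fp)               # specificity
    f1 = 2 * pre * sen / (pre + sen)
    g_mean = math.sqrt(sen * spe)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (acc - p_chance) / (1 - p_chance)
    return {"Acc": acc, "Pre": pre, "Sen": sen, "Spe": spe,
            "F1": f1, "G-Mean": g_mean, "Kap": kappa}


# Example counts (illustrative only): 50 positives and 150 negatives.
metrics = binary_metrics(tp=45, fn=5, fp=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}")
```

G-Mean and kappa are informative on imbalanced data because, unlike accuracy, they drop sharply when a model favors the majority class.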
Table 13. The comparison of performance metrics on the same dataset.

| Study | Method | Rec (%) | G-Mean (%) | F1 (%) |
|---|---|---|---|---|
| Mayabadi and Saadatfar [29] | DB_HS + SVM | 95.80 | 88.30 | 81.40 |
| | DB_HS + RF | 98.10 | 92.00 | 83.80 |
| | DB_US + SVM | 92.70 | 88.50 | 81.50 |
| | DB_US + RF | 95.60 | 93.80 | 87.90 |
| Current Study | SMOTE + NORM + CNN | 99.00 | 99.00 | 98.98 |
Table 14. The comparison of the metrics results between the proposed study and related studies on the Z-Alizadeh Sani dataset.

| Study | Method | Acc (%) | Rec (%) | F1 (%) | Pre (%) | Spe (%) | AUC (%) |
|---|---|---|---|---|---|---|---|
| Alizadehsani et al., 2012 [72] | Sequential Minimal Optimization | 92.09 | 97.22 | NC | NC | 79.31 | NC |
| Alizadehsani et al., 2012 [73] | Ensemble of Naïve Bayes and Sequential Minimal Optimization | 88.52 | 91.12 | NC | NC | 82.05 | NC |
| Alizadehsani et al., 2013 [74] | Information gain + Sequential Minimal Optimization | 94.08 | 96.30 | NC | NC | 88.51 | NC |
| Babič et al., 2017 [75] | Support vector machine | 86.67 | NC | NC | NC | NC | NC |
| Arabasadi et al., 2017 [76] | Neural network + Genetic algorithm | 93.85 | 97.00 | NC | NC | 92.00 | NC |
| Li et al., 2018 [77] | Naïve Bayes + Genetic algorithm | 88.16 | 88.00 | NC | NC | 87.78 | NC |
| Abdar et al., 2019 [78] | Nested ensemble nu-Support Vector Classification + genetic search algorithm + multi-step data balancing | 94.66 | 94.70 | 94.70 | 94.70 | NC | 96.60 |
| Abdar et al., 2019 [79] | N2Genetic optimizer-nuSupport Vector Machine | 93.08 | NC | 91.51 | NC | NC | NC |
| Khan et al., 2019 [80] | Neural network + Gini Index for feature selection + Backward Weight Optimization | 88.49 | NC | NC | NC | NC | NC |
| Kolukısa et al., 2019 [81] | Ensemble Classifier with Fisher Linear Discriminant Analysis | 92.07 | 94.00 | 94.40 | NC | 87.40 | 95.30 |
| Nasarian et al., 2020 [82] | Heterogeneous hybrid feature selection algorithm + SMOTE + Extreme gradient boosting | 92.58 | 92.99 | 90.62 | 92.59 | NC | NC |
| Shahid and Singh, 2020 [83] | Hybrid Particle Swarm Optimization-emotional neural networks coupled with feature selection | 88.34 | 91.85 | 92.12 | 92.37 | 78.98 | NC |
| Ghiasi et al., 2020 [84] | Classification and Regression tree | 92.41 | 98.61 | NC | NC | 77.01 | NC |
| Joloudari et al., 2020 [85] | Random trees | 91.47 | NC | NC | NC | NC | 96.70 |
| Zomorodi-moghadam et al., 2021 [86] | Hybrid Particle Swarm Optimization | 84.25 | NC | NC | NC | NC | NC |
| Ashish et al., 2021 [87] | Support Vector Machine-Extreme gradient boosting + Random forest | 93.86 | NC | 91.86 | 93.86 | NC | NC |
| Zhang et al., 2022 [71] | Extreme gradient boosting + Feature construction + SMOTE | 94.70 | 96.10 | 94.60 | 93.40 | 93.20 | 98.00 |
| Gupta et al., 2022 [88] | Fixed analysis of mixed data + Binary Bat Algorithm + Ensemble of Random Forest and Extra Trees | 97.37 | 98.15 | 98.15 | NC | 95.45 | 96.80 |
| The proposed study | Mixed SMOTE-NORM.-CNN | 98.57 | 98.58 | 98.57 | 98.58 | 98.42 | 99.14 |

NC: Not considered.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Joloudari, J.H.; Marefat, A.; Nematollahi, M.A.; Oyelere, S.S.; Hussain, S. Effective Class-Imbalance Learning Based on SMOTE and Convolutional Neural Networks. Appl. Sci. 2023, 13, 4006. https://doi.org/10.3390/app13064006
