Cost-sensitive Bayesian network classifiers☆
Introduction
Classification is one of the most important tasks in data mining and machine learning. Traditional data mining and machine learning algorithms [19] are designed to yield classifiers that minimize the number of misclassification errors, implicitly treating the costs of all misclassification errors as equal. In many real-world domains, however, different types of error carry different costs. For example, in medical diagnosis, the cost of misclassifying a cancer patient as healthy is significantly greater than that of the opposite error. It is therefore important to build classifiers that minimize the total misclassification cost rather than the number of misclassification errors. This kind of classification task is called cost-sensitive classification [8], [16], [31], and it has received increasing attention in recent years.
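The distinction between counting errors and summing costs can be made concrete with a small sketch. The cost matrix and label vectors below are invented for illustration; the asymmetric entry mirrors the medical-diagnosis example, where missing a sick patient is far costlier than a false alarm.

```python
import numpy as np

# Hypothetical 2x2 cost matrix: cost[predicted][actual].
# Misclassifying a sick patient (actual = 1) as healthy (predicted = 0)
# costs ten times as much as the opposite error.
cost = np.array([[0.0, 10.0],
                 [1.0,  0.0]])

y_true = np.array([1, 0, 1, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0])

# Error-minimizing view: both mistakes count the same.
n_errors = int(np.sum(y_pred != y_true))           # 2 errors

# Cost-sensitive view: the missed positive dominates the total.
total_cost = float(cost[y_pred, y_true].sum())     # 10 + 1 = 11
```

Two classifiers with the same error count can thus differ sharply in total cost, which is what motivates optimizing the latter directly.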
Most existing work is devoted to making decision trees cost-sensitive; a detailed survey of cost-sensitive decision tree induction algorithms can be found in Lomax and Vadera's paper [18]. By contrast, comprehensive studies of cost-sensitive Bayesian network classifiers are rare, and only a few works make naive Bayesian network classifiers cost-sensitive. For example, Gama [12] presents a cost-sensitive iterative Bayes; Chai et al. [1] specifically consider test-cost-sensitive learning and propose a test-cost-sensitive naive Bayes; and Fang [9] develops a cost-sensitive naive Bayes method that learns an order relation from the training data and classifies instances based on the inferred relation.
In this paper, we focus our attention on cost-sensitive Bayesian network classifiers. In existing cost-sensitive studies, meta-learning methods such as MetaCost [6], instance weighting [23], thresholding [21], and sampling [13] can be applied to make Bayesian network classifiers cost-sensitive. Among these, instance weighting is a simple, easy-to-understand, and efficient method. Inspired by the success of cost-sensitive C4.5 [23] and weighted random forests [3], we incorporate the instance weighting method into various Bayesian network classifiers, namely naive Bayes (NB), tree-augmented naive Bayes (TAN) [10], averaged one-dependence estimators (AODE) [25], and hidden naive Bayes (HNB) [14], to make them cost-sensitive. We call the resulting classifiers cost-sensitive Bayesian network classifiers. Experimental results on a large number of UCI data sets show that these classifiers achieve a substantial reduction in both the total misclassification cost and the number of high-cost errors.
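The instance weighting idea can be sketched as follows. This is a minimal illustration, assuming the per-class weighting scheme used in cost-sensitive C4.5 [23], in which an instance of class c is weighted in proportion to the cost C(c) of misclassifying that class, normalized so that the total weight equals the number of instances N; the function name and toy data are ours.

```python
import numpy as np

def instance_weights(y, mis_cost):
    """Per-class weights w(c) = C(c) * N / sum_k C(k) * N_k,
    so the total weight over all instances stays equal to N."""
    classes, counts = np.unique(y, return_counts=True)
    denom = float(sum(mis_cost[c] * n for c, n in zip(classes, counts)))
    w = {c: mis_cost[c] * len(y) / denom for c in classes}
    return np.array([w[c] for c in y])

# Toy data: class 1 is rare but four times as costly to miss.
y = np.array([0, 0, 0, 0, 1])
w = instance_weights(y, mis_cost={0: 1.0, 1: 4.0})
# w(0) = 1*5/8 = 0.625, w(1) = 4*5/8 = 2.5; weights sum back to N = 5.

# Weighted counts then replace raw counts in the usual (Laplace-smoothed)
# Bayesian estimates, e.g. the weighted class prior for class 1:
W = w.sum()
prior1 = (w[y == 1].sum() + 1.0) / (W + 2.0)   # lifted from 1/5 toward 1/2
```

Because only the counts change, the same weighting plugs into NB, TAN, AODE, or HNB without altering their structure learning.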
The rest of this paper is organized as follows. In Section 2, some works related to cost-sensitive learning are introduced. In Section 3, we revisit several state-of-the-art Bayesian network classifiers and then incorporate the instance weighting method into them. In Section 4, we conduct a series of experiments on 36 UCI benchmark data sets to validate our proposed cost-sensitive Bayesian network classifiers. In Section 5, we draw conclusions and outline the main directions for our future work.
Section snippets
Related work
Cost-sensitive learning is generally divided into two categories [16]. The first is the direct method, which introduces misclassification costs directly into the learning algorithm, so that the algorithm is cost-sensitive in itself. Direct cost-sensitive learning algorithms include ICET [24], cost-sensitive iterative naive Bayes [12], and cost-sensitive decision trees [7], [17].
The other category is called cost-sensitive meta-learning method.
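One representative meta-learning approach, thresholding [21], wraps any classifier that outputs class probabilities and predicts the class with the minimum expected cost R(i|x) = Σ_j P(j|x) C(i, j). A minimal sketch (the cost matrix and posterior values are illustrative assumptions):

```python
import numpy as np

def min_expected_cost(probs, cost):
    """Pick the class i minimizing R(i|x) = sum_j P(j|x) * cost[i][j].
    probs: posterior vector P(j|x); cost[i][j]: predict i when truth is j."""
    risk = cost @ probs          # expected cost of each candidate prediction
    return int(np.argmin(risk))

cost = np.array([[0.0, 10.0],
                 [1.0,  0.0]])

# A 0/1-loss classifier would predict class 0 here (P(0|x) = 0.7), but the
# expected cost of predicting 0 is 0.3 * 10 = 3.0, versus 0.7 * 1 = 0.7
# for predicting 1, so the cost-sensitive decision flips to class 1.
pred = min_expected_cost(np.array([0.7, 0.3]), cost)
```

Since the base classifier is untouched, this method makes any probabilistic Bayesian network classifier cost-sensitive at prediction time only.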
The instance weighting method in cost-sensitive C4.5
The central choice in decision tree induction is selecting which attribute to split the training data on at each non-terminal node of the tree. The information gain measure and its variants are generally used to select attributes, and all of them are based on a measure called entropy [19]. Given a node t of a decision tree, let N(t) be the number of instances in node t and N_j(t) be the number of class-j instances in node t. The entropy of node t is then

Entropy(t) = -Σ_j (N_j(t) / N(t)) log₂ (N_j(t) / N(t))
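The entropy computation, and the effect of replacing raw class counts with cost-based weighted counts as in cost-sensitive C4.5, can be sketched as follows. The example counts and weights are assumptions for illustration:

```python
import numpy as np

def entropy(counts):
    """Entropy(t) = -sum_j p_j * log2(p_j), with p_j = N_j(t) / N(t)."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()       # drop empty classes; 0 * log 0 := 0
    return float(-(p * np.log2(p)).sum())

# Standard entropy uses raw class counts N_j(t) in node t.
counts = [8, 2]                  # 8 majority-class, 2 minority-class instances
plain = entropy(counts)          # about 0.72 bits

# The cost-sensitive variant substitutes weighted counts w(j) * N_j(t),
# so a costly minority class counts as heavily as the cheap majority.
weights = [0.625, 2.5]           # per-class instance weights (assumed)
weighted = entropy([w * n for w, n in zip(weights, counts)])  # 1.0 bit
```

With the weighted counts the node looks maximally impure, so the split selection is steered toward separating the costly class even though it is rare.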
Data sets
We ran our experiments on all 36 UCI data sets published on the main web site of the Weka platform [27], which represent a wide range of domains and data characteristics. The original descriptions of these 36 data sets can be found in our previous papers [14], [15]. In our experiments, the following 4 preprocessing steps are adopted.
- 1.
Replacing missing attribute values: The unsupervised filter named ReplaceMissingValues in Weka is used to replace all missing attribute values in each data set.
- 2.
Conclusions and future work
In real-world applications, cost-sensitive learning has received increasing attention. Traditional Bayesian network classifiers are designed to minimize the number of misclassification errors, and when they are applied to cost-sensitive learning tasks their performance is generally poor. To improve their performance, we incorporate the instance weighting method into various Bayesian network classifiers and propose cost-sensitive Bayesian network classifiers in this paper. The instance weighting method
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This work was partially supported by the National Natural Science Foundation of China (61203287), the Program for New Century Excellent Talents in University (NCET-12-0953), the Provincial Natural Science Foundation of Hubei (2011CDA103), and the Fundamental Research Funds for the Central Universities (CUG130504, CUG130414).
References (31)
- M. Galar et al., EUSBoost: enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling, Pattern Recogn. (2013)
- Y. Sun et al., Cost-sensitive boosting for classification of imbalanced data, Pattern Recogn. (2007)
- X. Chai, L. Deng, Q. Yang, C.X. Ling, Test-cost sensitive naive Bayes classification, in: Fourth IEEE International...
- N.V. Chawla et al., SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. (2002)
- C. Chen, A. Liaw, L. Breiman, Using random forest to learn imbalanced data, University of California Berkeley,...
- D.M. Chickering, Learning Bayesian networks is NP-complete, in: Learning from Data, 1996, pp....
- C.K. Chow, C.N. Liu, Approximating discrete probability distributions with dependence trees, IEEE Trans. Inf. Theory (1968)
- P. Domingos, MetaCost: a general method for making classifiers cost-sensitive
- C. Drummond, R. Holte, Exploiting the cost (in)sensitivity of decision tree splitting criteria, in: Proceedings of the...
- C. Elkan, The foundations of cost-sensitive learning
- Fang, Inference-based naive Bayes: turning naive Bayes cost-sensitive, IEEE Trans. Knowl. Data Eng.
- N. Friedman et al., Bayesian network classifiers, Mach. Learn.
- A novel Bayes model: hidden naive Bayes, IEEE Trans. Knowl. Data Eng.
☆ This paper has been recommended for acceptance by G. Moser.