Cost-sensitive Bayesian network classifiers☆
Introduction
Classification is one of the most important tasks in data mining and machine learning. Traditional data mining and machine learning algorithms [19] are designed to yield classifiers that minimize the number of misclassification errors, implicitly treating the costs of all misclassification errors as equal. In many real-world domains, however, different types of error carry different costs. For example, in medical diagnosis, the cost of misclassifying a cancer patient as healthy is significantly greater than that of the opposite error. It is therefore important to build classifiers that minimize the total misclassification cost rather than the number of misclassification errors. This kind of classification task is called cost-sensitive classification [8], [16], [31], and it has received increasing attention in recent years.
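The distinction between counting errors and summing costs can be made concrete with a small sketch. The cost matrix and label vectors below are invented for illustration; the asymmetric entry mirrors the medical-diagnosis example, where missing a sick patient is far costlier than a false alarm.

```python
import numpy as np

# Hypothetical 2x2 cost matrix: cost[predicted][actual].
# Misclassifying a sick patient (actual = 1) as healthy (predicted = 0)
# costs ten times as much as the opposite error.
cost = np.array([[0.0, 10.0],
                 [1.0,  0.0]])

y_true = np.array([1, 0, 1, 0, 0])
y_pred = np.array([0, 0, 1, 1, 0])

# Error-minimizing view: both mistakes count the same.
n_errors = int(np.sum(y_pred != y_true))           # 2 errors

# Cost-sensitive view: the missed positive dominates the total.
total_cost = float(cost[y_pred, y_true].sum())     # 10 + 1 = 11
```

Two classifiers with the same error count can thus differ sharply in total cost, which is what motivates optimizing the latter directly.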
Most existing work is devoted to making decision trees cost-sensitive; a detailed survey of cost-sensitive decision tree induction algorithms can be found in Lomax and Vadera's paper [18]. By contrast, comprehensive studies of cost-sensitive Bayesian network classifiers are rare, and only a few works make naive Bayesian network classifiers cost-sensitive. For example, Gama [12] presents a cost-sensitive iterative Bayes; Chai et al. [1] specifically consider test-cost-sensitive learning and propose a test-cost-sensitive naive Bayes; and Fang [9] develops a cost-sensitive naive Bayes method that learns an order relation from the training data and classifies instances based on the inferred relation.
In this paper, we focus our attention on cost-sensitive Bayesian network classifiers. In existing cost-sensitive studies, meta-learning methods such as MetaCost [6], instance weighting [23], thresholding [21], and sampling [13] can be applied to make Bayesian network classifiers cost-sensitive. Among these, instance weighting is a simple, easy-to-understand, and efficient method. Inspired by the success of cost-sensitive C4.5 [23] and weighted random forests [3], we incorporate the instance weighting method into various Bayesian network classifiers, namely naive Bayes (NB), tree-augmented naive Bayes (TAN) [10], averaged one-dependence estimators (AODE) [25], and hidden naive Bayes (HNB) [14], to make them cost-sensitive. We call the resulting classifiers cost-sensitive Bayesian network classifiers. Experimental results on a large number of UCI data sets show that these classifiers achieve a substantial reduction in both the total misclassification cost and the number of high-cost errors.
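The instance weighting idea can be sketched as follows. This is a minimal illustration, assuming the per-class weighting scheme used in cost-sensitive C4.5 [23], in which an instance of class c is weighted in proportion to the cost C(c) of misclassifying that class, normalized so that the total weight equals the number of instances N; the function name and toy data are ours.

```python
import numpy as np

def instance_weights(y, mis_cost):
    """Per-class weights w(c) = C(c) * N / sum_k C(k) * N_k,
    so the total weight over all instances stays equal to N."""
    classes, counts = np.unique(y, return_counts=True)
    denom = float(sum(mis_cost[c] * n for c, n in zip(classes, counts)))
    w = {c: mis_cost[c] * len(y) / denom for c in classes}
    return np.array([w[c] for c in y])

# Toy data: class 1 is rare but four times as costly to miss.
y = np.array([0, 0, 0, 0, 1])
w = instance_weights(y, mis_cost={0: 1.0, 1: 4.0})
# w(0) = 1*5/8 = 0.625, w(1) = 4*5/8 = 2.5; weights sum back to N = 5.

# Weighted counts then replace raw counts in the usual (Laplace-smoothed)
# Bayesian estimates, e.g. the weighted class prior for class 1:
W = w.sum()
prior1 = (w[y == 1].sum() + 1.0) / (W + 2.0)   # lifted from 1/5 toward 1/2
```

Because only the counts change, the same weighting plugs into NB, TAN, AODE, or HNB without altering their structure learning.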
The rest of this paper is organized as follows. In Section 2, some works related to cost-sensitive learning are introduced. In Section 3, we revisit several state-of-the-art Bayesian network classifiers and then incorporate the instance weighting method into them. In Section 4, we conduct a series of experiments on 36 UCI benchmark data sets to validate our proposed cost-sensitive Bayesian network classifiers. In Section 5, we draw conclusions and outline the main directions for our future work.
Section snippets
Related work
Cost-sensitive learning is generally divided into two categories [16]. The first is the direct method, which introduces misclassification costs directly into the learning algorithm, so that the algorithm is cost-sensitive in itself. Direct cost-sensitive learning algorithms include ICET [24], cost-sensitive iterative naive Bayes [12], and cost-sensitive decision trees [7], [17].
The other category is called cost-sensitive meta-learning method.
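One representative meta-learning approach, thresholding [21], wraps any classifier that outputs class probabilities and predicts the class with the minimum expected cost R(i|x) = Σ_j P(j|x) C(i, j). A minimal sketch (the cost matrix and posterior values are illustrative assumptions):

```python
import numpy as np

def min_expected_cost(probs, cost):
    """Pick the class i minimizing R(i|x) = sum_j P(j|x) * cost[i][j].
    probs: posterior vector P(j|x); cost[i][j]: predict i when truth is j."""
    risk = cost @ probs          # expected cost of each candidate prediction
    return int(np.argmin(risk))

cost = np.array([[0.0, 10.0],
                 [1.0,  0.0]])

# A 0/1-loss classifier would predict class 0 here (P(0|x) = 0.7), but the
# expected cost of predicting 0 is 0.3 * 10 = 3.0, versus 0.7 * 1 = 0.7
# for predicting 1, so the cost-sensitive decision flips to class 1.
pred = min_expected_cost(np.array([0.7, 0.3]), cost)
```

Since the base classifier is untouched, this method makes any probabilistic Bayesian network classifier cost-sensitive at prediction time only.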
The instance weighting method in cost-sensitive C4.5
The central choice in decision tree induction is selecting which attribute to split the training data on at each non-terminal node of the tree. The information gain measure and its variants are generally used to select attributes, and all of them are based on a measure called entropy [19]. Given a node t of a decision tree, let N(t) be the number of instances in node t and N_j(t) be the number of class-j instances in node t. The entropy of node t is then

Entropy(t) = -Σ_j (N_j(t) / N(t)) log₂ (N_j(t) / N(t))
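The entropy computation, and the effect of replacing raw class counts with cost-based weighted counts as in cost-sensitive C4.5, can be sketched as follows. The example counts and weights are assumptions for illustration:

```python
import numpy as np

def entropy(counts):
    """Entropy(t) = -sum_j p_j * log2(p_j), with p_j = N_j(t) / N(t)."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()       # drop empty classes; 0 * log 0 := 0
    return float(-(p * np.log2(p)).sum())

# Standard entropy uses raw class counts N_j(t) in node t.
counts = [8, 2]                  # 8 majority-class, 2 minority-class instances
plain = entropy(counts)          # about 0.72 bits

# The cost-sensitive variant substitutes weighted counts w(j) * N_j(t),
# so a costly minority class counts as heavily as the cheap majority.
weights = [0.625, 2.5]           # per-class instance weights (assumed)
weighted = entropy([w * n for w, n in zip(weights, counts)])  # 1.0 bit
```

With the weighted counts the node looks maximally impure, so the split selection is steered toward separating the costly class even though it is rare.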
Data sets
We ran our experiments on all 36 UCI data sets published on the main web site of the Weka platform [27], which represent a wide range of domains and data characteristics. The original descriptions of these 36 data sets can be found in our previous papers [14], [15]. In our experiments, the following 4 preprocessing steps are adopted.
- 1.
Replacing missing attribute values: The unsupervised filter named ReplaceMissingValues in Weka is used to replace all missing attribute values in each data set.
- 2.
Conclusions and future work
In real-world applications, cost-sensitive learning has received increasing attention. Traditional Bayesian network classifiers are designed to minimize the number of misclassification errors, and when they are applied to cost-sensitive learning tasks their performance is generally poor. To improve their performance, we incorporate the instance weighting method into various Bayesian network classifiers and propose cost-sensitive Bayesian network classifiers in this paper. The instance weighting method
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions. This work was partially supported by the National Natural Science Foundation of China (61203287), the Program for New Century Excellent Talents in University (NCET-12-0953), the Provincial Natural Science Foundation of Hubei (2011CDA103), and the Fundamental Research Funds for the Central Universities (CUG130504, CUG130414).
References (31)
- M. Galar et al., EUSBoost: enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling, Pattern Recogn. (2013)
- Y. Sun et al., Cost-sensitive boosting for classification of imbalanced data, Pattern Recogn. (2007)
- X. Chai, L. Deng, Q. Yang, C.X. Ling, Test-cost sensitive naive Bayes classification, in: Fourth IEEE International...
- N.V. Chawla et al., SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. (2002)
- C. Chen, A. Liaw, L. Breiman, Using random forest to learn imbalanced data, University of California Berkeley,...
- D.M. Chickering, Learning Bayesian networks is NP-complete, in: Learning from Data, 1996, pp....
- C.K. Chow, C.N. Liu, Approximating discrete probability distributions with dependence trees, IEEE Trans. Inf. Theory (1968)
- P. Domingos, MetaCost: a general method for making classifiers cost-sensitive
- C. Drummond, R. Holte, Exploiting the cost (in)sensitivity of decision tree splitting criteria, in: Proceedings of the...
- C. Elkan, The foundations of cost-sensitive learning
- Fang, Inference-based naive Bayes: turning naive Bayes cost-sensitive, IEEE Trans. Knowl. Data Eng.
- N. Friedman et al., Bayesian network classifiers, Mach. Learn.
- A novel Bayes model: hidden naive Bayes, IEEE Trans. Knowl. Data Eng.
☆ This paper has been recommended for acceptance by G. Moser.