doi:10.1016/j.datak.2007.02.003
Copyright © 2007 Elsevier B.V. All rights reserved.
Towards efficient variables ordering for Bayesian networks classifier
aBayesMining Lab, Computer Science Dept., Universidade Federal de Sao Carlos DC/UFSCar, Brazil
bNTT Lab, COPPE/Universidade Federal do Rio de Janeiro, Brazil
Received 2 September 2006;
revised 21 December 2006;
accepted 1 February 2007.
Available online 6 March 2007.
Abstract
Traditionally, the task of learning Bayesian Networks (BNs) from data has been treated as an NP-hard search problem. To overcome this computational complexity, several approximations have been designed, such as imposing a prior ordering on the domain attributes, which restricts the number of Bayesian structures to be learned, or using other approaches that reduce the state space of the problem. In this paper, we propose a simple method based on feature ranking algorithms that has low computational complexity (O(n^2), where n is the number of variables) and produces good results. We empirically demonstrate that feature ranking algorithms (namely, Chi-Squared and Information Gain) can be used to define efficient variable orderings in the Bayesian Network Classifier (BNC) learning context. The proposed method can bring improvements when the K2 algorithm is used to learn a BNC from data.
Keywords: Bayesian networks classifiers; Supervised learning; Variable ordering; Feature ranking
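As an illustration of the idea summarized in the abstract, the following is a minimal sketch of how a feature ranking score could be turned into a variable ordering. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `chi_squared`, `ranked_list`) and the toy dataset are assumptions made for this example, and only discrete attributes are handled. The resulting list is the kind of ordering that could then be handed to an ordering-based structure learner such as K2.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """H(class) - H(class | attribute) for one discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def chi_squared(column, labels):
    """Pearson chi-squared statistic of the attribute/class contingency table."""
    total = len(labels)
    observed = Counter(zip(column, labels))
    value_counts = Counter(column)
    class_counts = Counter(labels)
    stat = 0.0
    for value, n_value in value_counts.items():
        for cls, n_cls in class_counts.items():
            expected = n_value * n_cls / total
            stat += (observed.get((value, cls), 0) - expected) ** 2 / expected
    return stat


def ranked_list(data, labels, score=information_gain):
    """Attribute names sorted from highest to lowest score against the class."""
    scores = {name: score(column, labels) for name, column in data.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy dataset: three discrete attributes and a binary class.
labels = ["y", "y", "n", "n", "y", "n"]
data = {
    "noise": ["a", "b", "a", "b", "a", "b"],
    "strong": ["1", "1", "0", "0", "1", "0"],
    "weak": ["p", "p", "p", "q", "p", "q"],
}

print(ranked_list(data, labels))                     # ['strong', 'weak', 'noise']
print(ranked_list(data, labels, score=chi_squared))  # ['strong', 'weak', 'noise']
```

On this toy data both scores agree on the ordering; on real datasets the Chi-Squared and Information Gain rankings may differ, which is presumably why the results are reported separately for each.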
Fig. 1. Feature Ranking Algorithm in a nutshell.
Fig. 2. Feature Ranking Bayesian Network Classifier (FRaBayCla) learning algorithm in a nutshell.
Fig. 3. Algorithmic description of the Class 1 datasets simulations.
Fig. 4. Algorithmic description of the Class 2 datasets simulations.
Table 1.
Datasets overview

Table 2.
Average correct classification rates (ACCR) using Class 1 datasets

Table 3.
Class 2 datasets simulations results

95% CI: 95% confidence interval on the %ACCR mean; 99% CI: 99% confidence interval on the %ACCR mean; ACCR K2χ2: ACCRs using the Chi-Squared ranked_list; ACCR K2IG: ACCRs using the Information Gain ranked_list; ACCR Best: ACCRs using the best ordering from the 35 randomly generated.
Table 4.
Class 2 datasets simulations results using classical classifiers

ACCR K2χ2: ACCRs using the Chi-Squared ranked_list; ACCR K2IG: ACCRs using the Information Gain ranked_list; ACCR Best: ACCRs using the best ordering from the 35 randomly generated; ACCR Tabu: ACCRs using the Tabu Classifier; ACCR Hill: ACCRs using the Hill-B Classifier; ACCR TAN: ACCRs using the TAN Classifier; ACCR J48: ACCRs using the J48 Classifier; ACCR Naive: ACCRs using the Naïve-Bayes Classifier; ACCR IB1: ACCRs using the IB1 Classifier; ACCR CI: ACCRs using the Conditional Independence Classifier; Average: the average ACCRs obtained using all the above classifiers.