Bayesian network classifiers using ensembles and smoothing

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Bayesian network classifiers are, functionally, an interesting class of models, because they can be learnt out-of-core, i.e. without needing to hold the whole training data in main memory. The selective K-dependence Bayesian network classifier (SKDB) is state of the art in this class of models and has been shown to rival random forest (RF) on problems with categorical data. In this paper, we introduce an ensembling technique for SKDB, called ensemble of SKDB (ESKDB). We show that ESKDB significantly outperforms RF on categorical and numerical data, and rivals XGBoost. ESKDB combines three main components: (1) an effective strategy to vary the network built by each single classifier (to make it an ensemble), (2) a stochastic discretization method which both allows numerical data to be handled and further increases the variance between the components of our ensemble, and (3) a superior smoothing technique to ensure proper calibration of ESKDB’s probabilities. We conduct a large set of experiments with 72 datasets to study the properties of ESKDB (through a sensitivity analysis) and show its competitiveness with the state of the art.
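The high-level recipe described above can be pictured with a small, heavily simplified sketch. This is not the authors' implementation (the actual ESKDB code is linked in note 2 below): SKDB is replaced by an off-the-shelf decision tree, the smoothing component is omitted entirely, and all names (ToyStochasticEnsemble, n_members, n_bins) are hypothetical. The sketch only illustrates the ensemble recipe of bootstrap resampling plus a per-member stochastic discretization, followed by averaging of class probabilities.

import numpy as np
from sklearn.tree import DecisionTreeClassifier


class ToyStochasticEnsemble:
    """Toy stand-in for an ESKDB-style ensemble: each member gets its own
    bootstrap sample and its own randomly drawn discretization."""

    def __init__(self, n_members=10, n_bins=5, seed=0):
        self.n_members = n_members
        self.n_bins = n_bins
        self.rng = np.random.default_rng(seed)
        self.members = []  # list of (bin_edges, fitted base classifier)

    def _draw_edges(self, X):
        # Stochastic discretization: bin edges are placed at randomly drawn
        # quantiles, so every member sees a slightly different categorical
        # view of the numeric attributes.
        qs = np.sort(self.rng.uniform(size=(X.shape[1], self.n_bins - 1)), axis=1)
        return [np.unique(np.quantile(X[:, j], qs[j])) for j in range(X.shape[1])]

    @staticmethod
    def _discretize(X, edges):
        return np.column_stack(
            [np.digitize(X[:, j], edges[j]) for j in range(X.shape[1])]
        )

    def fit(self, X, y):
        n = len(y)
        for _ in range(self.n_members):
            idx = self.rng.integers(0, n, size=n)  # bootstrap resample
            edges = self._draw_edges(X[idx])
            Xd = self._discretize(X[idx], edges)
            # A decision tree stands in for SKDB purely for illustration.
            clf = DecisionTreeClassifier(random_state=0).fit(Xd, y[idx])
            self.members.append((edges, clf))
        return self

    def predict_proba(self, X):
        # Average the members' class-probability estimates (assumes every
        # bootstrap sample contained all classes; no smoothing is applied).
        probs = [clf.predict_proba(self._discretize(X, edges))
                 for edges, clf in self.members]
        return np.mean(probs, axis=0)

In ESKDB itself the base learner is SKDB and a dedicated smoothing technique is applied to the probability estimates, as described in the paper; the sketch omits both.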


Notes

  1. The more common representation \(\mathrm{Dir}(\alpha _1,\ldots , \alpha _C)\) is not used here.

  2. https://github.com/icesky0125/ESKDB-on-numerical-data.


Author information

Corresponding author

Correspondence to He Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was partially supported by the China Scholarship Council under Award 201506300081 and by the Australian Government through the Australian Research Council’s Discovery Projects funding scheme (Projects DP190100017 and DE170100037).


About this article


Cite this article

Zhang, H., Petitjean, F. & Buntine, W. Bayesian network classifiers using ensembles and smoothing. Knowl Inf Syst 62, 3457–3480 (2020). https://doi.org/10.1007/s10115-020-01458-z
