Abstract
We present a new evaluation criterion for the induction of decision trees. We exploit a parameter-free Bayesian approach and propose an analytic formula for the evaluation of the posterior probability of a decision tree given the data. We thus transform the training problem into an optimization problem in the space of decision tree models, and search for the best tree, which is the maximum a posteriori (MAP) one. The optimization is performed using top-down heuristics with pre-pruning and post-pruning processes. Extensive experiments on 30 UCI datasets and on the 5 WCCI 2006 performance prediction challenge datasets show that our method obtains predictive performance similar to that of alternative state-of-the-art methods, with far simpler trees.
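The abstract's central idea, scoring each candidate tree by its posterior probability given the data and keeping the maximum a posteriori one, can be illustrated with a toy sketch. This is not the chapter's actual criterion (which is parameter-free and analytic); it is a generic MAP selection over one-split decision stumps, with a uniform prior and empirical leaf likelihoods chosen purely for illustration.

```python
import math

# Toy dataset: one numeric feature, binary class label.
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]

def log_likelihood(labels):
    """Log-likelihood of a leaf under its empirical class distribution."""
    n = len(labels)
    return sum(labels.count(c) * math.log(labels.count(c) / n)
               for c in set(labels))

def log_posterior(threshold, data, log_prior):
    """log P(tree | data) up to a constant: log prior + leaf log-likelihoods."""
    left = [y for x, y in data if x <= threshold]
    right = [y for x, y in data if x > threshold]
    ll = sum(log_likelihood(leaf) for leaf in (left, right) if leaf)
    return log_prior + ll

# Candidate stumps: midpoints between consecutive feature values.
xs = sorted(x for x, _ in data)
candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
log_prior = -math.log(len(candidates))  # uniform prior over candidates

# The MAP stump is the candidate with the highest posterior score.
best = max(candidates, key=lambda t: log_posterior(t, data, log_prior))
print(best)  # -> 2.5, the split that separates the two classes exactly
```

In the chapter's setting the search space is full trees rather than stumps, so the optimization uses top-down heuristics with pre- and post-pruning instead of exhaustive enumeration.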
© 2010 Springer-Verlag Berlin Heidelberg
Cite this chapter
Voisine, N., Boullé, M., Hue, C. (2010). A Bayes Evaluation Criterion for Decision Trees. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00580-0_2
Print ISBN: 978-3-642-00579-4
Online ISBN: 978-3-642-00580-0
eBook Packages: Engineering (R0)