Abstract
It is generally recognised that recursive partitioning, as used in the construction of classification trees, is inherently unstable, particularly for small data sets: tree structure and, by implication, classification accuracy are sensitive to changes in the training data. Successful approaches to counteracting this effect include multiple-classifier methods such as boosting, bagging and windowing. The downside of these multiple classification models, however, is the plethora of trees that result, often making it difficult to extract the classifier in a meaningful manner. We show that, by using some very weak knowledge at the sampling stage, when the data set is partitioned into training and test sets, a single decision tree classifier achieves more consistent and improved performance.
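As a rough illustration of the idea, the sketch below uses scikit-learn to derive the "weak knowledge" by clustering the inputs and then stratifies the train/test partition on cluster membership before inducing a single tree. The choice of k-means with five clusters and the iris data are illustrative assumptions, not taken from the paper; the authors' exact induction- and cluster-based procedure may differ.

```python
# A minimal sketch of cluster-based stratified sampling, assuming
# scikit-learn. Cluster labels stand in for the paper's "weak knowledge":
# stratifying on them keeps each region of input space proportionally
# represented in both the training and test sets.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Derive strata by clustering the inputs (five clusters is an
# illustrative choice, not a value from the paper).
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Stratify the partition on cluster membership rather than splitting
# purely at random.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=clusters, random_state=0
)

# Induce a single decision tree on the stratified training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("test accuracy:", tree.score(X_te, y_te))
```

Repeating the split with different random seeds, with and without the `stratify` argument, gives a quick check of whether the stratified partition yields less variable accuracy for the single tree.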
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gill, A.A., Smith, G.D., Bagnall, A.J. (2004). Improving Decision Tree Performance Through Induction- and Cluster-Based Stratified Sampling. In: Yang, Z.R., Yin, H., Everson, R.M. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2004. IDEAL 2004. Lecture Notes in Computer Science, vol 3177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28651-6_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22881-3
Online ISBN: 978-3-540-28651-6