Numerical Attributes in Decision Trees: A Hierarchical Approach

  • Conference paper
Advances in Intelligent Data Analysis V (IDA 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2810)

Abstract

Decision trees are probably the most popular and commonly used classification model. They are built recursively, following a top-down approach (from general concepts to particular examples), by repeatedly splitting the training dataset. When this dataset contains numerical attributes, binary splits are usually performed by choosing the threshold value that minimizes the impurity measure used as the splitting criterion (e.g., C4.5's gain ratio criterion or CART's Gini index). In this paper we propose the use of multi-way splits for continuous attributes in order to reduce tree complexity without decreasing classification accuracy. This can be done by intertwining a hierarchical clustering algorithm with the usual greedy decision tree learning.
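
To make the standard procedure concrete, here is a minimal sketch (in Python; the function names and toy data are invented for illustration, not taken from the paper) of the binary-split search the abstract describes: scan the candidate thresholds of a numerical attribute and keep the one that minimizes the weighted Gini impurity, as in CART.

```python
# Illustrative sketch, not the paper's code: find the binary-split
# threshold on a numerical attribute that minimizes weighted Gini impurity.
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum over classes of p_c^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_binary_threshold(values, labels):
    """Return (threshold, impurity) minimizing the weighted Gini impurity
    of the two subsets induced by the test `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Toy example: temperature readings with binary class labels.
temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
play  = ["y", "n", "y", "y", "y", "n", "n", "y", "n", "y", "y", "n"]
print(best_binary_threshold(temps, play))
```

C4.5 proceeds in the same way but ranks candidate thresholds by gain ratio instead of Gini impurity.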

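The multi-way alternative is only summarized in the abstract, and the paper's exact merging criterion is not reproduced on this page. The sketch below shows one plausible instantiation of the general idea, under the assumption that adjacent one-dimensional intervals are merged agglomeratively (closest means first) until k intervals remain, which then define a k-way split.

```python
# Hedged sketch of a multi-way split via agglomerative hierarchical
# clustering of a numerical attribute; the merging criterion (distance
# between interval means) is an assumption, not the paper's algorithm.
def multiway_intervals(values, k):
    """Greedily cluster sorted 1-D values into k intervals and return the
    k-1 cut points that define a k-way split."""
    intervals = [[v] for v in sorted(set(values))]  # one cluster per distinct value
    while len(intervals) > k:
        # Merge the pair of adjacent intervals whose means are closest.
        means = [sum(iv) / len(iv) for iv in intervals]
        i = min(range(len(intervals) - 1), key=lambda j: means[j + 1] - means[j])
        intervals[i:i + 2] = [intervals[i] + intervals[i + 1]]
    # Cut points: midpoints between the boundary values of adjacent intervals.
    return [(a[-1] + b[0]) / 2.0 for a, b in zip(intervals, intervals[1:])]

print(multiway_intervals([64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85], k=3))
```

In a full learner, the number of intervals and the merging criterion would be driven by the splitting measure itself (gain ratio or Gini index) rather than fixed in advance, so the k parameter here is purely illustrative.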

References

  1. Berzal, F., Cubero, J.C., Sánchez, D., Serrano, J.M.: ART: A hybrid classification model. Machine Learning (2003) (to be published)

  2. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, California (1984). ISBN 0-534-98054-6

  3. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Proceedings of the 12th International Conference on Machine Learning, pp. 194–202. Morgan Kaufmann, Los Altos (1995)

  4. Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, pp. 1022–1027 (1993)

  5. Gehrke, J., Ganti, V., Ramakrishnan, R., Loh, W.-Y.: BOAT – Optimistic Decision Tree Construction. In: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Philadelphia, PA, USA, May 31–June 3, pp. 169–180 (1999a)

  6. Gehrke, J., Loh, W.-Y., Ramakrishnan, R.: Classification and regression: money can grow on trees. In: Tutorial Notes, KDD 1999, San Diego, California, USA, August 15–18, pp. 1–73 (1999b)

  7. Gehrke, J., Ramakrishnan, R., Ganti, V.: RainForest – A Framework for Fast Decision Tree Construction of Large Datasets. Data Mining and Knowledge Discovery 4(2/3), 127–162 (2000)

  8. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, USA (2001). ISBN 1-55860-489-8

  9. Ho, K.M., Scott, P.D.: Zeta: A global method for discretization of continuous variables. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD 1997), Newport Beach, pp. 191–194. AAAI Press, Menlo Park (1997)

  10. Holte, R.C.: Very simple classification rules perform well on most commonly used datasets. Machine Learning 11, 63–90 (1993)

  11. Hussain, F., Liu, H., Tan, C.L., Dash, M.: Discretization: An enabling technique. Technical Report TRC6/99, School of Computing, The National University of Singapore (June 1999)

  12. Loh, W.-Y., Shih, Y.-S.: Split Selection Methods for Classification Trees. Statistica Sinica 7, 815–840 (1997)

  13. Lopez de Mantaras, R.: A Distance-Based Attribute Selection Measure for Decision Tree Induction. Machine Learning 6, 81–92 (1991)

  14. Martin, J.K.: An Exact Probability Metric for Decision Tree Splitting and Stopping. Machine Learning 28, 257–291 (1997)

  15. Mehta, M., Agrawal, R., Rissanen, J.: SLIQ: A Fast Scalable Classifier for Data Mining. In: Apers, P.M.G., Bouzeghoub, M., Gardarin, G. (eds.) EDBT 1996. LNCS, vol. 1057, pp. 18–32. Springer, Heidelberg (1996)

  16. Quinlan, J.R.: Induction of Decision Trees. Machine Learning 1, 81–106 (1986)

  17. Quinlan, J.R.: Learning Decision Tree Classifiers. ACM Computing Surveys 28(1), 71–72 (1996)

  18. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993). ISBN 1-55860-238-0

  19. Quinlan, J.R.: Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4, 77–90 (1996)

  20. Rasmussen, E.: Clustering algorithms. In: Frakes, W., Baeza-Yates, R. (eds.) Information Retrieval: Data Structures and Algorithms, ch. 16. Prentice Hall (1992)

  21. Rastogi, R., Shim, K.: PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning. Data Mining and Knowledge Discovery 4(4), 315–344 (2000)

  22. Shafer, J.C., Agrawal, R., Mehta, M.: SPRINT: A Scalable Parallel Classifier for Data Mining. In: VLDB 1996, Mumbai (Bombay), India, September 3–6, pp. 544–555 (1996)

  23. Ullman, J.: Data Mining Lecture Notes. Stanford University CS345 Course on Data Mining, Spring 2000, http://www-db.stanford.edu/~ullman/mining/mining.html

  24. Van de Merckt, T.: Decision trees in numerical attribute spaces. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, pp. 1016–1021 (1993)

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Berzal, F., Cubero, J.C., Marín, N., Sánchez, D. (2003). Numerical Attributes in Decision Trees: A Hierarchical Approach. In: Berthold, M.R., Lenz, H.-J., Bradley, E., Kruse, R., Borgelt, C. (eds) Advances in Intelligent Data Analysis V. IDA 2003. Lecture Notes in Computer Science, vol 2810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45231-7_19

  • DOI: https://doi.org/10.1007/978-3-540-45231-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40813-0

  • Online ISBN: 978-3-540-45231-7

  • eBook Packages: Springer Book Archive
