Abstract
Attribute value taxonomies (AVTs) have been used to guide decision tree learning from data with partially or totally missing values. In many cases, these AVTs are supplied by the user. We propose an approach that uses a genetic algorithm to automatically generate an AVT for a given dataset. Experiments on real-world datasets demonstrate the feasibility of the approach: the generated AVTs yield classification accuracy comparable to that obtained with user-supplied AVTs.
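The abstract does not say how the genetic algorithm encodes, varies, or scores candidate AVTs. As a rough illustration only, the Python sketch below evolves a binary AVT over the values of a single nominal attribute. Every design choice is an assumption and not the authors' method. Each candidate is a permutation of the attribute values, decoded into a balanced binary tree. Offspring come from a simple order crossover (OX) variant plus swap mutation, with tournament selection and elitism. The fitness is a hypothetical proxy that rewards internal nodes whose grouped values share a class distribution (negative count-weighted class entropy), used in place of the paper's fitness function, which the abstract does not specify. The toy data and all names (`evolve_avt`, `fitness`, `build_tree`, and so on) are invented for this example.

```python
import math
import random
from collections import Counter

# Hypothetical toy data: (attribute value, class label) pairs for one nominal attribute.
DATA = [("red", "+"), ("red", "+"), ("pink", "+"), ("pink", "+"),
        ("blue", "-"), ("blue", "-"), ("navy", "-"), ("navy", "+")]
VALUES = sorted({v for v, _ in DATA})

def build_tree(perm):
    """Decode a permutation of values into a balanced binary AVT (nested tuples)."""
    if len(perm) == 1:
        return perm[0]
    mid = len(perm) // 2
    return (build_tree(perm[:mid]), build_tree(perm[mid:]))

def leaves(node):
    """Return the attribute values under a node of the AVT."""
    return [node] if isinstance(node, str) else leaves(node[0]) + leaves(node[1])

def entropy(counts):
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def fitness(perm):
    """Assumed proxy fitness: minus the count-weighted class entropy summed over
    internal nodes, so grouping values with similar class mixes scores higher.
    (The root always covers every value, so it adds the same constant to all.)"""
    score = 0.0

    def visit(node):
        nonlocal score
        if isinstance(node, str):
            return
        members = set(leaves(node))
        labels = Counter(y for v, y in DATA if v in members)
        score -= sum(labels.values()) * entropy(labels)
        visit(node[0])
        visit(node[1])

    visit(build_tree(perm))
    return score

def order_crossover(p1, p2, rng):
    """Copy a random slice from p1; fill the other positions with p2's values in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    child = [None] * len(p1)
    child[i:j + 1] = p1[i:j + 1]
    fill = [v for v in p2 if v not in child]
    for k in range(len(child)):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child

def mutate(perm, rng, rate=0.2):
    """With probability `rate`, swap two randomly chosen positions."""
    perm = perm[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(perm)), 2)
        perm[a], perm[b] = perm[b], perm[a]
    return perm

def tournament(pop, rng, k=2):
    """Pick the fitter of k randomly sampled individuals."""
    return max(rng.sample(pop, k), key=fitness)

def evolve_avt(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [rng.sample(VALUES, len(VALUES)) for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)  # elitism: carry the best individual over unchanged
        children = [mutate(order_crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
                    for _ in range(pop_size - 1)]
        pop = [best] + children
    return build_tree(max(pop, key=fitness))

print(evolve_avt())
```

On this toy data, values with similar class distributions (`red` with `pink`, `blue` with `navy`) should end up as siblings in the evolved taxonomy. A fitness that instead builds and scores an AVT-guided decision tree would be closer to the accuracy-based evaluation the abstract reports, at a much higher cost per fitness evaluation.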
This research was supported in part by grants from the National Science Foundation (grant 021969) and the National Institutes of Health (GM066387) to Vasant Honavar.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Joo, J., Zhang, J., Yang, J., Honavar, V. (2004). Generating AVTs Using GA for Learning Decision Tree Classifiers with Missing Data. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_30
DOI: https://doi.org/10.1007/978-3-540-30214-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23357-2
Online ISBN: 978-3-540-30214-8
eBook Packages: Springer Book Archive