Generating AVTs Using GA for Learning Decision Tree Classifiers with Missing Data

Joo, Jinu; Zhang, Jun; Yang, Jihoon; Honavar, Vasant

doi:10.1007/978-3-540-30214-8_30


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3245)

Included in the following conference series:

International Conference on Discovery Science


Abstract

Attribute value taxonomies (AVTs) have been used to perform AVT-guided decision tree learning on partially or totally missing data. In many cases, user-supplied AVTs are used. We propose an approach that uses a genetic algorithm to generate an AVT automatically for a given dataset. Experiments on real-world datasets demonstrate the feasibility of the approach: the generated AVTs yield classification accuracy comparable to that obtained with user-supplied AVTs.
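The abstract does not spell out how the genetic algorithm encodes or scores a taxonomy, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical encoding (a chromosome is a list of merge genes that builds a binary hierarchy over one attribute's values) and a stand-in fitness (negative size-weighted class entropy of the internal nodes) in place of the classification accuracy of an AVT-guided decision tree learner. The names `decode`, `fitness`, and `evolve`, the toy data, and all parameter values are illustrative assumptions.

```python
import math
import random
from collections import Counter


def decode(values, chromosome):
    """Turn a list of (i, j) genes into a binary taxonomy over `values`.

    Each gene merges two of the current clusters. Indices wrap around with
    modulo, so every chromosome decodes to a valid hierarchy (assumed encoding).
    Returns the internal nodes, each as a tuple of the values it covers.
    """
    clusters = [(v,) for v in values]  # leaves are 1-tuples
    nodes = []
    for i, j in chromosome:
        a = clusters.pop(i % len(clusters))
        b = clusters.pop(j % len(clusters))
        merged = a + b
        nodes.append(merged)
        clusters.append(merged)
    return nodes


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def fitness(nodes, data):
    """Stand-in score: negative size-weighted class entropy of internal nodes.

    A faithful fitness would instead train an AVT-guided decision tree and
    measure its accuracy; that learner is outside the scope of this sketch.
    """
    score = 0.0
    for covered in nodes:
        labels = [y for x, y in data if x in covered]
        if labels:
            score -= len(labels) * entropy(labels)
    return score


def evolve(values, data, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Simple GA: truncation selection, one-point crossover, per-gene mutation."""
    rng = random.Random(seed)
    n_genes = len(values) - 1  # n - 1 merges yield a single root

    def random_gene():
        return (rng.randrange(len(values)), rng.randrange(len(values)))

    def score(chromosome):
        return fitness(decode(values, chromosome), data)

    population = [[random_gene() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [random_gene() if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return decode(values, best), score(best)


# Toy data (hypothetical): one attribute, two classes.
values = ["red", "orange", "blue", "green"]
data = [("red", "warm"), ("orange", "warm"), ("blue", "cool"), ("green", "cool")] * 5
taxonomy, best_score = evolve(values, data)
print(taxonomy, best_score)
```

Under this stand-in fitness, hierarchies that group values with similar class distributions low in the tree score higher, which is the kind of abstraction an AVT is meant to capture. Swapping `fitness` for a cross-validated accuracy estimate would bring the sketch closer to the evaluation described in the abstract.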

This research was supported in part by grants from the National Science Foundation (grant 021969) and the National Institutes of Health (GM066387) to Vasant Honavar.



Author information

Authors and Affiliations

Department of Computer Science, Sogang University, 1 Shinsoo-Dong, Mapo-Ku, Seoul, 121-742, Korea
Jinu Joo & Jihoon Yang
Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, IA, 50011, USA
Jun Zhang & Vasant Honavar


Editor information

Editors and Affiliations

Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, 744 Motooka, Nishi, 819-0395, Fukuoka, Japan
Einoshin Suzuki
Kyushu University, 6–10–1 Hakozaki Higashi-ku, 812–8581, Fukuoka, Japan
Setsuo Arikawa


About this paper

Cite this paper

Joo, J., Zhang, J., Yang, J., Honavar, V. (2004). Generating AVTs Using GA for Learning Decision Tree Classifiers with Missing Data. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_30


DOI: https://doi.org/10.1007/978-3-540-30214-8_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23357-2
Online ISBN: 978-3-540-30214-8
eBook Packages: Springer Book Archive
