Generating AVTs Using GA for Learning Decision Tree Classifiers with Missing Data

  • Conference paper
Discovery Science (DS 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3245)

Abstract

Attribute value taxonomies (AVTs) have been used to perform AVT-guided decision tree learning on partially or totally missing data. In many cases, user-supplied AVTs are used. We propose an approach to automatically generate an AVT for a given dataset using a genetic algorithm. Experiments on real-world datasets demonstrate the feasibility of our approach: the generated AVTs yield classification accuracy comparable to that obtained with user-supplied AVTs.

This research was supported in part by grants from the National Science Foundation (grant 021969) and the National Institutes of Health (GM066387) to Vasant Honavar.
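
The abstract does not spell out the chromosome encoding, the genetic operators, or the fitness function, so the following is only a minimal sketch of how a genetic algorithm could search over taxonomies of attribute values. Everything in it is an assumption made for illustration: the toy class counts, the merge-pair encoding, the names `decode`, `fitness`, and `evolve`, and above all the entropy-based score, which stands in for the classification accuracy that the abstract names as the evaluation measure.

```python
import math
import random

# Hypothetical toy data: (positive, negative) class counts per attribute value.
CLASS_COUNTS = {
    "red": (9, 1), "orange": (8, 2), "yellow": (7, 3),
    "blue": (1, 9), "green": (2, 8), "violet": (3, 7),
}
VALUES = sorted(CLASS_COUNTS)


def decode(chromosome):
    """Turn a list of (a, b) integer pairs into a binary taxonomy (nested tuples).

    Each gene merges two of the current clusters; the modulo keeps every
    integer pair valid, so crossover and mutation never produce a broken tree.
    """
    clusters = list(VALUES)
    for a, b in chromosome:
        left = clusters.pop(a % len(clusters))
        right = clusters.pop(b % len(clusters))
        clusters.append((left, right))
    return clusters[0]


def leaves(node):
    """Return the attribute values found under a taxonomy node."""
    return [node] if isinstance(node, str) else leaves(node[0]) + leaves(node[1])


def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def fitness(chromosome):
    """Assumed stand-in score: negated, size-weighted class entropy of every
    internal node. It is NOT the accuracy-based evaluation from the paper."""
    score = 0.0
    stack = [decode(chromosome)]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue
        members = leaves(node)
        pos = sum(CLASS_COUNTS[v][0] for v in members)
        neg = sum(CLASS_COUNTS[v][1] for v in members)
        score -= (pos + neg) * entropy((pos, neg))
        stack.extend(node)
    return score


def random_chromosome(rng):
    return [(rng.randrange(100), rng.randrange(100)) for _ in range(len(VALUES) - 1)]


def evolve(generations=30, pop_size=20, mutation_rate=0.2, seed=0):
    """Run a small GA with elitism, tournament selection, one-point crossover,
    and single-gene mutation; return the best chromosome found."""
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: carry over the two best unchanged
        while len(next_gen) < pop_size:
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, len(p1))
            child = p1[:cut] + p2[cut:]
            if rng.random() < mutation_rate:
                child[rng.randrange(len(child))] = (rng.randrange(100), rng.randrange(100))
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(decode(best), fitness(best))
```

In a setting closer to the one the abstract describes, `fitness` would presumably be replaced by the accuracy of a decision tree learned with the candidate AVT on held-out data; the merge-pair encoding is just one convenient way to guarantee that every chromosome decodes to a valid taxonomy.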

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Joo, J., Zhang, J., Yang, J., Honavar, V. (2004). Generating AVTs Using GA for Learning Decision Tree Classifiers with Missing Data. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_30

  • DOI: https://doi.org/10.1007/978-3-540-30214-8_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23357-2

  • Online ISBN: 978-3-540-30214-8

  • eBook Packages: Springer Book Archive
