Skip to main content
Log in

Genetic Programming-based Construction of Features for Machine Learning and Knowledge Discovery Tasks

  • Published:
Genetic Programming and Evolvable Machines Aims and scope Submit manuscript

Abstract

In this paper we use genetic programming for changing the representation of the input data for machine learners. In particular, the topic of interest here is feature construction in the learning-from-examples paradigm, where new features are built based on the original set of attributes. The paper first introduces the general framework for GP-based feature construction. Then, an extended approach is proposed where the useful components of representation (features) are preserved during an evolutionary run, as opposed to the standard approach where valuable features are often lost during search. Finally, we present and discuss the results of an extensive computational experiment carried out on several reference data sets. The outcomes show that classifiers induced using the representation enriched by the GP-constructed features provide better accuracy of classification on the test set. In particular, the extended approach proposed in the paper proved to be able to outperform the standard approach on some benchmark problems on a statistically significant level.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. H. N. Bensusan, Automatic Bias Learning: An Inquiry into the Inductive Basis of Induction. Ph.D. Dissertation, School of Computing and Cognitive Sciences, University of Sussex, Sussex, 1998.

    Google Scholar 

  2. H. N. Bensusan and I. Kuscu, “Constructive induction using genetic programming,” in Proc. Int. Conf. Machine Learning, Evolutionary Computing and Machine Learning Workshop, T. Fogarty and G. Venturini (eds.), Bari, Italy, 1996.

  3. B. Bhanu and K. Krawiec, “Coevolutionary construction of features for transformation of representation in machine learning,” in Proceedings of the Bird of a Feather Workshops, Genetic and Evolutionary Computation Conference (GECCO 2002), A. M. Barry (ed.), AAAI Press: New York, 2002, pp. 249–254.

    Google Scholar 

  4. C. L. Blake and C. J. Merz, “UCI Repository of machine learning databases,” [http://www.ics.uci.edu/~mlearn/MLRepository.html], University of California: Irvine, CA, 1998.

    Google Scholar 

  5. M. Brameier and W. Banzhaf, “Evolving teams of predictors with linear genetic programming,” Genetic Programming and Evolvable Machines vol. 2, pp. 381–407, 2001.

    Google Scholar 

  6. C. E. Brodley and A. Pohoreckyj Danyluk (eds.), Proc. Int. Conf. Machine Learning, Morgan Kaufmann: San Francisco 2001.

    Google Scholar 

  7. M. Dash and H. Liu, “Feature selection for classification,” Intelligent Data Analysis vol. 1, no. 3, pp. 131–156, 1997.

    Google Scholar 

  8. K. A. De Jong, An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Doctoral dissertation, University of Michigan, Ann Arbor, 1975.

    Google Scholar 

  9. K. A. De Jong, W. M. Spears, and D. F. Gordon, “Using genetic algorithms for concept learning,” Machine Learning vol. 13, pp. 161–188, 1993.

    Google Scholar 

  10. D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley: Reading, 1989.

  11. J. J. Grefenstette, “Lamarckian learning in multi-agent environments,” in Proc. Fourth Intl. Conf. of Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, 1991, pp. 303–310.

    Google Scholar 

  12. J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press: Ann Arbor, 1975.

    Google Scholar 

  13. I. Imam and H. Vafaie, “An empirical comparison between global and greedy-like search for feature selection,” in Proceedings of the Florida AI Research Symposium (FLAIRS-94), Pensacola Beach, FL, 1994, pp. 66–70.

  14. J. K. Kishore, L. M. Patnaik, V. Mani, and V. K. Agrawal, “Application of genetic programming for multicategory pattern classification,” IEEE Trans. Evolutionary Comp. vol. 4, no. 3, pp. 242–258, 2000.

    Google Scholar 

  15. M. Komosinski and K. Krawiec, “Evolutionary weighting of image features for diagnosing of CNS tumors,” Artif. Intell. in Medicine vol. 19, no. 1, pp. 25–38, 2000.

    Google Scholar 

  16. R. Kohavi and G. H. John, “Wrappers for feature subset selection,” Artif. Intell. Journal vol. 1–2, pp. 273–324, 1997.

    Google Scholar 

  17. R. Kohavi, D. Sommerfeld, and J. Dougherty, “Data mining using MLC++: Amachine learning library in C++,” in Proc. of the Eight International Conference on Tools with Artificial Intelligence (ICTAI'96), IEEE Computer Society, 1996, pp. 234–245.

  18. J. R. Koza, Genetic Programming—2, MIT Press: Cambridge, 1994.

    Google Scholar 

  19. K. Krawiec, “Pairwise comparison of hypotheses in evolutionary learning,” in Proc. Int. Conf. Machine Learning, C. E. Brodley and A. Pohoreckyj Danyluk (eds.), Morgan Kaufmann: San Francisco, 2001, pp. 266–273.

    Google Scholar 

  20. K. Krawiec, “Genetic programming with local improvement for visual learning from examples,” in Computer Analysis of Images and Patterns (LNCS 2124), W. Skarbek, (ed.), Springer: Berlin, 2001, pp. 209–216.

  21. K. Krawiec, “On the use of pairwise comparison of hypotheses in evolutionary learning applied to learning from visual examples,” in Machine Learning and Data Mining in Pattern Recognition (LNAI 2123), P. Perner (ed.), Springer: Berlin, 2001, pp. 307–321.

  22. P. Langley, Elements of Machine Learning, Morgan Kaufmann: San Francisco, 1996.

    Google Scholar 

  23. T.-S. Lim, W.-Y. Loh, and Y.-S. Shih, “Acomparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms,” Machine Learning, vol. 40, pp. 203–228, 2000.

    Google Scholar 

  24. S. Luke, “ECJ 7: An EC and GP system in Java,” http://www.cs.umd.edu/projects/plus/ec/ecj/, 2001.

  25. C. J. Matheus and L. A. Rendell, “Constructive induction on decision trees,” in Proc. of the Eleventh International Joint Conference on Artificial Intelligence, N. S. Sridharan (ed.), Detroit, MI, USA, August 1989, Morgan Kaufmann 1989, pp. 645–650.

  26. C. J. Matheus, “Aconstructive induction framework,” in Proc. of the Sixth International Workshop on Machine Learning, Ithaca: New York, 1989, pp. 474–475.

  27. G. Mayraz and G. Hinton, “Recognizing hand-written digits using hierarchical products of experts,” Adv. NIPS 13, MIT Press: Cambridge, MA, 2001, pp. 953–959.

    Google Scholar 

  28. R. S. Michalski, “Atheory and methodology of inductive learning,” Artificial Intelligence 20, pp. 111–161, 1983.

    Google Scholar 

  29. T. M. Mitchell, An Introduction to Genetic Algorithms, MIT Press: Cambridge, MA, 1996.

    Google Scholar 

  30. T. M. Mitchell, Machine Learning, McGraw-Hill: New York, 1997.

    Google Scholar 

  31. J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann: San Mateo, CA, 1992.

    Google Scholar 

  32. M. L. Raymer, W. F. Punch, E. D. Goodman, L. A. Kuhn, and A. K. Jain, “Dimensionality reduction using genetic algorithm,” IEEE Trans. on Evolutionary Computation, vol. 4, no. 2, pp. 164–171, 2000.

    Google Scholar 

  33. M. L. Raymer, W. F. Punch, E. D. Goodman, and L. A. Kuhn, “Genetic programming for improved data mining—application to the biochemistry of protein interactions,” in Genetic Programming 1996: Proceedings, J. R. Koza, D. E. Goldberg, D. B. Fogel, and R. L. Riolo, (eds.), MIT Press: Cambridge, MA, 1996, pp. 275–381.

    Google Scholar 

  34. S. Thrun, J. Bala, E. Bloedorn, I. Bratko, B. Cestnik, J. Cheng, K. De Jong, S. Dzeroski, R. Hamann, K. Kaufman, S. Keller, I. Kononenko, J. Kreuziger, R. S. Michalski, T. Mitchell, P. Pachowicz, B. Roger, H. Vafaie, W. Van de Velde, W. Wenzel, J. Wnek, and J. Zhang, “The MONK's problems: Aperformance comparison of different learning algorithms,” Technical Report CMU-CS-91-197, Carnegie Mellon University: Pittsburgh, PA, 1991.

    Google Scholar 

  35. I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann: San Francisco, CA, 1999.

    Google Scholar 

  36. J. Yang and V. Honavar, “Feature subset selection using a genetic algorithm,” IEEE Intelligent Systems (Special Issue on Feature Transformation and Subset Selection), vol. 13, pp. 44–49, 1998.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Krawiec, K. Genetic Programming-based Construction of Features for Machine Learning and Knowledge Discovery Tasks. Genetic Programming and Evolvable Machines 3, 329–343 (2002). https://doi.org/10.1023/A:1020984725014

Download citation

  • Issue Date:

  • DOI: https://doi.org/10.1023/A:1020984725014

Navigation