
Genetic Programming with a Genetic Algorithm for Feature Construction and Selection

Genetic Programming and Evolvable Machines

Abstract

Machine learning techniques are increasingly used to analyse data automatically. In this paper we primarily examine the use of Genetic Programming and a Genetic Algorithm to pre-process data before it is classified with the C4.5 decision tree learning algorithm. Genetic Programming constructs new features from those available in the data. This is a potentially significant step for data mining because it can expose hidden relationships between features. A Genetic Algorithm then determines which of these features are the most predictive. Using ten well-known datasets, we show that our approach gives a marked improvement over C4.5 alone in a number of cases. We then examine its use with other well-known machine learning techniques.
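The abstract describes a two-stage pipeline: construct new features as expressions over the original attributes, then use a Genetic Algorithm as a wrapper to choose a predictive subset. The following is a minimal sketch of that shape. It is not the authors' implementation, and it rests on several assumptions that do not come from the paper. Constructed features are expression trees over a hypothetical function set {+, -, *, protected /}. The trees are sampled at random rather than evolved by a full GP loop. Because C4.5 is not in the Python standard library, a leave-one-out 1-nearest-neighbour classifier stands in as the wrapper's fitness function. The names `random_tree`, `evaluate`, `nn_accuracy` and `ga_select`, and all GA settings, are hypothetical.

```python
import random

# Hypothetical function set for constructed features. Protected division
# returns 1.0 when the denominator is (near) zero, a common GP convention.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,
}


def random_tree(depth, n_inputs, rng):
    """Grow a random expression tree whose leaves are original feature indices."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_inputs))
    op = rng.choice(list(OPS))
    return (op, random_tree(depth - 1, n_inputs, rng),
            random_tree(depth - 1, n_inputs, rng))


def evaluate(tree, row):
    """Compute one constructed feature's value for a single data row."""
    if tree[0] == "x":
        return row[tree[1]]
    return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))


def nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only the columns where mask is True.
    Stand-in wrapper classifier (assumption): the paper uses C4.5."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        best_d, best_label = float("inf"), None
        for k, other in enumerate(X):
            if k == i:
                continue
            d = sum((row[j] - other[j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def ga_select(X, y, pop_size=20, generations=10, p_mut=0.1, seed=0):
    """Evolve bitmasks over the columns of X; fitness is wrapper accuracy."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: nn_accuracy(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection, kept as elites
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            # Bit-flip mutation: each bit is inverted with probability p_mut.
            child = [bit != (rng.random() < p_mut) for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: nn_accuracy(X, y, m))


# Toy usage: the class depends on the product of the two raw inputs.
rng = random.Random(1)
X_raw = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
y = [int(a * b > 0) for a, b in X_raw]
trees = [random_tree(2, 2, rng) for _ in range(4)]
trees.append(("mul", ("x", 0), ("x", 1)))  # seeded product feature, for illustration
X = [row + [evaluate(t, row) for t in trees] for row in X_raw]
mask = ga_select(X, y)
print("selected columns:", [j for j, keep in enumerate(mask) if keep])
```

The hand-seeded product tree illustrates the "hidden relationship" the abstract mentions. On this toy data the label depends on x0 * x1, which neither raw attribute captures alone. A real GP stage would have to discover such a tree through evolution rather than have it supplied.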



Author information

Correspondence to Matthew G. Smith.


About this article

Cite this article

Smith, M.G., Bull, L. Genetic Programming with a Genetic Algorithm for Feature Construction and Selection. Genet Program Evolvable Mach 6, 265–281 (2005). https://doi.org/10.1007/s10710-005-2988-7

  • DOI: https://doi.org/10.1007/s10710-005-2988-7
