
Filter-based feature selection in the context of evolutionary neural networks in supervised machine learning

  • Theoretical advances
  • Published in: Pattern Analysis and Applications

Abstract

This paper presents a workbench for obtaining simple neural classification models based on evolutionary product-unit networks, with prior data preparation at the attribute level by means of filter-based feature selection. As a result, the computation required to build a classifier is shorter than for a full model trained without data pre-processing. This matters because evolutionary neural models are stochastic, so several classifiers trained with different seeds are needed to obtain reliable results. Feature selection is one of the most common pre-processing techniques for any kind of learning task. Six filters have been tested to assess the proposal, using fourteen difficult (binary and multi-class) classification data sets from the University of California, Irvine (UCI) repository as the test bed. An empirical study compares the evolutionary neural network models obtained with and without feature selection. The results, contrasted with nonparametric statistical tests, show that the current proposal significantly improves the test accuracy of the previous models. Moreover, the current proposal is much more efficient than the previous methodology, with an average time reduction above 40%. Our approach has also been compared with several classifiers, both with and without feature selection, to illustrate the performance of the different filters considered. Lastly, a statistical analysis for each feature selector provides a pairwise comparison between machine learning algorithms.
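Conceptually, the workflow the abstract describes can be sketched in a few lines. The sketch below is illustrative only, not the authors' implementation: scikit-learn's MLPClassifier stands in for the evolutionary product-unit network, mutual information for one of the six filters, and the Wilcoxon test for the paper's nonparametric analysis; the data set, k=10, and the ten seeds are assumptions made for the example.

```python
# Minimal sketch (assumed, not the paper's code): fit a filter-based
# feature selector on the training data only, then train a stochastic
# classifier with and without the filter across several seeds and
# compare test accuracy and training time.
import time

import numpy as np
from scipy.stats import wilcoxon
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in for a UCI data set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

def run(seed, selector=None):
    """Train one stochastic model; return (test accuracy, fit time in s)."""
    Xtr, Xte = X_tr, X_te
    if selector is not None:
        Xtr = selector.fit_transform(X_tr, y_tr)  # filter sees training data only
        Xte = selector.transform(X_te)
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=seed)
    t0 = time.perf_counter()
    clf.fit(Xtr, y_tr)
    return clf.score(Xte, y_te), time.perf_counter() - t0

seeds = range(10)  # stochastic models need several seeds for reliable results
full = [run(s) for s in seeds]
filt = [run(s, SelectKBest(mutual_info_classif, k=10)) for s in seeds]

acc_full, t_full = map(np.array, zip(*full))
acc_filt, t_filt = map(np.array, zip(*filt))
print(f"mean accuracy  full={acc_full.mean():.3f}  filtered={acc_filt.mean():.3f}")
print(f"mean time reduction: {100 * (1 - t_filt.mean() / t_full.mean()):.1f}%")
# Nonparametric paired comparison over seeds, in the spirit of the paper's
# analysis (the paper itself applies Friedman-type tests across data sets).
print(wilcoxon(acc_full, acc_filt))
```

With the filter, the network trains on 10 of the 30 original attributes, which is where the fit-time saving analogous to the abstract's reported reduction comes from in this toy setting.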



Acknowledgements

This work was partially subsidised by the TIN2011-28956-C02-02 and TIN2014-55894-C2-R projects of the Spanish Inter-Ministerial Commission of Science and Technology (MICYT) and by FEDER funds.

Author information


Corresponding author

Correspondence to Antonio J. Tallón-Ballesteros.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Tallón-Ballesteros, A.J., Riquelme, J.C. & Ruiz, R. Filter-based feature selection in the context of evolutionary neural networks in supervised machine learning. Pattern Anal Applic 23, 467–491 (2020). https://doi.org/10.1007/s10044-019-00798-z
