Feature selection for semi-supervised multi-target regression using genetic algorithm


Abstract

Multi-target regression (MTR) is an exciting area of machine learning in which the challenge is to predict the values of more than one target variable, each of which can take on continuous values. These variables may or may not be correlated. Such problems commonly occur in real-life scenarios, and therefore interest and research in this area have increased in recent times. Examples of applications include analyzing brain-activity data gathered using multimedia sensors, stock information from continuous web data, and data describing the characteristics of the vegetation at a certain site. For a real-world multi-target learning system, the problem can be further complicated when new targets emerge with very little data available. In such cases, a semi-supervised approach can be adopted. This paper proposes a Genetic Algorithm (GA) based semi-supervised technique for multi-target regression problems that predicts new targets using a very small number of labelled examples, by incorporating GA with MTR-SAFER. Experiments are carried out on real-world MTR data sets. The proposed method is explored with different variations and compared with state-of-the-art MTR methods. The results indicate significantly better performance, with the further benefit of a reduced feature set.
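To make the general idea concrete, the sketch below illustrates GA-based feature selection for multi-target regression: features are encoded as binary chromosomes and fitness is the average relative RMSE (aRRMSE) over the targets on a held-out labelled split. This is only an illustrative sketch under stated assumptions, not the paper's implementation: a generic multi-output regressor (scikit-learn's RandomForestRegressor) stands in for the MTR-SAFER learner, and the GA operators (truncation selection, single-point crossover, bit-flip mutation) are generic choices rather than the exact configuration used in the paper.

    # Minimal sketch: GA-based feature selection for multi-target regression.
    # Assumption: RandomForestRegressor is a stand-in for the paper's MTR-SAFER learner.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    def arrmse(y_true, y_pred):
        # Average relative RMSE across targets: model RMSE / RMSE of the mean predictor.
        num = np.sqrt(((y_true - y_pred) ** 2).mean(axis=0))
        den = np.sqrt(((y_true - y_true.mean(axis=0)) ** 2).mean(axis=0))
        return float((num / den).mean())

    def fitness(mask, X_tr, y_tr, X_te, y_te):
        if mask.sum() == 0:                      # empty feature subsets are invalid
            return np.inf
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X_tr[:, mask], y_tr)           # multi-output regression on the subset
        return arrmse(y_te, model.predict(X_te[:, mask]))

    def ga_feature_selection(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
        rng = np.random.default_rng(seed)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
        n_feat = X.shape[1]
        pop = rng.random((pop_size, n_feat)) < 0.5      # random binary chromosomes
        for _ in range(generations):
            scores = np.array([fitness(ind, X_tr, y_tr, X_te, y_te) for ind in pop])
            order = np.argsort(scores)                   # lower aRRMSE is better
            parents = pop[order[: pop_size // 2]]        # truncation selection
            children = []
            while len(children) < pop_size - len(parents):
                a, b = parents[rng.integers(len(parents), size=2)]
                cut = rng.integers(1, n_feat)            # single-point crossover
                child = np.concatenate([a[:cut], b[cut:]])
                child ^= rng.random(n_feat) < p_mut      # bit-flip mutation
                children.append(child)
            pop = np.vstack([parents, np.asarray(children)])
        scores = np.array([fitness(ind, X_tr, y_tr, X_te, y_te) for ind in pop])
        return pop[np.argmin(scores)]                    # best feature mask found

    # Hypothetical usage:
    #   X, Y = np.random.rand(200, 30), np.random.rand(200, 4)
    #   mask = ga_feature_selection(X, Y)
    #   print("selected features:", np.flatnonzero(mask))

The reduced feature mask found this way can then be handed to whatever semi-supervised multi-target learner is in use, which is the role MTR-SAFER plays in the proposed method.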

Notes

  1. http://mulan.sourceforge.net/datasets-mtr.html


Acknowledgements

This work was supported in part by the Higher Education Commission (HEC) Pakistan, and in part by the Ministry of Planning Development and Reforms under the National Center in Big Data and Cloud Computing.

Author information

Corresponding author

Correspondence to Farrukh Hasan Syed.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Syed, F.H., Tahir, M.A., Rafi, M. et al. Feature selection for semi-supervised multi-target regression using genetic algorithm. Appl Intell 51, 8961–8984 (2021). https://doi.org/10.1007/s10489-021-02291-9

