Abstract
Multi-target regression (MTR) is an area of machine learning in which the challenge is to predict the values of more than one continuous target variable. These variables may or may not be correlated. Such problems commonly occur in real-life scenarios, and interest and research in this area have therefore increased in recent times. Examples of applications include analyzing brain-activity data gathered using multimedia sensors, stock information from continuous web data, and data describing the characteristics of the vegetation at a given site. For a real-world multi-target learning system, the problem can be further complicated when new issues emerge with very little data available. In such cases, a semi-supervised approach can be adopted. This paper proposes a Genetic Algorithm (GA) based semi-supervised technique for multi-target regression problems that predicts new targets using a very small number of labelled examples by incorporating GA with MTR-SAFER. Experiments are carried out on real-world MTR data sets. The proposed method is explored in different variations and is also compared with state-of-the-art MTR methods. The results indicate significantly better performance, with the further benefit of a reduced feature set.
Acknowledgements
This work was supported in part by the Higher Education Commission (HEC) Pakistan, and in part by the Ministry of Planning Development and Reforms under the National Center in Big Data and Cloud Computing.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
Cite this article
Syed, F.H., Tahir, M.A., Rafi, M. et al. Feature selection for semi-supervised multi-target regression using genetic algorithm. Appl Intell 51, 8961–8984 (2021). https://doi.org/10.1007/s10489-021-02291-9
DOI: https://doi.org/10.1007/s10489-021-02291-9