Evolutionary prototype selection for multi-output regression

doi:10.1016/j.neucom.2019.05.055

Neurocomputing

Volume 358, 17 September 2019, Pages 309-320

https://doi.org/10.1016/j.neucom.2019.05.055

Under a Creative Commons license

open access

Highlights

• A new prototype selection method for multi-output regression data sets is presented.
• A multi-objective evolutionary algorithm is used for prototype selection.
• Multiple Pareto fronts are also used to prevent overfitting.
• The new method improved predictive capabilities and also greatly reduced data set size.

Abstract

A novel approach to prototype selection for multi-output regression data sets is presented. A multi-objective evolutionary algorithm is used to evaluate the selections using two criteria: training data set compression and prediction quality, expressed as root mean squared error. During training, the error was evaluated with a multi-target regressor based on k-NN, while the tests were performed using four different multi-target predictive models. The distance matrices used by the multi-target regressor were cached to speed up the evaluation. Multiple Pareto fronts were also used to prevent overfitting and to obtain a broader range of solutions, by using different probabilities in the initialization of populations and different evolutionary parameters in each one. The results obtained with the benchmark data sets showed that the proposed method greatly reduced data set size and, at the same time, improved the predictive capabilities of the multi-output regressors trained on the reduced data set.
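The two criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the leave-one-out evaluation, the value of k, and the synthetic data are all assumptions made for illustration, and the evolutionary operators (crossover, mutation) are omitted, so candidate selections are simply sampled at random with different initialization probabilities. The sketch shows a distance matrix computed once and reused, a k-NN multi-output error estimate restricted to the selected prototypes, and a filter that keeps the non-dominated candidates.

```python
import math
import random

def distance_matrix(X):
    """Pairwise Euclidean distances, computed once and reused for every candidate."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

def knn_rmse(D, Y, mask, k=3):
    """RMSE over all outputs of a k-NN regressor restricted to the selected prototypes.

    Assumption: each instance is predicted from its k nearest selected
    neighbours, excluding itself (leave-one-out).
    """
    selected = [j for j, keep in enumerate(mask) if keep]
    sq_err, count = 0.0, 0
    for i in range(len(Y)):
        neighbours = sorted((j for j in selected if j != i), key=lambda j: D[i][j])[:k]
        if not neighbours:
            return math.inf  # no usable prototype for this instance
        for t in range(len(Y[i])):
            pred = sum(Y[j][t] for j in neighbours) / len(neighbours)
            sq_err += (pred - Y[i][t]) ** 2
            count += 1
    return math.sqrt(sq_err / count)

def objectives(D, Y, mask, k=3):
    """Both objectives are minimised: fraction of instances kept, and RMSE."""
    return (sum(mask) / len(mask), knn_rmse(D, Y, mask, k))

def pareto_front(scored):
    """Keep the (mask, objectives) pairs that no other candidate dominates."""
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
    return [(m, s) for m, s in scored if not any(dominates(o, s) for _, o in scored)]

# Hypothetical two-output data set, used only to exercise the functions.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(30)]
Y = [[x[0] + x[1], x[0] - x[1]] for x in X]
D = distance_matrix(X)

# Different initialisation probabilities yield selections of different sizes.
candidates = [[rng.random() < p for _ in X] for p in (0.2, 0.5, 0.8) for _ in range(10)]
scored = [(m, objectives(D, Y, m)) for m in candidates]
scored = [(m, s) for m, s in scored if math.isfinite(s[1])]
front = pareto_front(scored)
```

Because the distances are looked up in `D` rather than recomputed, evaluating a new candidate only requires sorting the cached row of each instance, which is the kind of saving that caching is meant to provide.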

Keywords

Prototype selection

Multi-output

Multi-target

Regression

Mirosław Kordos obtained an M.Sc. in electrical engineering from the Technical University of Lodz, Poland, in 1994 and a Ph.D. in computer science from the Silesian University of Technology, Poland, in 2005. From 1994 to 2005 he worked in the IT industry as a software developer and systems engineer. In 2006–2007 he was a research fellow at the Division of Biomedical Informatics in Cincinnati Children’s Hospital Research Center, USA. In 2008–2009 he was an assistant professor at the Silesian University of Technology, Poland, and since 2010 he has been an assistant professor at the University of Bielsko-Biala, Poland. In recent years his research has focused mainly on instance selection in machine learning using classical methods, evolutionary methods, and methods embedded into neural networks. He is an author or co-author of over 20 papers on instance selection.

Álvar Arnaiz González received his M.S. degree in Computer Engineering in 2010 and his Ph.D. in Computer Science in 2018 from the University of Burgos, Spain, for a thesis entitled “Study of instance selection methods”. He is currently an Assistant Professor at the Department of Civil Engineering of the University of Burgos. His main research interests include machine learning, data mining, ensemble classifiers, and instance selection. He is a member of the “Artificial Data Mining Research and Bioinformatics Learning” research group.

César García-Osorio was born in León, Spain. He received his B.S. and M.E. in Computer Engineering in 1996 from the University of Valladolid, Spain, and his Ph.D. in Computer Science in 2005 from the University of the West of Scotland, United Kingdom, for a thesis entitled “Data mining and visualization”. He has been working at the University of Burgos since 1996. His research interests include, among others, ensemble learning, instance selection, and data visualization. He is a member of the “Artificial Data Mining Research and Bioinformatics Learning” research group.