Abstract
Feature selection has been widely used for decades as a preprocessing step that reduces the dimensionality of a problem and often improves classification accuracy. The need for this kind of technique has increased dramatically in recent years with the advent of Big Data. This data explosion brings not only a large number of samples but also big dimensionality, that is, a very large number of features. This chapter analyzes the pressing need for feature selection and briefly reviews the most popular feature selection methods and some typical applications. Moreover, as the Big Data scenario offers new opportunities to machine learning researchers, we discuss the new challenges that must be faced: from the scalability of the methods to the role of feature selection in the presence of deep learning, as well as its use in embedded devices. Without a doubt, the explosion in both the number of features and computing technologies will point to a number of hot spots where feature selection researchers can launch new lines of research.
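To make the preprocessing idea concrete, here is a minimal sketch of a univariate filter, one of the simplest families of feature selection methods: each discrete feature is scored by its empirical mutual information with the class label, and only the k highest-scoring features are kept before any classifier is trained. This is an illustrative assumption, not a method taken from the chapter. The function names `mutual_information` and `select_top_k` are hypothetical, and the sketch assumes discrete-valued features. It also ignores redundancy between features, which more elaborate filters are designed to handle.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(X, y, k):
    """Rank the columns of X by I(feature; y) and return the indices of the k best."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append((mutual_information(column, y), j))
    # Sort by descending score; ties are broken by the lower column index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


if __name__ == "__main__":
    # Toy data: feature 0 copies the label, features 1 and 2 are uninformative.
    X = [[0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1]]
    y = [0, 0, 1, 1]
    print(select_top_k(X, y, 1))  # prints [0]
```

Because each feature is scored independently, the cost grows linearly with the number of features. This is one reason filters of this kind are attractive when dimensionality is very high.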
Part of the content of this chapter was previously published in Knowledge-Based Systems (https://doi.org/10.1016/j.knosys.2015.05.014, https://doi.org/10.1016/j.knosys.2020.105885, https://doi.org/10.1016/j.knosys.2019.105326), Knowledge and Information Systems (https://doi.org/10.1007/s10115-012-0487-8), and Information Fusion (https://doi.org/10.1016/j.inffus.2018.11.008).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Bolón-Canedo, V., Alonso-Betanzos, A., Morán-Fernández, L., Cancela, B. (2022). Feature Selection: From the Past to the Future. In: Virvou, M., Tsihrintzis, G.A., Jain, L.C. (eds) Advances in Selected Artificial Intelligence Areas. Learning and Analytics in Intelligent Systems, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-030-93052-3_2
DOI: https://doi.org/10.1007/978-3-030-93052-3_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93051-6
Online ISBN: 978-3-030-93052-3
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)