Feature Selection: From the Past to the Future

Bolón-Canedo, Verónica; Alonso-Betanzos, Amparo; Morán-Fernández, Laura; Cancela, Brais

doi:10.1007/978-3-030-93052-3_2

Verónica Bolón-Canedo⁷,
Amparo Alonso-Betanzos⁷,
Laura Morán-Fernández⁷ &
…
Brais Cancela⁷

Part of the book series: Learning and Analytics in Intelligent Systems ((LAIS,volume 24))

628 Accesses
5 Citations

Abstract

Feature selection has been widely used for decades as a preprocessing step that allows for reducing the dimensionality of a problem while improving classification accuracy. The need for this kind of technique has increased dramatically in recent years with the advent of Big Data. This data explosion not only has the problem of a large number of samples, but also of big dimensionality. This chapter will analyze the paramount need for feature selection and briefly review the most popular feature selection methods and some typical applications. Moreover, as the new Big Data scenario offers new opportunities to machine learning researchers, we will discuss the new challenges that need to be faced: from the scalability of the methods to the role of feature selection in the presence of deep learning, as well as exploring its use in embedded devices. Beyond a shadow of doubt, the explosion in the number of features and computing technologies will point to a number of hot spots for feature selection researchers to launch new lines of research.

Part of the content of this chapter was previously published in Knowledge-Based Systems (https://doi.org/10.1016/j.knosys.2015.05.014, https://doi.org/10.1016/j.knosys.2020.105885, https://doi.org/10.1016/j.knosys.2019.105326), Knowledge and Information Systems (https://doi.org/10.1007/s10115-012-0487-8), and Information Fusion (https://doi.org/10.1016/j.inffus.2018.11.008).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Review on Deep Learning in Feature Selection

Feature selection techniques for machine learning: a survey of more than two decades of research

Article 01 December 2023

Feature dimensionality reduction: a review

Article Open access 21 January 2022

References

H. Climente-González, C. Azencott, S. Kaski, M. Yamada, Block HSIC Lasso: model-free biomarker detection for ultra-high dimensional data. Bioinformatics 35(14), i427–i435 (2019)
Article Google Scholar
N. Grgic-Hlaca, M.B. Zafar, K.P. Gummadi, A. Weller, Beyond distributive fairness in algorithmic decision making: Feature selection for procedurally fair learning. AAAI 18, 51–60 (2018)
Article Google Scholar
I. Furxhi, F. Murphy, M. Mullins, A. Arvanitis, C.A. Poland, Nanotoxicology data for in silico tools: a literature review. Nanotoxicology 1–26 (2020)
Google Scholar
Y. Zhai, Y. Ong, I.W. Tsang, The emerging “big dimensionality’’. IEEE Comput. Intell. Mag. 9(3), 14–26 (2014)
Article Google Scholar
M. Tan, I.W. Tsang, L. Wang, Towards ultrahigh dimensional feature selection for big data. J. Mach. Learn. Res. 15, 1371–1429 (2014)
MathSciNet MATH Google Scholar
K. Weinberger, A. Dasgupta, J. Langford, A. Smola, J. Attenberg, Feature hashing for large scale multitask learning, in Proceedings of the 26th Annual International Conference on Machine Learning (2009), pp. 1113–1120
Google Scholar
D.L. Donoho et al., High-dimensional data analysis: the curses and blessings of dimensionality, in AMS Math Challenges Lecture (2000), pp. 1–32
Google Scholar
R. Bellman, Dynamic Programming (Princeton UP, Princeton, NJ, 1957), p. 18
Google Scholar
I. Guyon, A. Elisseeff, An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
MATH Google Scholar
I. Guyon, Feature Extraction: Foundations and Applications, vol. 207 (Springer, Berlin, 2006)
Google Scholar
B. Bonev, Feature Selection Based on Information Theory (Universidad de Alicante, 2010)
Google Scholar
G. Hughes, On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 14(1), 55–63 (1968)
Article Google Scholar
A.J. Miller, Selection of subsets of regression variables. J. R. Stat. Society. Ser. (Gen.) 389–425 (1984)
Google Scholar
A.L. Blum, P. Langley, Selection of relevant features and examples in machine learning. Artif. Intell. 97(1), 245–271 (1997)
Article MathSciNet Google Scholar
M. Dash, H. Liu, Feature selection for classification. Intell. Data Anal. 1(3), 131–156 (1997)
Article Google Scholar
R. Kohavi, G.H. John, Wrappers for feature subset selection. Artif. Intell. 97(1), 273–324 (1997)
Article Google Scholar
H. Liu, H. Motoda, Computational Methods of Feature Selection (CRC Press, 2007)
Google Scholar
Z.A. Zhao, H. Liu, Spectral Feature Selection for Data Mining (Chapman & Hall/CRC, 2011)
Google Scholar
C. Boutsidis, P. Drineas, M.W. Mahoney, Unsupervised feature selection for the k-means clustering problem, in Advances in Neural Information Processing Systems (2009), pp. 153–161
Google Scholar
V. Roth, T. Lange, Feature selection in clustering problems, in Advances in Neural Information Processing Systems (2003)
Google Scholar
R. Leardi, A. Lupiáñez González, Genetic algorithms applied to feature selection in pls regression: how and when to use them. Chemom. Intell. Lab. Syst. 41(2), 195–207 (1998)
Google Scholar
D. Paul, E. Bair, T. Hastie, R. Tibshirani, “Preconditioning” for feature selection and regression in high-dimensional problems. Ann. Stat. 1595–1618 (2008)
Google Scholar
M. Pal, G.M. Foody, Feature selection for classification of hyperspectral data by SVM. IEEE Trans. Geosci. Remote Sens. 48(5), 2297–2307 (2010)
Article Google Scholar
L. Yu, H. Liu, Efficient feature selection via analysis of relevance and redundancy. J. Mach. Learn. Res. 1205–1224 (2004)
Google Scholar
M.A. Hall, Correlation-Based Feature Selection for Machine Learning. PhD thesis, University of Waikato, Hamilton, New Zealand (1999)
Google Scholar
M. Dash, H. Liu, Consistency-based search in feature selection. J. Artif. Intell. 151(1–2), 155–176 (2003)
Article MathSciNet Google Scholar
A.M. Hall, L.A. Smith, Practical feature subset selection for machine learning. J. Comput. Sci. 98, 4–6 (1998)
Google Scholar
L. Yu, H. Liu, Feature selection for high-dimensional data: a fast correlation-based filter solution, in Proceedings of The Twentieth International Conference on Machine Learning, ICML (2003), pp. 856–863
Google Scholar
Z. Zhao, H. Liu, Searching for interacting features, in Proceedings of 20th International Joint Conference on Artificial Intelligence, IJCAI (2007), pp. 1156–1161
Google Scholar
I. Kononenko, Estimating attributes: analysis and extensions of relief, in Proceedings of European Conference on Machine Learning, ECML. Lecture Notes in Computer Science (Lecture Notes in Artificial Intelligence) (1994), pp. 171–182
Google Scholar
K. Kira, L. Rendell, A practical approach to feature selection, in Proceedings of the 9th International Conference on Machine Learning, ICML (1992), pp. 249–256
Google Scholar
H. Peng, F. Long, C. Ding, Feature selection based on mutual information criteria of maxdependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(8), 1226–1238 (2005)
Article Google Scholar
S. Ramírez-Gallego, I. Lastra, D. Martínez-Rego, V. Bol\({\rm \acute{\notin }}\)n-Canedo, J.M Benítez, F. Herrera, A. Alonso-Betanzos, Fast-mRMR: fast minimum redundancy maximum relevance algorithm for high-dimensional big data. Int. J. Intell. Syst. 32, 134–152 (2017)
Google Scholar
S. Seth, J.C. Principe, Variable selection: a statistical dependence perspective, in Proceedings of the International Conference of Machine Learning and Applications (2010), pp. 931–936
Google Scholar
I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines. Mach. Learn. 46, 389–422 (2002)
Article Google Scholar
M. Mejía-Lavalle, E. Sucar, G. Arroyo, Feature selection with a perceptron neural net, in Proceedings of the International Workshop on Feature Selection for Data Mining (2006), pp. 131–135
Google Scholar
R. Tibshirani, Regression shrinkage and selection via the lasso. J. R. Stat. Soc.: Ser. B 58(1), 267–288 (1996)
MathSciNet MATH Google Scholar
H. Zou, T. Hastie, Regularization and variable selection via the elastic net. J. R. Stat. Soc.: Ser. B 67(2), 301–320 (2005)
Article MathSciNet Google Scholar
D.W. Marquardt, R.D. Snee, Ridge regression in practice. Am. Stat. 29(1), 1–20 (1975)
MATH Google Scholar
M.F. Balin, A. Abid, J.Y. Zou, Concrete autoencoders: differentiable feature selection and reconstruction, in International Conference on Machine Learning (2019), pp. 444–453
Google Scholar
B. Cancela, V. Bolón-Canedo, A. Alonso-Betanzos, E2E-FS: an end-to-end feature selection method for neural networks. arXiv e-prints (2020)
Google Scholar
E. Frank, M.A. Hall, I.H. Witten. Data Mining: Practical Machine Learning Tools and Techniques (Morgan Kaufmann, 2016)
Google Scholar
D. Dua, C. Graff, UCI machine learning repository (2017)
Google Scholar
C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 27 (2011)
Google Scholar
R. Bekkerman, M. Bilenko, J. Langford, Scaling Up Machine Learning: Parallel and Distributed Approaches (Cambridge University Press, Cambridge, 2011)
Google Scholar
J.A. Olvera-López, J.A. Carrasco-Ochoa, J.F. Martínez-Trinidad, J. Kittler, A review of instance selection methods. Artif. Intell. Rev. 34(2), 133–143 (2010)
Article Google Scholar
D. Rego-Fernández, V. Bolón-Canedo, A. Alonso-Betanzos, Scalability analysis of mRMR for microarray data, in Proceedings of the 6th International Conference on Agents and Artificial Intelligence (2014), pp. 380–386
Google Scholar
A. Alonso-Betanzos, V. Bolón-Canedo, D. Fernández-Francos, I. Porto-Díaz, N. Sánchez-Maroño, Up-to-Date feature selection methods for scalable and efficient machine learning, in Efficiency and Scalability Methods for Computational Intellect (IGI Global, 2013), pp. 1–26
Google Scholar
M. Bramer, Principles of Data Mining, vol. 180 (Springer, Berlin, 2007)
Google Scholar
L.I. Kuncheva, C.J. Whitaker, Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. 51(2), 181–207 (2003)
Article Google Scholar
P.K. Chan, S.J. Stolfo, Toward parallel and distributed learning by meta-learning, in AAAI Workshop in Knowledge Discovery in Databases (1993), pp. 227–240
Google Scholar
V.S. Ananthanarayana, D.K. Subramanian, M.N. Murty, Scalable, distributed and dynamic mining of association rules. High Perform. Comput. HiPC 2000, 559–566 (2000)
Google Scholar
G. Tsoumakas, I. Vlahavas, Distributed data mining of large classifier ensembles, in Proceedings Companion Volume of the Second Hellenic Conference on Artificial Intelligence (2002), pp. 249–256
Google Scholar
S. McConnell, D.B. Skillicorn, Building predictors from vertically distributed data, in Proceedings of the 2004 Conference of the Centre for Advanced Studies on Collaborative research (IBM Press, 2004), pp. 150–162
Google Scholar
D.B. Skillicorn, S.M. McConnell, Distributed prediction from vertically partitioned data. J. Parallel Distrib. Comput. 68(1), 16–36 (2008)
Article Google Scholar
M. Banerjee, S. Chakravarty, Privacy preserving feature selection for distributed data using virtual dimension, in Proceedings of the 20th ACM International Conference on Information and Knowledge Management (ACM, 2011), pp. 2281–2284
Google Scholar
Z. Zhao, R. Zhang, J. Cox, D. Duling, W. Sarle, Massively parallel feature selection: an approach based on variance preservation. Mach. Learn. 92(1), 195–220 (2013)
Article MathSciNet Google Scholar
A. Sharma, S. Imoto, S. Miyano, A top-r feature selection algorithm for microarray gene expression data. IEEE/ACM Trans. Comput. Biol. Bioinform. 9(3), 754–764 (2011)
Google Scholar
V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos, Distributed feature selection: an application to microarray data classification. Appl. Soft Comput. 30, 136–150 (2015)
Article Google Scholar
J. Dean, S. Ghemawat, MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Article Google Scholar
Apache Hadoop. http://hadoop.apache.org/. Accessed January 2021
Apache Spark. https://spark.apache.org. Accessed January 2021
MLib / Apache Spark. https://spark.apache.org/mllib. Accessed January 2021
L.I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms (Wiley, New York, 2013)
Google Scholar
S. Nogueira, G. Brown, Measuring the stability of feature selection with applications to ensemble methods, in Proceedings of the International Workshop on Multiple Classifier Systems (2015), pp. 135–146
Google Scholar
L.I. Kuncheva, A stability index for feature selection, in Proceedings of the 25th IASTED International Multiconference Artificial intelligence and applications (2007), pp. 421–427
Google Scholar
B. Seijo-Pardo, Porto-Díaz, V. Bolón-Canedo, A. Alonso-Betanzos. Ensemble feature selection, homogeneous and heterogeneous approaches. Knowl.-Based Syst. 114, 124–139 (2017)
Google Scholar
V. Bolón-Canedo, K. Sechidis, N. Sánchez-Maroño, A. Alonso-Betanzos, G. Brown, Exploring the consequences of distributed feature selection in DNA microarray data, in Proceedings 2017 International Joint Conference on Neural Networks (IJCNN) (2017), pp. CFP17–US–DVD
Google Scholar
V. Bolón-Canedo, A. Alonso-Betanzos, Ensembles for feature selection: a review and future trends. Inf. Fusion 52, 1–12 (2019)
Article Google Scholar
B. Seijo-Pardo, V. Bolón-Canedo, A. Alonso-Betanzos, On developing an automatic threshold applied to feature selection ensembles. Inf. Fusion 45, 227–245 (2019)
Article Google Scholar
V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos, A review of feature selection methods on synthetic data. Knowl. Inf. Syst. 34, 483–519 (2013)
Article Google Scholar
V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos, An ensemble of filters and classifiers for microarray data classification. Pattern Recognit. 45(1), 531–539 (2012)
Article Google Scholar
J. Rogers, S. Gunn, Ensemble algorithms for feature selection. Deterministic and Statistical Methods in Machine Learning. Lecture Notes in Computer Science, vol. 3635 (2005), pp. 180–198
Google Scholar
P. Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data (Cambridge University Press, Cambridge, 2012)
Google Scholar
S. Shalev-Shwartz, S. Ben-David, Understanding Machine Learning: From theory to algorithms (Cambridge University Press, Cambridge, 2014)
Google Scholar
K. Bunte, M. Biehl, B. Hammer, A general framework for dimensionality-reducing data visualization mapping. J. Neural Comput. 24, 771–804 (2012)
Article Google Scholar
P. Castells A. Bellogín, I. Cantador, A. Ortigosa (2010) Discerning relevant model features in a content-based collaborative recommender system, in Preference Learning, ed. by J. Färnkranz, E. Hällermeier (Springer, Berlin, 2010), pp. 429–455
Google Scholar
N. Sánchez-Maroño, A. Alonso-Betanzos, O. Fontenla-Romero, C. Brinquis-Núñez, J.G. Polhill, T. Craig, A. Dumitru, R. García-Mira, An agent-based model for simulating environmental behavior in an educational organization. Neural Process. Lett. 42(1), 89–118 (2015)
Article Google Scholar
D.M. Maniyar, I.T. Nabney, Data visualization with simultaneous feature selection, in 2006 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB’06 (IEEE, 2006), pp. 1–8
Google Scholar
J. Krause, A. Perer, E. Bertini, Infuse: interactive feature selection for predictive modeling of high dimensional data. IEEE Trans. Vis. Comput. Graph. 20(12), 1614–1623 (2014)
Article Google Scholar
K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: visualising image classification models and saliency maps (2013), arXiv:1312.6034
D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, in International Conference on Learning Representations (ICLR) (2015)
Google Scholar
B. Cancela, V. Bolón-Canedo, A. Alonso-Betanzos, J. Gama, A scalable saliency-based feature selection method with instance-level information. Knowl.-Based Syst. 192, 105326 (2020)
Google Scholar
J. Chen, L. Song, M. Wainwright, M. Jordan, Learning to explain: an information-theoretic perspective on model interpretation, in International Conference on Machine Learning (2018), pp. 883–892
Google Scholar
J. Yoon, J. Jordon, M. van der Schaar, Invase: instance-wise variable selection using neural networks, in International Conference on Learning Representations (2018)
Google Scholar
S. Ray, J. Park, S. Bhunia, Wearables, implants, and internet of things: the technology needs in the evolving landscape. IEEE Trans. Multi-Scale Comput. Syst. 2(2), 123–128 (2016)
Article Google Scholar
P. Koopman, Design constraints on embedded real time control systems (1990)
Google Scholar
B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, D. Kalenichenko, Quantization and training of neural networks for efficient integer-arithmetic-only inference, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 2704–2713
Google Scholar
N. Wang, J. Choi, D. Brand, C. Chen, K. Gopalakrishnan, Training deep neural networks with 8-bit floating point numbers, in Proceedings of the 32nd International Conference on Neural Information Processing Systems (2018), pp. 7686–7695
Google Scholar
X. Zhang, X. Zhou, M. Lin, J. Sun, Shufflenet: an extremely efficient convolutional neural network for mobile devices, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 6848–6856
Google Scholar
S. Tschiatschek, F. Pernkopf, Parameter learning of Bayesian network classifiers under computational constraints, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases (Springer, 2015), pp. 86–101
Google Scholar
L. Morán-Fernández, K. Sechidis, V. Bolón-Canedo, A. Alonso-Betanzos, G. Brown, Feature selection with limited bit depth mutual information for portable embedded systems. Knowl.-Based Syst. 197, 105885 (2020)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science and Information Technologies, Centro de Investigación CITIC, Universidade da Coruña, Campus de Elviña s/n, 15071, A Coruña, Spain
Verónica Bolón-Canedo, Amparo Alonso-Betanzos, Laura Morán-Fernández & Brais Cancela

Authors

Verónica Bolón-Canedo
View author publications
You can also search for this author in PubMed Google Scholar
Amparo Alonso-Betanzos
View author publications
You can also search for this author in PubMed Google Scholar
Laura Morán-Fernández
View author publications
You can also search for this author in PubMed Google Scholar
Brais Cancela
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Verónica Bolón-Canedo .

Editor information

Editors and Affiliations

Department of Informatics, University of Piraeus, Piraeus, Greece
Maria Virvou
Department of Informatics, University of Piraeus, Piraeus, Greece
George A. Tsihrintzis
KES International, Shoreham-By-Sea, UK
Lakhmi C. Jain

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Bolón-Canedo, V., Alonso-Betanzos, A., Morán-Fernández, L., Cancela, B. (2022). Feature Selection: From the Past to the Future. In: Virvou, M., Tsihrintzis, G.A., Jain, L.C. (eds) Advances in Selected Artificial Intelligence Areas. Learning and Analytics in Intelligent Systems, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-030-93052-3_2

Download citation

DOI: https://doi.org/10.1007/978-3-030-93052-3_2
Published: 27 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93051-6
Online ISBN: 978-3-030-93052-3
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics

Feature Selection: From the Past to the Future

Abstract

Access this chapter

Similar content being viewed by others

Review on Deep Learning in Feature Selection

Feature selection techniques for machine learning: a survey of more than two decades of research

Feature dimensionality reduction: a review

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Publish with us

Navigation

Feature Selection: From the Past to the Future

Abstract

Access this chapter

Similar content being viewed by others

Review on Deep Learning in Feature Selection

Feature selection techniques for machine learning: a survey of more than two decades of research

Feature dimensionality reduction: a review

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation