A Statistical Comparison of Feature Selection Techniques for Solar Energy Forecasting Based on Geographical Data

  • Saloua El Motaki University Sidi Mohamed Ben Abdellah
  • Abdelhak El Fengour University Ibn Tofail/University of Castilla-La Mancha

Abstract

In recent years, solar energy forecasting has been increasingly embraced as part of a sustainable response to growing environmental awareness. It is a subject of interest to the scientific community, and machine learning techniques have proven to be a powerful means of building models that predict accurately. Alongside the various machine learning and data mining tools applied to solar energy prediction, feature selection is becoming an essential step for improving the efficiency of model building. In this paper, we examine the potential of the feature selection (FS) approach. We provide a detailed taxonomy of feature selection techniques and examine their usability and their ability to handle a solar energy forecasting problem, given meteorological and geographical data. We focus on filter-based, wrapper-based, and embedded feature selection methods. We compare the techniques on three criteria: the number of selected features, stability, and regression accuracy. The experimental results demonstrate how the feature selection methods studied can considerably improve the prediction process and how the selected features vary by method, depending on the constraints of the given data.
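To make two of the comparison criteria concrete, the following is a minimal sketch of a filter-style selector and a stability score. It is not the authors' procedure: the correlation-based ranking, the mean pairwise Jaccard index as the stability measure, the bootstrap setup, the feature names (`cloud_cover`, `latitude`, `noise_1`, `noise_2`), and the synthetic data are all assumptions chosen for illustration.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_select(columns, target, k):
    """Filter-style selection: keep the k features with the largest |r| to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def jaccard_stability(subsets):
    """Mean pairwise Jaccard similarity of at least two non-empty subsets (1.0 = identical)."""
    pairs = list(combinations(subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Synthetic, made-up data: two informative features and two pure-noise features.
rng = random.Random(0)
n = 200
names = ("cloud_cover", "latitude", "noise_1", "noise_2")
columns = {name: [rng.random() for _ in range(n)] for name in names}
target = [
    -3.0 * c + 2.0 * lat + 0.1 * rng.gauss(0, 1)
    for c, lat in zip(columns["cloud_cover"], columns["latitude"])
]

# Repeat the selection on bootstrap resamples and collect the chosen subsets.
subsets = []
for _ in range(20):
    idx = [rng.randrange(n) for _ in range(n)]
    sample = {name: [values[i] for i in idx] for name, values in columns.items()}
    subsets.append(filter_select(sample, [target[i] for i in idx], k=2))

stability = jaccard_stability(subsets)
print(sorted(filter_select(columns, target, k=2)), round(stability, 3))
```

A stability near 1 means the selector returns almost the same subset on every resample. Wrapper and embedded methods could be scored the same way by swapping `filter_select` for another selector that returns a set of feature names.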

Keywords

feature selection, filter method, wrapper method, embedded method, solar energy forecasting, regression performance, smart environment

Published
Jun 21, 2021
How to Cite
EL MOTAKI, Saloua; EL FENGOUR, Abdelhak. A Statistical Comparison of Feature Selection Techniques for Solar Energy Forecasting Based on Geographical Data. Computer Assisted Methods in Engineering and Science, [S.l.], v. 28, n. 2, p. 105–118, june 2021. ISSN 2956-5839. Available at: <https://cames.ippt.pan.pl/index.php/cames/article/view/324>. Date accessed: 18 apr. 2024. doi: http://dx.doi.org/10.24423/cames.324.