Ensemble Feature Selection Compares to Meta-analysis for Breast Cancer Biomarker Identification from Microarray Data

Trevizan, Bernardo; Recamonde-Mendoza, Mariana

doi:10.1007/978-3-030-86653-2_12

Conference paper
First Online: 09 September 2021

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12949)

Abstract

Identifying stable and precise biomarkers is a key challenge in precision medicine. A promising approach is to explore omics data, such as transcriptome data generated by microarrays, to discover candidate biomarkers. This, however, raises the fundamental problem of finding the most discriminative features in high-dimensional datasets. We propose a homogeneous ensemble feature selection (EFS) method to extract candidate biomarkers of breast cancer from microarray datasets. Ensemble diversity is introduced by bootstrap sampling and by the integration of seven microarray studies. As a baseline, we used random-effects model meta-analysis, a state-of-the-art approach in the integrative analysis of microarrays for biomarker discovery. We compared five feature selection (FS) methods as base selectors and four algorithms as base classifiers. Our results showed that the variance FS method is the most stable among the tested methods regardless of the classifier, and that stability is higher within datasets than across datasets, indicating high sample heterogeneity among studies. The predictive performance of the top 20 genes selected by each approach was evaluated on six independent microarray studies; in four of these, our EFS approach outperformed meta-analysis. EFS recall was as high as 85%, and the median F1-scores surpassed 80% for most of our experiments. We conclude that homogeneous EFS is a promising methodology for candidate biomarker identification, with stability and predictive performance as satisfactory as those of the statistical reference method.
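To make the idea of a homogeneous ensemble concrete, the following is a minimal sketch, not the authors' implementation. It applies one base selector (a variance filter) to bootstrap resamples of several studies and aggregates the resulting rankings. The mean-rank aggregation, the Kuncheva index as the stability measure, the toy data generator, and all function names are assumptions made for illustration; the abstract does not specify them, and the classifier-based evaluation is omitted.

```python
import numpy as np


def variance_ranks(X):
    """Rank position of each feature (0 = highest variance across samples)."""
    order = np.argsort(-X.var(axis=0), kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(order.size)
    return ranks


def ensemble_feature_selection(datasets, n_bootstraps=20, top_k=20, seed=0):
    """Homogeneous EFS sketch: same selector on every bootstrap of every study.

    Assumption: rankings are combined by mean rank; other aggregation
    functions are equally plausible.
    """
    rng = np.random.default_rng(seed)
    all_ranks = []
    for X in datasets:
        n_samples = X.shape[0]
        for _ in range(n_bootstraps):
            # Bootstrap: draw samples (rows) with replacement.
            idx = rng.integers(0, n_samples, size=n_samples)
            all_ranks.append(variance_ranks(X[idx]))
    mean_rank = np.mean(all_ranks, axis=0)
    return np.argsort(mean_rank, kind="stable")[:top_k]


def kuncheva_index(a, b, n_features):
    """Kuncheva stability index between two equal-sized feature subsets."""
    a, b = set(a), set(b)
    k = len(a)
    if k != len(b) or not 0 < k < n_features:
        raise ValueError("subsets must have equal size k with 0 < k < n_features")
    r = len(a & b)  # number of features shared by both subsets
    return (r * n_features - k * k) / (k * (n_features - k))


def make_toy_study(n_samples, seed, n_features=50):
    """Hypothetical stand-in for a microarray study (samples x genes)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    X[:, :5] *= 10.0  # the first five "genes" vary far more than the rest
    return X


studies = [make_toy_study(40, s) for s in range(3)]
selected = ensemble_feature_selection(studies, n_bootstraps=10, top_k=5)
print(sorted(int(i) for i in selected))
```

In this sketch, within-dataset stability could be estimated by applying `kuncheva_index` to the top-k subsets from pairs of bootstraps of one study, and across-dataset stability by comparing subsets obtained from different studies.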

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

Acknowledgments

The authors thank Rodrigo Haas Bueno for his help with microarray data preprocessing.

Author information

Authors and Affiliations

Institute of Informatics, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
Bernardo Trevizan & Mariana Recamonde-Mendoza
Bioinformatics Core, Hospital de Clínicas de Porto Alegre, Porto Alegre, RS, Brazil
Bernardo Trevizan & Mariana Recamonde-Mendoza

Corresponding author

Correspondence to Mariana Recamonde-Mendoza.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Potenza, Italy
Beniamino Murgante
Covenant University, Ota, Nigeria
Sanjay Misra
University of Cagliari, Cagliari, Italy
Chiara Garau
University of Cagliari, Cagliari, Italy
Ivan Blečić
Monash University, Clayton, VIC, Australia
David Taniar
Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
University of Minho, Braga, Portugal
Ana Maria A. C. Rocha
Polytechnic University of Bari, Bari, Italy
Eufemia Tarantino
Polytechnic University of Bari, Bari, Italy
Carmelo Maria Torre

Cite this paper

Trevizan, B., Recamonde-Mendoza, M. (2021). Ensemble Feature Selection Compares to Meta-analysis for Breast Cancer Biomarker Identification from Microarray Data. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. ICCSA 2021. Lecture Notes in Computer Science, vol 12949. Springer, Cham. https://doi.org/10.1007/978-3-030-86653-2_12

DOI: https://doi.org/10.1007/978-3-030-86653-2_12
Published: 09 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86652-5
Online ISBN: 978-3-030-86653-2
eBook Packages: Computer Science, Computer Science (R0)