Ensemble Feature Selection Compares to Meta-analysis for Breast Cancer Biomarker Identification from Microarray Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12949)

Abstract

Identifying stable and precise biomarkers is a key challenge in precision medicine. A promising approach in this direction is exploring omics data, such as transcriptome data generated by microarrays, to discover candidate biomarkers. This, however, involves the fundamental issue of finding the most discriminative features in high-dimensional datasets. We proposed a homogeneous ensemble feature selection (EFS) method to extract candidate biomarkers of breast cancer from microarray datasets. Ensemble diversity is introduced by bootstrap sampling and by the integration of seven microarray studies. As a baseline method, we used random-effects model meta-analysis, a state-of-the-art approach in the integrative analysis of microarrays for biomarker discovery. We compared five feature selection (FS) methods as base selectors and four algorithms as base classifiers. Our results showed that the variance FS method is the most stable among the tested methods regardless of the classifier, and that stability is higher within datasets than across datasets, indicating high sample heterogeneity among studies. The predictive performance of the top 20 genes selected with both approaches was evaluated on six independent microarray studies, and in four of these we observed a superior performance of our EFS approach compared to meta-analysis. EFS recall was as high as 85%, and the median F1-scores surpassed 80% for most of our experiments. We conclude that homogeneous EFS is a promising methodology for candidate biomarker identification, demonstrating stability and predictive performance as satisfactory as those of the statistical reference method.
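The general idea of a homogeneous ensemble with bootstrap-induced diversity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the per-bootstrap rankings are aggregated or how stability is measured, so the mean-rank aggregation, the use of the Kuncheva index, and all function names and parameters below (`variance_ranks`, `ensemble_select`, `kuncheva_index`, `mean_stability`, `n_bootstraps`) are assumptions made for illustration. The variance filter is used as the base selector only because the abstract reports it as the most stable of the tested methods.

```python
import itertools

import numpy as np


def variance_ranks(X):
    """Rank features by descending variance (rank 0 = highest variance)."""
    order = np.argsort(-X.var(axis=0), kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(order.size)
    return ranks


def ensemble_select(X, k, n_bootstraps=50, seed=0):
    """Homogeneous EFS sketch: one base selector applied to many bootstraps.

    Returns the k features with the best mean rank (assumed aggregation)
    and the top-k set chosen on each individual bootstrap.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    all_ranks = []
    for _ in range(n_bootstraps):
        # Sample rows with replacement to create one bootstrap replicate.
        idx = rng.integers(0, n_samples, size=n_samples)
        all_ranks.append(variance_ranks(X[idx]))
    all_ranks = np.array(all_ranks)
    mean_rank = all_ranks.mean(axis=0)
    selected = np.argsort(mean_rank, kind="stable")[:k]
    top_sets = [set(np.flatnonzero(r < k).tolist()) for r in all_ranks]
    return selected, top_sets


def kuncheva_index(a, b, n_features):
    """Kuncheva consistency index for two equal-size subsets (0 < k < n)."""
    k = len(a)
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


def mean_stability(subsets, n_features):
    """Average pairwise Kuncheva index over all bootstrap subsets."""
    pairs = itertools.combinations(subsets, 2)
    return float(np.mean([kuncheva_index(a, b, n_features) for a, b in pairs]))
```

On an expression matrix `X` with samples in rows and genes in columns, `ensemble_select(X, k=20)` would return a candidate top-20 gene list, and `mean_stability(top_sets, X.shape[1])` would summarize how consistently the bootstraps agree. Because the variance filter ignores class labels, a supervised base selector would need the labels resampled together with the rows.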

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001, and Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

Acknowledgments

The authors thank Rodrigo Haas Bueno for his help with microarray data preprocessing.

Corresponding author

Correspondence to Mariana Recamonde-Mendoza.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Trevizan, B., Recamonde-Mendoza, M. (2021). Ensemble Feature Selection Compares to Meta-analysis for Breast Cancer Biomarker Identification from Microarray Data. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. ICCSA 2021. Lecture Notes in Computer Science, vol 12949. Springer, Cham. https://doi.org/10.1007/978-3-030-86653-2_12

  • DOI: https://doi.org/10.1007/978-3-030-86653-2_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86652-5

  • Online ISBN: 978-3-030-86653-2

  • eBook Packages: Computer Science, Computer Science (R0)