An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features

Artificial Intelligence Review

Abstract

Hierarchical feature selection is a new research area in machine learning and data mining that performs feature selection by exploiting dependency relationships among hierarchically structured features. This paper evaluates four hierarchical feature selection methods (HIP, MR, SHSEL and GTD), each used together with four types of lazy learning-based classifiers: Naïve Bayes, Tree Augmented Naïve Bayes, Bayesian Network Augmented Naïve Bayes and k-Nearest Neighbors. The four hierarchical methods are compared with each other and with a well-known “flat” feature selection method, Correlation-based Feature Selection. The bioinformatics datasets used consist of aging-related genes as instances and Gene Ontology terms as hierarchical features. The experimental results show that the HIP (Select Hierarchical Information Preserving Features) method performs best overall, in terms of both predictive accuracy and robustness on data whose class distribution is substantially imbalanced. The paper also reports the Gene Ontology terms that were most often selected by the HIP method.

Acknowledgements

We thank Dr. João Pedro de Magalhães for his valuable general advice on the biology of aging for this project. We also thank Pablo Silva for providing an implementation of the SHSEL method, and we acknowledge the support of concurrency researchers at the University of Kent for access to the ‘CoSMoS’ cluster, funded by EPSRC Grants EP/E049419/1 and EP/E0535/1.

Author information

Corresponding author

Correspondence to Cen Wan.

About this article

Cite this article

Wan, C., Freitas, A.A. An empirical evaluation of hierarchical feature selection methods for classification in bioinformatics datasets with gene ontology-based features. Artif Intell Rev 50, 201–240 (2018). https://doi.org/10.1007/s10462-017-9541-y

  • DOI: https://doi.org/10.1007/s10462-017-9541-y
