Classification Potential vs. Classification Accuracy: A Comprehensive Study of Evolutionary Algorithms with Biomedical Datasets

Tanwani, Ajay Kumar; Farooq, Muddassar

doi:10.1007/978-3-642-17508-4_9


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6471)



Abstract

Biomedical datasets pose a unique challenge for machine learning and data mining techniques that aim to extract accurate, comprehensible and hidden knowledge from them. In this paper, we investigate the role of a biomedical dataset in determining the classification accuracy of an algorithm. To this end, we quantify the complexity of a biomedical dataset in terms of its missing values, imbalance ratio, noise and information gain. We have performed our experiments using six well-known evolutionary rule learning algorithms (XCS, UCS, GAssist, cAnt-Miner, SLAVE and Ishibuchi) on 31 publicly available biomedical datasets. The results of our experiments and statistical analysis show that, among the compared schemes, GAssist gives better classification results on the majority of biomedical datasets but cannot be categorized as the best classifier. Moreover, our analysis reveals that the nature of a biomedical dataset, not the choice of evolutionary algorithm, plays a major role in determining the classification accuracy achieved on it. We further show that noise is a dominating factor in determining the complexity of a dataset and that it is inversely proportional to the classification accuracy of all evaluated algorithms. Finally, we provide researchers with a meta-classification model that can be used to determine the classification potential of a dataset on the basis of its complexity measures.
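To make the complexity measures named in the abstract more concrete, here is a minimal sketch that computes three of them for a small dataset with nominal attributes. The paper's exact definitions are not given on this page, so the following are assumptions: missing values are reported as the fraction of empty attribute cells, the imbalance ratio is the majority-class count divided by the minority-class count, and dataset-level information gain is the mean per-attribute information gain, with instances that lack a value skipped. All function names are hypothetical and are not taken from the authors' code.

```python
from collections import Counter
from math import log2
from typing import Hashable, Optional, Sequence

# Hypothetical row type: one tuple of nominal attribute values per instance;
# None marks a missing value (an assumption about the input encoding).
Row = Sequence[Optional[Hashable]]


def missing_value_ratio(rows: Sequence[Row]) -> float:
    """Fraction of all attribute cells that are missing (None)."""
    total = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row if value is None)
    return missing / total if total else 0.0


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Assumed definition: majority-class count / minority-class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[Optional[Hashable]],
                     labels: Sequence[Hashable]) -> float:
    """Information gain of one nominal attribute with respect to the class.

    Instances whose value for this attribute is missing are skipped
    (an assumed treatment; other choices are possible).
    """
    pairs = [(v, y) for v, y in zip(column, labels) if v is not None]
    if not pairs:
        return 0.0
    ys = [y for _, y in pairs]
    partitions = {}  # attribute value -> class labels of instances with that value
    for v, y in pairs:
        partitions.setdefault(v, []).append(y)
    conditional = sum(len(part) / len(ys) * entropy(part)
                      for part in partitions.values())
    return entropy(ys) - conditional


def mean_information_gain(rows: Sequence[Row], labels: Sequence[Hashable]) -> float:
    """Assumed dataset-level summary: mean information gain over all attributes."""
    n_attrs = len(rows[0])
    gains = [information_gain([row[j] for row in rows], labels)
             for j in range(n_attrs)]
    return sum(gains) / n_attrs


if __name__ == "__main__":
    # Toy dataset: two nominal attributes, one missing cell, two balanced classes.
    rows = [("a", "x"), ("a", None), ("b", "x"), ("b", "y")]
    labels = ["pos", "pos", "neg", "neg"]
    print(missing_value_ratio(rows))  # 0.125
    print(imbalance_ratio(labels))    # 1.0
    print(round(mean_information_gain(rows, labels), 3))  # 0.626
```

Noise is deliberately left out. Estimating it requires a separate mislabel-detection step whose details are not given in the abstract. Continuous attributes would also need to be discretized before `information_gain` applies, which this sketch does not attempt.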



Author information

Authors and Affiliations

Next Generation Intelligent Networks Research Center (nexGIN RC), National University of Computer & Emerging Sciences (FAST-NU), Islamabad, Pakistan
Ajay Kumar Tanwani & Muddassar Farooq


Editor information

Editors and Affiliations

School of Computer Science, ASAP research group, University of Nottingham, Jubilee Campus, Nottingham, NG8 1BB, and Multidisciplinary Centre for Integrative Biology, School of Biosciences, LE12 5RD, Sutton Bonington, UK
Jaume Bacardit
School of Engineering and Computer Science, Victoria University of Wellington, PO Box 600, 6140, Wellington, New Zealand
Will Browne
Department of Brain and Cognitive Sciences, University of Rochester, Meliora Hall, 14627, Rochester, NY, USA
Jan Drugowitsch
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull, Quatre Camins, 2, 08022, Barcelona, Spain
Ester Bernadó-Mansilla
Department of Psychology III, University of Würzburg, COBOSLAB – Cognitive Bodyspaces: Learning and Behavior, Röntgenring 11, 97070, Würzburg, Germany
Martin V. Butz


About this paper

Cite this paper

Tanwani, A.K., Farooq, M. (2010). Classification Potential vs. Classification Accuracy: A Comprehensive Study of Evolutionary Algorithms with Biomedical Datasets. In: Bacardit, J., Browne, W., Drugowitsch, J., Bernadó-Mansilla, E., Butz, M.V. (eds) Learning Classifier Systems. IWLCS 2008, IWLCS 2009. Lecture Notes in Computer Science (LNAI), vol 6471. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17508-4_9


DOI: https://doi.org/10.1007/978-3-642-17508-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17507-7
Online ISBN: 978-3-642-17508-4
eBook Packages: Computer Science, Computer Science (R0)
