
Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 294)

Abstract

Traditional search engines are not efficient enough to extract useful information from scientific text databases. It is therefore necessary to develop advanced information retrieval tools that support further classification of scientific texts. This work presents BioClass, a freely available graphical tool for biomedical text classification. With BioClass, a user can parameterize, train, and test different text classifiers to determine which technique performs best on a given document corpus. The framework includes data balancing and attribute reduction techniques to prepare the input data and improve classification efficiency. The classification methods analyze documents by content and identify those best suited to the user's requirements. BioClass also offers graphical interfaces that make it easy to draw conclusions.

Corresponding author

Correspondence to R. Romero.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Romero, R., Vieira, A.S., Iglesias, E.L., Borrajo, L. (2014). BioClass: A Tool for Biomedical Text Classification. In: Saez-Rodriguez, J., Rocha, M., Fdez-Riverola, F., De Paz Santana, J. (eds) 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014). Advances in Intelligent Systems and Computing, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-319-07581-5_29

  • DOI: https://doi.org/10.1007/978-3-319-07581-5_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07580-8

  • Online ISBN: 978-3-319-07581-5

  • eBook Packages: Engineering, Engineering (R0)
