Use of Classification Algorithms in Noise Detection and Elimination

  • Conference paper
Hybrid Artificial Intelligence Systems (HAIS 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5572)

Abstract

Data sets in Bioinformatics usually present a high level of noise. Various processes involved in biological data collection and preparation may introduce this noise, such as the imprecision inherent in the laboratory experiments that generate these data. Using noisy data to induce classifiers through Machine Learning techniques may harm the classifiers' predictive performance. The predictions of these classifiers may, in turn, be used to guide noise detection and removal. This work compares three approaches for eliminating noisy data from Bioinformatics data sets using Machine Learning classifiers: the first removes the detected noisy examples, the second tries to reclassify them, and the third, named hybrid, combines the two previous approaches.
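The three approaches named in the abstract can be illustrated with a minimal sketch. The abstract does not say which classifiers are used, how a noisy example is detected, or how the hybrid approach combines the other two, so everything below is an assumption made for illustration: a committee of k-nearest-neighbour classifiers with leave-one-out predictions, a majority-vote rule for flagging an example as noisy, and a hybrid rule that relabels only when the committee is unanimous and removes the example otherwise. The names `knn_predict`, `committee_votes`, and `clean` are hypothetical.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training examples closest to `query`."""
    nearest = sorted(
        train,
        key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def committee_votes(data, ks):
    """Leave-one-out prediction of every committee member for every example."""
    votes = []
    for i, (x, _) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold the current example out
        votes.append([knn_predict(rest, x, k) for k in ks])
    return votes

def clean(data, strategy, ks=(1, 3, 5)):
    """Return a cleaned copy of `data` using 'remove', 'relabel' or 'hybrid'."""
    if strategy not in ("remove", "relabel", "hybrid"):
        raise ValueError(f"unknown strategy: {strategy}")
    cleaned = []
    for (x, y), preds in zip(data, committee_votes(data, ks)):
        wrong = [p for p in preds if p != y]
        # Assumed detection rule: noisy if a majority of the committee disagrees.
        if 2 * len(wrong) <= len(preds):
            cleaned.append((x, y))
            continue
        new_label, support = Counter(wrong).most_common(1)[0]
        if strategy == "relabel":
            cleaned.append((x, new_label))
        elif strategy == "hybrid" and support == len(preds):
            # Assumed hybrid rule: relabel only on unanimous agreement.
            cleaned.append((x, new_label))
        # 'remove', and non-unanimous 'hybrid' cases, drop the example.
    return cleaned

# Toy data: two well-separated clusters and one mislabeled point at (0.5, 0.5).
data = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"), ((1.0, 1.0), "a"),
    ((0.5, 0.5), "b"),  # sits inside cluster "a" but carries label "b"
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"), ((6.0, 6.0), "b"),
    ((5.5, 5.5), "b"),
]

removed = clean(data, "remove")
relabeled = clean(data, "relabel")
hybrid = clean(data, "hybrid")
```

On this toy data all three committee members disagree with the label of the point at (0.5, 0.5), so removal drops it while reclassification and the assumed hybrid rule both assign it the label of its surrounding cluster. Removal trades data for safety, whereas reclassification keeps every example at the risk of introducing a wrong label; the hybrid rule sketched here sits between the two.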

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Miranda, A.L.B., Garcia, L.P.F., Carvalho, A.C.P.L.F., Lorena, A.C. (2009). Use of Classification Algorithms in Noise Detection and Elimination. In: Corchado, E., Wu, X., Oja, E., Herrero, Á., Baruque, B. (eds) Hybrid Artificial Intelligence Systems. HAIS 2009. Lecture Notes in Computer Science, vol 5572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02319-4_50

  • DOI: https://doi.org/10.1007/978-3-642-02319-4_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02318-7

  • Online ISBN: 978-3-642-02319-4

  • eBook Packages: Computer Science, Computer Science (R0)
