Feature Extraction for Classification of Proteomic Mass Spectra: A Comparative Study


Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 207))

Abstract

To satisfy the ever-growing need for effective screening and diagnostic tests, medical practitioners have turned their attention to high-resolution, high-throughput methods. One approach is to use mass-spectrometry-based methods for disease diagnosis, in which diagnosis is achieved by classifying mass spectra as belonging to healthy or diseased individuals. Unfortunately, high-resolution mass spectrometry data contain a large amount of noisy, redundant, and irrelevant information, which makes accurate classification difficult. To overcome these obstacles, feature extraction methods are used to select or create small sets of relevant features. This paper compares existing feature selection methods with a novel method that combines wrapper-based feature selection and centroid-based classification. A key contribution is the exposition of different feature extraction techniques, which encompass both dimensionality reduction and feature selection methods. Experiments on two cancer data sets indicate that feature selection algorithms tend to both reduce data dimensionality and increase classification accuracy, whereas the dimensionality reduction techniques lose performance as the number of features is lowered. To evaluate the dimensionality reduction and feature selection techniques, we use a simple classifier, which keeps the approach tractable. Relative to previous research, the proposed algorithm is very competitive in terms of (i) classification accuracy, (ii) size of the feature sets, and (iii) usage of computational resources during both the training and classification phases.
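The abstract names the two ingredients of the proposed method, wrapper-based feature selection and centroid-based classification, without giving details. The following is only a minimal sketch of how such a combination is commonly assembled, not the authors' algorithm: it assumes a plain nearest-centroid classifier with Euclidean distance and a greedy forward-selection wrapper scored on a held-out validation set. The function names (`centroid_predict`, `forward_select`), the `max_features` parameter, and the synthetic data are all hypothetical and chosen for illustration.

```python
import numpy as np


def centroid_predict(X_train, y_train, X_test):
    """Label each test row with the class whose training centroid is nearest."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Euclidean distance from every test row to every class centroid.
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def forward_select(X_train, y_train, X_val, y_val, max_features=5):
    """Greedy forward wrapper (illustrative assumption, not the chapter's method).

    At each step, add the single feature that gives the highest validation
    accuracy of the centroid classifier; stop when accuracy stops improving.
    """
    selected, best_acc = [], 0.0
    remaining = list(range(X_train.shape[1]))
    while remaining and len(selected) < max_features:
        scored = []
        for f in remaining:
            cols = selected + [f]
            pred = centroid_predict(X_train[:, cols], y_train, X_val[:, cols])
            scored.append((float(np.mean(pred == y_val)), f))
        acc, f = max(scored, key=lambda t: t[0])
        if acc <= best_acc:
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc


# Synthetic stand-in for spectra: 50 noise features, one informative (index 7).
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 60)
X = rng.normal(size=(120, 50))
X[:, 7] += 8.0 * y                      # strong class shift on feature 7
idx = rng.permutation(120)
train, val = idx[:80], idx[80:]
selected, best_acc = forward_select(X[train], y[train], X[val], y[val])
print("selected features:", selected, "validation accuracy:", best_acc)
```

Because the wrapper retrains and rescores the classifier for every candidate feature, a cheap classifier such as the centroid rule keeps the search tractable, which matches the abstract's stated reason for using a simple classifier.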

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Levner, I., Bulitko, V., Lin, G. (2006). Feature Extraction for Classification of Proteomic Mass Spectra: A Comparative Study. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_31

  • DOI: https://doi.org/10.1007/978-3-540-35488-8_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35487-1

  • Online ISBN: 978-3-540-35488-8

  • eBook Packages: Engineering, Engineering (R0)
