Application of Discriminant Analysis and Cross-Validation on Proteomics Data

Statistical Analysis in Proteomics

Abstract

High-throughput proteomic experiments have increased both the importance and the complexity of the bioinformatic analysis needed to extract useful information from raw data. Discriminant analysis is frequently used to identify differences among test groups of individuals or to describe combinations of discriminant variables. However, even in relatively large studies, the number of detected variables typically far exceeds the number of samples, so classifiers must be thoroughly validated to assess how well they will perform on new samples. Cross-validation is a widely used approach when an external validation set is not available. This chapter presents different cross-validation approaches, together with the aspects that must be taken into account to avoid overly optimistic results, and describes how to assess the statistical significance of cross-validated figures of merit.
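Two points from the abstract, that variable selection has to be repeated inside every cross-validation split and that a cross-validated figure of merit needs a significance test, can be illustrated with a minimal, standard-library-only Python sketch. It does not reproduce the chapter's own models: a simple nearest-centroid classifier with t-statistic variable ranking is swapped in for the discriminant analysis methods, accuracy is assumed as the figure of merit, and every function name, parameter, and the toy data generator are hypothetical. The p-value is estimated by rerunning the whole cross-validation procedure, selection included, on randomly permuted class labels.

```python
import random  # sketch only: every name below is hypothetical


def _mean(v):
    return sum(v) / len(v)


def _var(v):  # sample variance (n - 1 denominator)
    m = _mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)


def t_score(a, b):
    """Absolute Welch t statistic of one variable between two classes."""
    denom = (_var(a) / len(a) + _var(b) / len(b)) ** 0.5
    return abs(_mean(a) - _mean(b)) / denom if denom > 0 else 0.0


def select_features(X, y, n_keep):
    """Keep the n_keep variables with the largest |t| in the samples given."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append((t_score(a, b), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:n_keep]]


def fit_centroids(X, y, feats):
    """Class means of the selected variables (the nearest-centroid model)."""
    cents = {}
    for cls in set(y):
        rows = [row for row, lab in zip(X, y) if lab == cls]
        cents[cls] = [_mean([r[j] for r in rows]) for j in feats]
    return cents


def predict(cents, row, feats):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    def dist(cls):
        return sum((row[j] - m) ** 2 for j, m in zip(feats, cents[cls]))
    return min(cents, key=dist)


def stratified_folds(y, k, rng):
    """Split sample indices into k folds with similar class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cv_accuracy(X, y, k=5, n_keep=10, seed=0, select_inside=True):
    """Stratified k-fold CV accuracy of variable selection + classifier;
    select_inside=False selects once on ALL samples, leaking test data."""
    rng = random.Random(seed)
    leaky = None if select_inside else select_features(X, y, n_keep)
    correct = 0
    for test_idx in stratified_folds(y, k, rng):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        feats = (select_features(Xtr, ytr, n_keep) if leaky is None
                 else leaky)
        cents = fit_centroids(Xtr, ytr, feats)
        correct += sum(predict(cents, X[i], feats) == y[i] for i in test_idx)
    return correct / len(y)


def permutation_p_value(X, y, n_perm=99, seed=1, **cv_kwargs):
    """Permutation p-value of CV accuracy; the +1 terms keep p above 0."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, **cv_kwargs)
    n_ge = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        if cv_accuracy(X, y_perm, **cv_kwargs) >= observed:
            n_ge += 1
    return observed, (n_ge + 1) / (n_perm + 1)


def make_data(n_per_class=20, n_vars=100, n_informative=0, shift=3.0, seed=42):
    """Hypothetical toy data: N(0, 1) noise; class 1 shifted on a few vars."""
    rng = random.Random(seed)
    X, y = [], []
    for lab in (0, 1):
        for _ in range(n_per_class):
            mu = [shift if lab == 1 and j < n_informative else 0.0
                  for j in range(n_vars)]
            X.append([rng.gauss(m, 1.0) for m in mu])
            y.append(lab)
    return X, y
```

For example, `X, y = make_data(n_informative=5)` followed by `permutation_p_value(X, y)` returns the cross-validated accuracy and its permutation p-value. On pure-noise data (`n_informative=0`), calling `cv_accuracy` with `select_inside=False` tends to report accuracy above the roughly 50% chance level, while the default setting tends to stay near chance; that gap is the kind of overly optimistic result the chapter warns about. Permuting the labels before the entire procedure, rather than only before the final classifier, is what keeps the null distribution comparable to the observed estimate.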

Author information

Correspondence to Guillermo Quintás Ph.D.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Kuligowski, J., Pérez-Guaita, D., Quintás, G. (2016). Application of Discriminant Analysis and Cross-Validation on Proteomics Data. In: Jung, K. (ed) Statistical Analysis in Proteomics. Methods in Molecular Biology, vol 1362. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3106-4_11

  • DOI: https://doi.org/10.1007/978-1-4939-3106-4_11

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3105-7

  • Online ISBN: 978-1-4939-3106-4

  • eBook Packages: Springer Protocols