Application of Discriminant Analysis and Cross-Validation on Proteomics Data

Kuligowski, Julia; Pérez-Guaita, David; Quintás, Guillermo

doi:10.1007/978-1-4939-3106-4_11

Part of the book series: Methods in Molecular Biology (MIMB, volume 1362)

Abstract

High-throughput proteomic experiments have increased both the importance and the complexity of the bioinformatic analysis needed to extract useful information from raw data. Discriminant analysis is frequently used to identify differences among test groups of individuals or to describe combinations of discriminant variables. However, even in relatively large studies, the number of detected variables typically far exceeds the number of samples, so classifiers should be thoroughly validated to assess their performance on new samples. Cross-validation is a widely used approach when an external validation set is not available. This chapter presents different approaches to cross-validation, including aspects that should be taken into account to avoid overly optimistic results, and the assessment of the statistical significance of cross-validated figures of merit.
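
As a rough illustration of the two ideas named in the abstract, the sketch below estimates a cross-validated accuracy and then judges its significance with a label-permutation test. It is not the chapter's protocol: the nearest-centroid classifier is a simple stand-in for whichever discriminant method is used, the data are synthetic, and all function names and parameters (cv_accuracy, permutation_p_value, k, n_perm, seed) are hypothetical choices made for this example.

```python
import random

def nearest_centroid_fit(X, y):
    """Return a dict mapping each class label to its mean feature vector."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid_predict(centroids, row):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(sorted(centroids), key=sqdist)

def cv_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validated accuracy; the model is refit on every training split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
    return correct / len(X)

def permutation_p_value(X, y, n_perm=99, k=5, seed=0):
    """Compare the observed CV accuracy with accuracies obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, k, seed)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)  # break any real link between features and classes
        if cv_accuracy(X, y_perm, k, seed) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Synthetic example (assumed data): 40 samples, 5 variables, two shifted classes.
data_rng = random.Random(1)
y = [0] * 20 + [1] * 20
X = [[data_rng.gauss(2.0 * label, 1.0) for _ in range(5)] for label in y]
acc, p = permutation_p_value(X, y)
print(f"cross-validated accuracy = {acc:.2f}, permutation p-value = {p:.3f}")
```

Any step that uses the class labels, such as variable selection or tuning the model complexity, would need to be repeated inside each training split in the same way the classifier is refit here; running such steps once on the full data set is a common source of overly optimistic cross-validated results.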

Author information

Authors and Affiliations

Neonatal Research Centre, Health Research Institute La Fe, Valencia, Spain
Julia Kuligowski
Centre for Biospectroscopy, School of Chemistry, Monash University, Clayton, Australia
David Pérez-Guaita
Safety and sustainability Division, Leitat Technological Center, Avda. Fernando Abril Martorell, 106, 46026, Valencia, Spain
Guillermo Quintás Ph.D.
Analytical Unit, Health Research Institute La Fe, Valencia, Spain
Guillermo Quintás Ph.D.

Corresponding author

Correspondence to Guillermo Quintás, Ph.D.

Editor information

Editors and Affiliations

Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
Klaus Jung

About this protocol

Cite this protocol

Kuligowski, J., Pérez-Guaita, D., Quintás, G. (2016). Application of Discriminant Analysis and Cross-Validation on Proteomics Data. In: Jung, K. (ed.) Statistical Analysis in Proteomics. Methods in Molecular Biology, vol 1362. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3106-4_11

DOI: https://doi.org/10.1007/978-1-4939-3106-4_11
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3105-7
Online ISBN: 978-1-4939-3106-4
eBook Packages: Springer Protocols
