Abstract
High-throughput proteomic experiments have increased both the importance and the complexity of the bioinformatic analysis needed to extract useful information from raw data. Discriminant analysis is frequently used to identify differences among test groups of individuals or to describe combinations of discriminant variables. However, even in relatively large studies, the number of detected variables typically far exceeds the number of samples, so classifiers must be thoroughly validated to assess their performance on new samples. Cross-validation is a widely used approach when an external validation set is not available. This chapter presents different approaches to cross-validation, including relevant aspects that should be taken into account to avoid overly optimistic results, as well as the assessment of the statistical significance of cross-validated figures of merit.
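As a rough illustration of two points raised in the abstract, the sketch below keeps variable selection inside each cross-validation fold, so the held-out samples never influence which variables are chosen. It then estimates the significance of the cross-validated accuracy with a label-permutation test. It is a minimal sketch under stated assumptions, not the chapter's method: the nearest-centroid rule and the mean-difference ranking are simple stand-ins for a discriminant model, and all function names (`fit_predict`, `cv_accuracy`, `permutation_p_value`) are hypothetical.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_predict(X_train, y_train, X_test, n_select):
    """Select variables and fit a nearest-centroid rule on training data only.

    Assumes binary labels 0/1 and that both classes occur in the training set.
    """
    c0 = centroid([x for x, y in zip(X_train, y_train) if y == 0])
    c1 = centroid([x for x, y in zip(X_train, y_train) if y == 1])
    # Rank variables by the absolute difference of class means (training only).
    ranked = sorted(range(len(c0)), key=lambda j: abs(c0[j] - c1[j]), reverse=True)
    keep = ranked[:n_select]
    preds = []
    for x in X_test:
        d0 = sum((x[j] - c0[j]) ** 2 for j in keep)
        d1 = sum((x[j] - c1[j]) ** 2 for j in keep)
        preds.append(0 if d0 <= d1 else 1)
    return preds


def cv_accuracy(X, y, n_folds, n_select, seed=0):
    """k-fold cross-validated accuracy; selection is repeated in every fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        preds = fit_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], n_select,
        )
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)


def permutation_p_value(X, y, n_folds, n_select, n_perm=200, seed=0):
    """Compare the observed CV accuracy with accuracies under shuffled labels."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y, n_folds, n_select, seed)
    hits = 0
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        # Same fold split as for the observed value; only the labels change.
        if cv_accuracy(X, y_perm, n_folds, n_select, seed) >= observed:
            hits += 1
    # The +1 terms keep the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Pure-noise data with far more variables than samples (20 x 200).
    rng = random.Random(1)
    labels = [0, 1] * 10
    data = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in labels]
    acc, p = permutation_p_value(data, labels, n_folds=5, n_select=10)
    print(f"CV accuracy: {acc:.2f}, permutation p-value: {p:.3f}")
```

Selecting the variables once on the full data set and cross-validating only the classifier would let the held-out samples leak into the model, which is one common source of the overly optimistic results the abstract warns about.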
Copyright information
© 2016 Springer Science+Business Media New York
About this protocol
Cite this protocol
Kuligowski, J., Pérez-Guaita, D., Quintás, G. (2016). Application of Discriminant Analysis and Cross-Validation on Proteomics Data. In: Jung, K. (ed) Statistical Analysis in Proteomics. Methods in Molecular Biology, vol 1362. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3106-4_11
DOI: https://doi.org/10.1007/978-1-4939-3106-4_11
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3105-7
Online ISBN: 978-1-4939-3106-4
eBook Packages: Springer Protocols