Abstract
This investigation was carried out as part of a nutritional case study aimed at assessing the postnatal impact of maternal dietary protein restriction during pregnancy and lactation on the plasma metabolome and hypothalamic proteome of rat offspring. Although data generated by different “Omics” technologies are usually analyzed separately, relating them to one another offers a valuable opportunity to assess the emerging ‘integrated biology’ concept. The analysis strategy first addressed data pretreatment and variable selection for each dataset. Three multivariate analyses were then applied to investigate the links between metabolite abundance and protein expression measured on the same samples. Unfold principal component analysis and regularized canonical correlation analysis did not take into account the groups of individuals defined by the intervention study. In contrast, the predictive MultiBlock partial least squares method used this information. Regularized canonical correlation analysis proved to be a relevant approach for investigating the relationships between the two datasets. However, for highlighting the molecular compounds (proteins and metabolites) associated in interacting or common metabolic pathways for the experimental groups, MultiBlock partial least squares was the most appropriate method in the present nutritional case study.
Acknowledgments
This work was supported by a grant from the Agence Nationale de la Recherche (ProtNeonat ANR-05-PNRA-009 grant). We acknowledge the contribution of Bérengère Coupé, PhD, who generated most of the animal samples and was sponsored by a doctoral fellowship from the Institut National de la Recherche Agronomique and the Région Pays de la Loire, with a grant from La Fondation Louis Bonduelle (France). We thank Emilie Bailly for her help with 2-DE gels and image analysis, and Hélène Rogniaux for her help with protein identification by MALDI-TOF and LC–MS/MS. We also acknowledge the financial support of the NUPEM project (NUtrition Périnatale et Empreinte Métabolique) from the Pays de la Loire Region. Finally, the authors would like to thank Dominique Darmaun (Director of the Laboratory) for his careful re-reading of the paper.
Cite this article
Moyon, T., Le Marec, F., Qannari, E.M. et al. Statistical strategies for relating metabolomics and proteomics data: a real case study in nutrition research area. Metabolomics 8, 1090–1101 (2012). https://doi.org/10.1007/s11306-012-0415-7
DOI: https://doi.org/10.1007/s11306-012-0415-7