Abstract
Introduction
We present the first study to critically appraise the quality of reporting of the data analysis step in metabolomics studies since the publication of minimum reporting guidelines in 2007.
Objectives
The aim of this study was to assess the standard of reporting of the data analysis step in metabolomics biomarker discovery studies and to investigate whether the level of detail supplied allows basic understanding of the steps employed and/or reuse of the protocol. For the purposes of this review we define the data analysis step to include the data pretreatment step and the actual data analysis step, which covers algorithm selection, univariate analysis and multivariate analysis.
Method
We reviewed the literature to identify metabolomic studies of biomarker discovery that were published between January 2008 and December 2014. Studies were examined for completeness in reporting the various steps of the data pretreatment phase and data analysis phase and also for clarity of the workflow of these sections.
Results
We analysed 27 papers published between 2008 and the end of 2014 in the area of biomarker discovery in serum metabolomics. The review showed that the data analysis step in metabolomics biomarker discovery studies is plagued by unclear and incomplete reporting. Major omissions and a lack of logical flow render the data analysis workflows in these studies impossible to follow, and therefore impossible to replicate or even imitate.
Conclusions
While we wait for computational reproducibility in data analysis, the holy grail, to become standard, we propose that, at a minimum, the data analysis section of a metabolomics study should be readable and interpretable without omissions, such that a data analysis workflow diagram could be extrapolated from the study and the data analysis protocol could therefore be reused by the reader. That inconsistent and patchy reporting obfuscates reproducibility is a given. However, even basic understanding and reuse of protocols are hampered by the low level of detail supplied in the data analysis sections of the studies that we reviewed.
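As a rough illustration of what a reported workflow that can be turned into a diagram might look like, the following Python sketch records each reported step in order and flags any stage that is never mentioned. It is a minimal sketch under stated assumptions, not a method used in the study: the stage names are borrowed from the definition of the data analysis step given under Objectives, and the `Step` class, the helper functions and the example report are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical stage names, taken from the abstract's definition of the
# data analysis step; they are not an official reporting checklist.
REQUIRED_STAGES = ("pretreatment", "univariate", "multivariate")


@dataclass
class Step:
    stage: str                                       # one of REQUIRED_STAGES
    method: str                                      # e.g. "Pareto scaling"
    parameters: dict = field(default_factory=dict)   # settings as reported


def missing_stages(workflow):
    """Return the required stages that no reported step covers."""
    reported_stages = {step.stage for step in workflow}
    return [stage for stage in REQUIRED_STAGES if stage not in reported_stages]


def as_diagram(workflow):
    """Render the reported steps, in order, as a one-line text diagram."""
    return " -> ".join(f"{step.stage}: {step.method}" for step in workflow)


# Invented example report that omits the univariate analysis.
reported = [
    Step("pretreatment", "log transformation"),
    Step("pretreatment", "Pareto scaling"),
    Step("multivariate", "PLS-DA", {"components": 2, "validation": "7-fold CV"}),
]

print(as_diagram(reported))
print("Missing stages:", missing_stages(reported))
```

A record of this kind is far short of computational reproducibility, but it would let a reader reconstruct the order of the steps and see at a glance which stage a report leaves out.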
Acknowledgements
This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI).
Ethics declarations
Conflict of interest
We declare no competing financial interests.
About this article
Cite this article
Considine, E.C., Thomas, G., Boulesteix, A.L. et al. Critical review of reporting of the data analysis step in metabolomics. Metabolomics 14, 7 (2018). https://doi.org/10.1007/s11306-017-1299-3
DOI: https://doi.org/10.1007/s11306-017-1299-3