Critical review of reporting of the data analysis step in metabolomics

  • Review Article
  • Metabolomics

Abstract

Introduction

We present the first study to critically appraise the quality of reporting of the data analysis step in metabolomics studies since the publication of minimum reporting guidelines in 2007.

Objectives

The aim of this study was to assess the standard of reporting of the data analysis step in metabolomics biomarker discovery studies and to investigate whether the level of detail supplied allows basic understanding of the steps employed and/or reuse of the protocol. For the purposes of this review we define the data analysis step to include the data pretreatment step and the actual data analysis step, which covers algorithm selection, univariate analysis and multivariate analysis.
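As a rough illustration of the level of detail at stake, the sketch below runs a pretreatment phase (log transformation, then unit-variance scaling) followed by a univariate test (Welch's t statistic) on one metabolite, with every choice stated explicitly. It is a minimal sketch under assumptions: the intensities, the log offset and the function names are hypothetical and are not taken from any study in this review, and the multivariate step is left out for brevity.

```python
import math
import statistics


def log_transform(values, offset=1.0):
    """Natural-log transform; the offset (an assumption) avoids log(0)."""
    return [math.log(v + offset) for v in values]


def autoscale(values):
    """Mean-centre, then divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups with unequal variances."""
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    se = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se


# Hypothetical intensities of a single metabolite in cases and controls.
cases = [120.0, 135.0, 150.0, 160.0]
controls = [80.0, 95.0, 90.0, 100.0]

# Pretreatment phase: transform, then scale across all samples together.
pretreated = autoscale(log_transform(cases + controls))
cases_p = pretreated[:len(cases)]
controls_p = pretreated[len(cases):]

# Univariate analysis on the pretreated values.
t_stat = welch_t(cases_p, controls_p)
```

A report that names the transformation, the scaling method, the test and the order in which they were applied gives a reader enough to repeat these steps; omitting any one of them leaves the workflow ambiguous.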

Method

We reviewed the literature to identify metabolomic studies of biomarker discovery that were published between January 2008 and December 2014. Studies were examined for completeness in reporting the various steps of the data pretreatment phase and data analysis phase and also for clarity of the workflow of these sections.

Results

We analysed 27 papers, published between 2008 and the end of 2014, in the area of biomarker discovery in serum metabolomics. This review showed that the data analysis step in metabolomics biomarker discovery studies is plagued by unclear and incomplete reporting. Major omissions and a lack of logical flow render the data analysis workflows in these studies impossible to follow, and therefore impossible to replicate or even imitate.

Conclusions

While we wait for the holy grail of computational reproducibility in data analysis to become standard, we propose that, at a minimum, the data analysis section of a metabolomics study should be readable and interpretable without omissions, such that a data analysis workflow diagram could be derived from the study and the data analysis protocol could therefore be reused by the reader. That inconsistent and patchy reporting obstructs reproducibility is a given. However, even basic understanding and reuse of protocols are hampered by the low level of detail supplied in the data analysis sections of the studies that we reviewed.
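One way to picture this minimum standard is to record each data analysis step as an ordered, structured entry, from which a workflow diagram can be generated and omissions can be flagged. The sketch below is only an illustration under assumptions: the `Step` fields, the list of required fields and the example workflow (including the software versions and parameters) are hypothetical and do not come from the reporting guidelines or from any reviewed study.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str                 # e.g. "scaling"
    method: str               # e.g. "unit variance"
    software: str = ""        # package and version, if reported
    parameters: dict = field(default_factory=dict)


# Hypothetical minimum set of details every step should report.
REQUIRED_FIELDS = ("name", "method", "software")


def missing_details(steps):
    """Return (step index, field name) pairs for each required field left empty."""
    gaps = []
    for index, step in enumerate(steps):
        for field_name in REQUIRED_FIELDS:
            if not getattr(step, field_name):
                gaps.append((index, field_name))
    return gaps


def workflow_diagram(steps):
    """Render the ordered steps as a one-line text diagram."""
    return " -> ".join(f"{s.name} ({s.method})" for s in steps)


# Hypothetical workflow; the software of the third step is deliberately omitted.
workflow = [
    Step("normalisation", "total area", "R 3.3.1"),
    Step("scaling", "unit variance", "R 3.3.1"),
    Step("univariate test", "Welch t-test", ""),
    Step("classification", "PLS-DA", "R 3.3.1",
         {"components": 2, "validation": "7-fold CV"}),
]

print(workflow_diagram(workflow))
print(missing_details(workflow))
```

Here `missing_details(workflow)` points at the third step, whose software is not reported. Running the same kind of check against a methods section is, in spirit, what a reader has to do by hand when the reporting is incomplete.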

Acknowledgements

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI).

Author information

Corresponding author

Correspondence to E. C. Considine.

Ethics declarations

Conflict of interest

We declare no competing financial interests.

Appendix

See Figs. 1, 2, 3 and 4.

Fig. 1 Flow of included studies on disease prediction from serum metabolomics

Fig. 2 Overview of reproducibility, readability and the clarity of the workflow pipeline of the overall data analysis in the studies reviewed. Green: detail is reported; red: detail is not reported; blue: counts of method/program/algorithm/performance metric used. *All code and data are available upon request but the definition of complete reproducibility is the availability of linked and executable code, so this study is not fully reproducible.

Fig. 3 Reporting of pretreatment steps employed in the studies reviewed. Green: detail is reported; red: detail is not reported; blue: counts of method/program/algorithm/performance metric used

Fig. 4 Completeness of reporting of supervised analysis steps and counts of the algorithms, performance metrics and validation methods employed. Green: detail is reported; red: detail is not reported; blue: counts of method/program/algorithm/performance metric used

See Tables 1, 2, 3, 4, 5, 6, 7 and 8.

Table 1 General descriptive characteristics of studies reviewed
Table 2 Expected outcome of studies under review
Table 3 Overview of reproducibility or readability of data analysis steps of studies in this review
Table 4 Packages reported used or code used for analysis
Table 5 Pre-treatment phase details reported
Table 6 Univariate analysis details reported
Table 7 Unsupervised analysis details reported
Table 8 Supervised analysis details reported

About this article

Cite this article

Considine, E.C., Thomas, G., Boulesteix, A.L. et al. Critical review of reporting of the data analysis step in metabolomics. Metabolomics 14, 7 (2018). https://doi.org/10.1007/s11306-017-1299-3

  • DOI: https://doi.org/10.1007/s11306-017-1299-3
