A Machine Learning-Based Approach Using Multi-omics Data to Predict Metabolic Pathways

Computational Biology and Machine Learning for Metabolic Engineering and Synthetic Biology

Part of the book series: Methods in Molecular Biology (MIMB, volume 2553)

Abstract

Integrative methods are continuously evolving to draw accurate insights from data generated by experiments on various biological systems. Multi-omics data can be integrated with predictive machine learning (ML) algorithms to produce results with high accuracy. This protocol chapter describes the steps required to apply ML-based multi-omics integration methods to biological datasets, covering both the analysis and the visual interpretation of the results obtained.



Corresponding author

Correspondence to Vidya Niranjan.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Niranjan, V., Uttarkar, A., Kaul, A., Varghese, M. (2023). A Machine Learning-Based Approach Using Multi-omics Data to Predict Metabolic Pathways. In: Selvarajoo, K. (ed.) Computational Biology and Machine Learning for Metabolic Engineering and Synthetic Biology. Methods in Molecular Biology, vol 2553. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2617-7_19

  • DOI: https://doi.org/10.1007/978-1-0716-2617-7_19

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2616-0

  • Online ISBN: 978-1-0716-2617-7

  • eBook Packages: Springer Protocols
