Machine learning methods for microbiome studies

  • Protocol
Journal of Microbiology

Abstract

Research on the microbiome is being actively conducted worldwide, and results have shown that the human gut bacterial environment significantly affects the immune system, psychological conditions, cancer, obesity, and metabolic diseases. Thanks to advances in sequencing technology, microbiome studies with large numbers of samples are now feasible at an acceptable cost. Large sample sizes make it possible to fit more sophisticated models using machine learning approaches to study relationships between the microbiome and various traits. This article provides an overview of machine learning methods for non-data scientists interested in association analysis of microbiomes and host phenotypes. Once the genomic features of the microbiome are determined, various analysis methods, including penalized regression, support vector machines (SVM), random forests, and artificial neural networks (ANN), can be used to explore the relationship between the microbiome and host phenotypes. Deep neural network methods are also briefly discussed. The analysis procedure, from environment setup to extraction of analysis results, is presented using the Python programming language.
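Penalized regression is the first method family named above. The snippet below is a minimal, dependency-free sketch of that idea and is not the article's own code, which this preview does not show. It assumes a hypothetical taxon count table with a binary phenotype. It converts counts to relative abundances (total-sum scaling), standardizes each taxon, and fits an L1-penalized (lasso-type) logistic regression by proximal gradient descent. The table, the taxon names, and the settings `lam`, `lr`, and `n_iter` are illustrative assumptions.

```python
# Sketch only (assumption): illustrative L1-penalized logistic regression,
# not the article's implementation. All data and names below are hypothetical.
import math

def relative_abundance(counts):
    """Total-sum scaling: divide each sample's taxon counts by that sample's total."""
    scaled = []
    for row in counts:
        total = sum(row)
        scaled.append([c / total if total > 0 else 0.0 for c in row])
    return scaled

def standardize(X):
    """Center each taxon (column) to mean 0 and scale it to unit population SD."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / n
        sds.append(math.sqrt(var) if var > 0 else 1.0)  # leave constant columns unscaled
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward 0 and snap small values to exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """Minimize mean log-loss + lam * sum(|w_j|) by proximal gradient descent (ISTA).
    The intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Gradient step on the smooth log-loss, then L1 shrinkage per coefficient.
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(p)]
    return w, b

def predict_proba(w, b, xi):
    """Predicted probability that a sample belongs to the case group."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))

# Hypothetical toy table: 8 samples x 4 taxa, every sample sums to 80 reads.
# Taxon_A is enriched and Taxon_D depleted in cases; Taxon_B and Taxon_C do not differ.
taxa = ["Taxon_A", "Taxon_B", "Taxon_C", "Taxon_D"]
counts = [
    [40, 10, 20, 10], [40, 20, 10, 10], [40, 10, 20, 10], [40, 20, 10, 10],  # cases
    [10, 10, 20, 40], [10, 20, 10, 40], [10, 10, 20, 40], [10, 20, 10, 40],  # controls
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

X = standardize(relative_abundance(counts))
w, b = fit_l1_logistic(X, y, lam=0.05)
selected = [name for name, wj in zip(taxa, w) if wj != 0.0]
print(selected)  # ['Taxon_A', 'Taxon_D']
```

Taxa whose coefficients are shrunk exactly to zero drop out of the model, which is why lasso-type penalties are attractive for nominating candidate taxa. Standardizing first lets the single penalty `lam` treat rare and abundant taxa on a comparable scale. For real data, an established library is the more practical choice. For example, scikit-learn provides `LogisticRegression(penalty="l1", solver="liblinear")`, `SVC`, `RandomForestClassifier`, and `MLPClassifier` for the four model families named in the abstract. Any of these should be compared by cross-validated performance rather than by in-sample fit.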

Acknowledgments

I thank Dr. Rhee and Dr. Chung for the discussions and the anonymous reviewers for their valuable comments.

Author information

Corresponding author

Correspondence to Junghyun Namkung.

About this article

Cite this article

Namkung, J. Machine learning methods for microbiome studies. J Microbiol. 58, 206–216 (2020). https://doi.org/10.1007/s12275-020-0066-8

  • DOI: https://doi.org/10.1007/s12275-020-0066-8
