Abstract
Research on the microbiome has been actively conducted worldwide, and the results show that the human gut bacterial environment significantly affects the immune system, psychological conditions, cancers, obesity, and metabolic diseases. Thanks to advances in sequencing technology, microbiome studies with large numbers of samples are now feasible at an acceptable cost. Large sample sizes allow more sophisticated modeling with machine learning approaches to study relationships between the microbiome and various traits. This article provides an overview of machine learning methods for non-data scientists interested in association analysis of microbiomes and host phenotypes. Once the genomic features of the microbiome are determined, various methods can be used to explore the relationship between the microbiome and host phenotypes, including penalized regression, support vector machine (SVM), random forest, and artificial neural network (ANN). Deep neural network methods are also touched upon. The analysis procedure, from environment setup to extraction of analysis results, is presented in the Python programming language.
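As a rough illustration of the workflow the abstract describes, the sketch below fits the four method families to a synthetic taxon table and compares cross-validated accuracy. It is not the article's own code. The use of scikit-learn, the simulated counts, the hypothetical binary phenotype, the log transform, and all hyperparameters are assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for a taxon count table: 120 samples x 30 taxa (assumed sizes).
n_samples, n_taxa = 120, 30
counts = rng.poisson(lam=20, size=(n_samples, n_taxa)).astype(float)

# Hypothetical binary host phenotype; cases get extra counts in the first three taxa.
y = rng.integers(0, 2, size=n_samples)
counts[y == 1, :3] += rng.poisson(lam=15, size=(int(y.sum()), 3))

# Convert counts to relative abundances, then log-transform with a small pseudocount.
rel_abund = counts / counts.sum(axis=1, keepdims=True)
X = np.log(rel_abund + 1e-6)

# One model per method family named in the abstract; settings are illustrative only.
models = {
    # LogisticRegression applies an L2 (ridge-type) penalty by default.
    "penalized regression": make_pipeline(
        StandardScaler(), LogisticRegression(C=1.0, max_iter=1000)
    ),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

With real data, `X` would come from a processed feature table rather than from simulation, and the choice of transformation and tuning parameters would need to be validated for the study at hand.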
Acknowledgments
I thank Dr. Rhee and Dr. Chung for the discussions, and the anonymous reviewers for their valuable comments.
Cite this article
Namkung, J. Machine learning methods for microbiome studies. J Microbiol. 58, 206–216 (2020). https://doi.org/10.1007/s12275-020-0066-8
DOI: https://doi.org/10.1007/s12275-020-0066-8