Abstract
Background The availability of hundreds of city microbiome profiles allows the development of increasingly accurate predictors of the origin of a sample based on its microbiota composition. Typical microbiome studies involve the analysis of bacterial abundance profiles.
Results Here we transform the conventional bacterial strain or gene abundance profiles into functional profiles that account for bacterial metabolism and other cell functionalities. These profiles are used as features for city classification by a machine learning algorithm that allows the extraction of the most relevant features for the classification.
Conclusions We demonstrate here that the use of functional profiles not only accurately predicts the most likely origin of a sample but also provides an interesting functional point of view of the biogeography of the microbiota. We also show that cities can be classified based on their observed profiles of antibiotic resistance.
List of abbreviations
- CAMDA: Critical Assessment of Massive Data Analysis
- CARD: Comprehensive Antibiotic Resistance Database
- CCA: Canonical Correlation Analysis
- HLA: Human Leukocyte Antigen
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- PCA: Principal Component Analysis
- SNP: Single Nucleotide Polymorphism
- t-SNE: t-distributed Stochastic Neighbor Embedding
- WGS: Whole Genome Sequencing