Optimizing taxonomic classification of marker gene sequences
- Subject Areas
- Bioinformatics, Microbiology, Taxonomy
- Keywords
- microbiome, marker-gene sequencing, taxonomy, sequence classification
- Copyright
- © 2017 Bokulich et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2017. Optimizing taxonomic classification of marker gene sequences. PeerJ Preprints 5:e3208v1 https://doi.org/10.7287/peerj.preprints.3208v1
Abstract
Background. Taxonomic classification of marker-gene sequences is an important step in microbiome analysis.
Results. We present q2-feature-classifier (https://github.com/qiime2/q2-feature-classifier), a QIIME 2 plugin containing several novel machine-learning and alignment-based taxonomy classifiers that meet or exceed the classification accuracy of existing methods. We evaluated and optimized several commonly used taxonomic classification methods (RDP, BLAST, BLAST+, UCLUST) alongside several new methods (a scikit-learn naive Bayes machine-learning classifier, and alignment-based methods using VSEARCH and SortMeRNA).
Conclusions. Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make explicit recommendations regarding parameter choices for a range of standard operating conditions. q2-feature-classifier and our evaluation framework, tax-credit, are both free, open-source, BSD-licensed packages available on GitHub.
Author Comment
This preprint describes the development, optimization, and benchmarking of marker-gene taxonomy classification methods. The manuscript has been submitted to a peer-reviewed journal.