COInr a comprehensive, non-redundant COI database from NCBI-nt and BOLD
Description
COInr is a non-redundant, comprehensive database of COI sequences extracted from NCBI-nt and BOLD. It is not limited to a taxon, a gene region, or a taxonomic resolution. Sequences are dereplicated between databases and within taxa.
Each taxon has a unique taxonomic identifier (taxID), which is essential for avoiding ambiguous associations caused by homonyms and synonyms in the source databases. TaxIDs form a coherent hierarchical system that is fully compatible with NCBI taxIDs, so full or ranked lineages can be built for any taxon.
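As an illustration of how a hierarchical taxID system yields full or ranked lineages, the minimal sketch below follows parent links from a taxon up to the root. The table layout, the toy taxIDs, and the function names `full_lineage` and `ranked_lineage` are assumptions made for this example; they are not the COInr file format or mkCOInr code. The only convention borrowed from NCBI is that the root is its own parent.

```python
# Toy taxonomy (hypothetical values): taxID -> (parent taxID, rank, name).
# As in the NCBI taxonomy, the root (taxID 1) is its own parent.
TAXONOMY = {
    1: (1, "no rank", "root"),
    10: (1, "kingdom", "Metazoa"),
    20: (10, "phylum", "Arthropoda"),
    30: (20, "class", "Insecta"),
    35: (30, "no rank", "Pterygota"),
    40: (35, "order", "Diptera"),
}


def full_lineage(taxid, taxonomy):
    """Return the taxIDs from the root down to `taxid`."""
    lineage = []
    seen = set()
    while True:
        if taxid in seen:
            raise ValueError(f"cycle detected at taxID {taxid}")
        seen.add(taxid)
        lineage.append(taxid)
        parent = taxonomy[taxid][0]
        if parent == taxid:  # reached the root
            break
        taxid = parent
    return lineage[::-1]


def ranked_lineage(taxid, taxonomy, ranks):
    """Return the taxon name at each requested rank, or None if absent."""
    name_by_rank = {
        taxonomy[t][1]: taxonomy[t][2] for t in full_lineage(taxid, taxonomy)
    }
    return [name_by_rank.get(rank) for rank in ranks]


if __name__ == "__main__":
    print(full_lineage(40, TAXONOMY))
    print(ranked_lineage(40, TAXONOMY, ("kingdom", "phylum", "class", "order")))
```

The full lineage keeps every node, including unranked ones, while the ranked lineage reports only the requested major ranks and leaves a gap where a rank is missing.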
The mkCOInr scripts (https://github.com/meglecz/mkCOInr) make COInr a good starting point for creating custom databases tailored to users' needs.
It is possible to select or eliminate sequences for a list of taxa, select a specific gene region, set a minimum taxonomic resolution, add new custom sequences, and format the database for BLAST, QIIME, or RDP classifiers.
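The idea of selecting or eliminating sequences for a list of taxa can be sketched as follows: a sequence is matched if its own taxID, or any ancestor of it, appears in the list. The record layout, the toy taxIDs, and the function names `ancestors` and `select_taxa` are hypothetical and serve only to illustrate the principle; mkCOInr provides its own scripts for this task.

```python
# Hypothetical toy data: taxID -> parent taxID (the root is its own parent).
PARENT = {1: 1, 10: 1, 20: 10, 30: 20, 40: 10}

# Hypothetical records: (sequence ID, taxID, sequence).
RECORDS = [
    ("seq1", 30, "ACGT"),
    ("seq2", 40, "ACGA"),
    ("seq3", 20, "ACGG"),
]


def ancestors(taxid, parent):
    """Return the set containing `taxid` and all of its ancestors."""
    found = {taxid}
    while parent[taxid] != taxid:
        taxid = parent[taxid]
        found.add(taxid)
    return found


def select_taxa(records, parent, taxa, keep=True):
    """Keep (keep=True) or eliminate (keep=False) records under any of `taxa`."""
    wanted = set(taxa)
    return [
        record
        for record in records
        if bool(ancestors(record[1], parent) & wanted) == keep
    ]


if __name__ == "__main__":
    print([r[0] for r in select_taxa(RECORDS, PARENT, [20])])
    print([r[0] for r in select_taxa(RECORDS, PARENT, [20], keep=False)])
```

Matching on ancestors rather than on the exact taxID means that listing a higher-level taxon also captures every sequence assigned to its descendants.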
Files
Total size: 469.7 MB

MD5 | Size |
---|---|
c6784fdcd6cf3c8b48838f346521d1a8 | 227.8 MB |
56853817a3b9a5dd376b6adf39708d45 | 241.9 MB |
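The MD5 checksums listed for the files can be used to confirm that a download is intact. The sketch below uses only Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` and the path in the usage comment are placeholders, since the file names are not shown here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:c678...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (the path is a placeholder for a downloaded file):
# matches_checksum("path/to/downloaded_file", "md5:c6784fdcd6cf3c8b48838f346521d1a8")
```

Reading in chunks keeps memory use low, which matters for files of a few hundred megabytes.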
Additional details
Related works
- Is cited by
- Software: https://github.com/meglecz/mkCOInr (URL)
- Is published in
- Preprint: 10.1101/2022.05.18.492423 (DOI)
- Journal article: 10.1111/1755-0998.13756 (DOI)