Published May 5, 2023 | Version 2023-05-03
Dataset Open

COInr a comprehensive, non-redundant COI database from NCBI-nt and BOLD

  • 1. IMBE, Aix-Marseille University

Contributors

Contact person:

  • 1. IMBE, Aix-Marseille University

Description

COInr is a non-redundant, comprehensive database of COI sequences extracted from NCBI-nt and BOLD. It is not limited to a taxon, a gene region, or a taxonomic resolution. Sequences are dereplicated between databases and within taxa.

Each taxon has a unique taxonomic identifier (taxID), which is essential for avoiding ambiguous associations caused by homonyms and synonyms in the source databases. TaxIDs form a coherent hierarchical system that is fully compatible with NCBI taxIDs, so the full or ranked lineage of any taxon can be built.
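
As an illustration of how a hierarchical taxID system yields full or ranked lineages, here is a minimal Python sketch. It is not part of mkCOInr: the table layout (taxID mapped to parent taxID, rank, and name), the function names, and the toy taxIDs are all assumptions made for the example, and the IDs are not real NCBI taxIDs.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy taxonomy: taxID -> (parent taxID, rank, name).
# The IDs are illustrative only; the root points to itself.
TAXONOMY: Dict[int, Tuple[int, str, str]] = {
    1: (1, "no rank", "root"),
    10: (1, "superkingdom", "Eukaryota"),
    20: (10, "kingdom", "Metazoa"),
    30: (20, "phylum", "Arthropoda"),
    35: (30, "no rank", "Pancrustacea"),
    40: (35, "class", "Insecta"),
}


def full_lineage(
    tax_id: int, taxonomy: Dict[int, Tuple[int, str, str]]
) -> List[Tuple[int, str, str]]:
    """Return (taxID, rank, name) for every ancestor, from the root down to tax_id."""
    lineage: List[Tuple[int, str, str]] = []
    seen = set()  # guards against accidental cycles in the table
    current = tax_id
    while current not in seen:
        seen.add(current)
        parent, rank, name = taxonomy[current]
        lineage.append((current, rank, name))
        if parent == current:  # reached the root
            break
        current = parent
    lineage.reverse()
    return lineage


def ranked_lineage(
    tax_id: int,
    taxonomy: Dict[int, Tuple[int, str, str]],
    ranks: Sequence[str] = ("superkingdom", "kingdom", "phylum", "class", "order"),
) -> List[Optional[str]]:
    """Return the taxon name at each requested rank, or None where the rank is absent."""
    by_rank = {rank: name for _, rank, name in full_lineage(tax_id, taxonomy)}
    return [by_rank.get(rank) for rank in ranks]
```

The full lineage keeps every node, including unranked ones, while the ranked lineage keeps only the major ranks and leaves a gap where a rank is missing for the taxon.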
 
COInr is a good starting point for creating custom databases tailored to users’ needs with the mkCOInr scripts, available at https://github.com/meglecz/mkCOInr
With these scripts it is possible to select or eliminate sequences for a list of taxa, select a specific gene region, select for a minimum taxonomic resolution, add new custom sequences, and format the database for BLAST, QIIME, and RDP classifiers.
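
To make the idea of selecting or eliminating sequences for a list of taxa concrete, the following Python sketch filters records by ancestry in a taxID hierarchy. It is a simplified illustration, not the mkCOInr implementation: the record layout, the parent table, the toy taxIDs, the function names, and the FASTA header format are all assumptions.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Record = Tuple[str, int, str]  # (sequence ID, taxID, sequence)

# Hypothetical toy hierarchy: taxID -> parent taxID (the root points to itself).
PARENT: Dict[int, int] = {1: 1, 20: 1, 30: 20, 40: 30, 50: 20}

# Hypothetical toy records; 30 and 40 lie on one branch, 50 on another.
SEQUENCES = [
    ("seq1", 40, "ACGTACGT"),
    ("seq2", 50, "TTGACCAA"),
    ("seq3", 30, "GGCATGCA"),
]


def descends_from(tax_id: int, selected: Set[int], parent: Dict[int, int]) -> bool:
    """True if tax_id is one of the selected taxa or has one among its ancestors."""
    current = tax_id
    while True:
        if current in selected:
            return True
        next_id = parent[current]
        if next_id == current:  # reached the root without a match
            return False
        current = next_id


def select_sequences(
    records: Iterable[Record],
    selected: Set[int],
    parent: Dict[int, int],
    keep: bool = True,
) -> Iterator[Record]:
    """Yield records inside the selected taxa (keep=True) or outside them (keep=False)."""
    for seq_id, tax_id, seq in records:
        if descends_from(tax_id, selected, parent) == keep:
            yield seq_id, tax_id, seq


def to_fasta(records: Iterable[Record]) -> str:
    """Format records as FASTA text; the header layout is an assumption."""
    return "".join(f">{seq_id} taxID={tax_id}\n{seq}\n" for seq_id, tax_id, seq in records)
```

Calling `select_sequences(SEQUENCES, {30}, PARENT)` keeps the records assigned to taxon 30 and its descendants, and passing `keep=False` returns the complement instead.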

Files (469.7 MB)

Checksum Size
md5:c6784fdcd6cf3c8b48838f346521d1a8 227.8 MB
md5:56853817a3b9a5dd376b6adf39708d45 241.9 MB
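
The MD5 checksums listed for the files can be used to confirm that a download is complete and uncorrupted. A minimal Python sketch using the standard `hashlib` module follows; the function names are hypothetical, and the file path passed to them is whatever name the downloaded file has locally.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or as a bare hex string."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(local_path, "md5:c6784fdcd6cf3c8b48838f346521d1a8")` should return `True` for an intact copy of the 227.8 MB file, where `local_path` is a placeholder for the downloaded file's location.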

Additional details

Related works

Is cited by
Software: https://github.com/meglecz/mkCOInr (URL)
Is published in
Preprint: 10.1101/2022.05.18.492423 (DOI)
Journal article: 10.1111/1755-0998.13756 (DOI)