Canadian Forest Service Publications

Over 2.5 million COI sequences in GenBank and growing. 2018. Porter, T.M.; Hajibabaei, M. PLoS ONE 13(9): e0200177.

Year: 2018

Issued by: Great Lakes Forestry Centre

Catalog ID: 39385

Language: English

Availability: PDF (download)

Available from the journal's Web site. †
DOI: 10.1371/journal.pone.0200177

† This site may require a fee

Plain Language Summary

The increasing popularity of cytochrome c oxidase subunit 1 (COI) DNA metabarcoding warrants a careful look at the reference databases that underlie high-throughput taxonomic assignments. This study documents trends in COI records and assesses their future usability for metabarcode identification. The number of COI records deposited in the NCBI nucleotide database has increased by a geometric average of 51% per year, from 8,137 records deposited in 2003 to a cumulative total of about 2.5 million by the end of 2017. To ensure the future usability of COI records in GenBank, we suggest: 1) improving the geographic representation of COI records; 2) improving the cross-referencing of COI records between the Barcode of Life Data System and GenBank, to facilitate consolidation and incorporation into existing bioinformatic pipelines; 3) adhering to the minimum information about a marker gene sequence (MIMARKS) guidelines; and 4) integrating metabarcodes from eDNA and mixed-community studies with existing reference sequences. The growth of COI reference records over the past 15 years has been substantial, and these records are likely to remain a resource across many fields for years to come.
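
For readers unfamiliar with the term, a geometric average growth rate can be computed from a series of yearly counts as sketched below. This is only an illustration of the general formula, not the authors' analysis: the function name `geometric_mean_growth` and the example counts are hypothetical, and the numbers are not data from the study.

```python
def geometric_mean_growth(counts):
    """Return the geometric average period-over-period growth rate.

    counts: positive values ordered in time (for example, records per year).
    """
    if len(counts) < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two positive counts")
    periods = len(counts) - 1
    # The product of the yearly ratios telescopes to last / first, so the
    # geometric mean ratio is the periods-th root of that quotient.
    return (counts[-1] / counts[0]) ** (1 / periods) - 1


# Hypothetical yearly deposit counts, for illustration only.
example_counts = [100, 150, 225]
rate = geometric_mean_growth(example_counts)
print(f"Geometric average growth: {rate:.0%} per year")
```

Because the yearly ratios multiply, the geometric average depends only on the first and last values and the number of intervals between them, which is why it is preferred over an arithmetic average for compounding growth.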