dc.contributor.author Nasko, Daniel J
Koren, Sergey
Phillippy, Adam M
Treangen, Todd J
dc.date.accessioned 2018-11-28T16:43:59Z
dc.date.available 2018-11-28T16:43:59Z
dc.date.issued 2018-10-30
dc.identifier.citation Nasko, Daniel J, Koren, Sergey, Phillippy, Adam M, et al. "RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification." (2018) BioMed Central: https://doi.org/10.1186/s13059-018-1554-6.
dc.identifier.uri https://hdl.handle.net/1911/103430
dc.description.abstract In order to determine the role of the database in taxonomic sequence classification, we examine the influence of the database over time on k-mer-based lowest common ancestor taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specially adapted for large databases.
dc.publisher BioMed Central
dc.title RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification
dc.date.updated 2018-11-28T16:43:59Z
dc.identifier.doi https://doi.org/10.1186/s13059-018-1554-6
dc.language.rfc3066 en
dcterms.bibliographicCitation Genome Biology. 2018 Oct 30;19(1):165
dc.rights.holder The Author(s).