Learn R Programming

taxonomizr (version 0.8.0)

taxonomizrSwitch: Switch from data.table to SQLite

Description

In version 0.5.0, taxonomizr switched from data.table to SQLite name and node lookups. See below for more details.

Arguments

Details

Version 0.5.0 marked a change for name and node lookups from using data.table to using SQLite. This was necessary to increase performance (10-100x speedup for getTaxonomy) and create a simpler interface (a single SQLite database contains all necessary data). Unfortunately, this switch requires a couple breaking changes:

  • getTaxonomy changes from getTaxonomy(ids,namesDT,nodesDT) to getTaxonomy(ids,sqlFile)

  • getId changes from getId(taxa,namesDT) to getId(taxa,sqlFile)

  • read.names is deprecated, instead use read.names.sql. For example, instead of calling names<-read.names('names.dmp') in every session, simply call read.names.sql('names.dmp','accessionTaxa.sql') once (or use the convenient prepareDatabase)).

  • read.nodes is deprecated, instead use read.names.sql. For example. instead of calling nodes<-read.names('nodes.dmp') in every session, simply call read.nodes.sql('nodes.dmp','accessionTaxa.sql') once (or use the convenient prepareDatabase).

I've tried to ease any problems with this by overloading getTaxonomy and getId to still function (with a warning) if passed a data.table names and nodes argument and providing a simpler prepareDatabase function for completing all setup steps (hopefully avoiding direct calls to read.names and read.nodes for most users).

I plan to eventually remove data.table functionality to avoid a split codebase so please switch to the new SQLite format in all new code.

See Also

getTaxonomy, read.names.sql, read.nodes.sql, prepareDatabase, getId