taxonbridge

Biological taxonomies establish conventions by which researchers can catalogue and systematically compare their work using nomenclature such as numeric identifiers and binomial names. The ideal taxonomy is unambiguous and exhaustive; however, no perfect taxonomy exists. How useful a taxonomy is to a researcher depends on context, for example the taxonomic neighborhood of a species or the geological timeframe of the study. Collating the most relevant taxonomic information from multiple taxonomies is hampered by the arbitrary assignment of numeric identifiers by database administrators, by ambiguity in scientific names, and by duplication.

The NCBI is the go-to resource for many scientists, but its taxonomy only includes species with sequence data. In contrast, the Global Biodiversity Information Facility (GBIF) backbone taxonomy references a more extensive list of extinct and extant species, and it is integrated with 100 other taxonomic databases. Unfortunately, the GBIF backbone taxonomy excludes the NCBI taxonomy. Because the NCBI and GBIF use different numeric identifiers, mapping from one taxonomy to the other must fall back on scientific names, which can introduce errors when names are ambiguous or duplicated. Additional lineage information can reduce this risk: a mapping can be validated by recursively comparing parental taxon names.

The goal of taxonbridge is hence to provide a set of tools for merging the GBIF backbone and NCBI taxonomies in order to derive a consistent, deduplicated and disambiguated custom taxonomy for any given study (see data provenance).
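To make the idea of validating a name-based mapping by recursively comparing parental taxon names concrete, here is a minimal base R sketch. It is not taxonbridge's implementation. The two data frames, their column names and the identifiers ("g1", "n1", ...) are made up for illustration, and the lineages are simplified. The genus name Morus is a real homonym, shared by mulberries (plants) and gannets (birds).

```r
# Toy GBIF-like records: two different taxa share the name "Morus".
gbif <- data.frame(
  gbif_id = c("g1", "g2"),                # made-up identifiers
  name    = c("Morus", "Morus"),
  family  = c("Moraceae", "Sulidae"),
  order   = c("Rosales", "Suliformes"),
  stringsAsFactors = FALSE
)

# Toy NCBI-like record for the bird genus only.
ncbi <- data.frame(
  ncbi_id = "n1",                         # made-up identifier
  name    = "Morus",
  family  = "Sulidae",
  order   = "Suliformes",
  stringsAsFactors = FALSE
)

# Recursively compare parent names, nearest parent first.
lineage_matches <- function(a, b) {
  if (length(a) == 0) return(TRUE)            # every parent agreed
  if (!identical(a[1], b[1])) return(FALSE)   # first disagreement ends the walk
  lineage_matches(a[-1], b[-1])
}

# Step 1: naive mapping on scientific name alone (yields two candidates).
candidates <- merge(gbif, ncbi, by = "name", suffixes = c("_gbif", "_ncbi"))

# Step 2: keep only candidates whose parental lineages agree.
ranks <- c("family", "order")
keep <- vapply(seq_len(nrow(candidates)), function(i) {
  lineage_matches(
    unlist(candidates[i, paste0(ranks, "_gbif")], use.names = FALSE),
    unlist(candidates[i, paste0(ranks, "_ncbi")], use.names = FALSE)
  )
}, logical(1))

validated <- candidates[keep, c("name", "gbif_id", "ncbi_id")]
print(validated)
```

Matching on the name alone pairs the NCBI record with both GBIF records; the lineage check discards the plant genus and keeps only the bird genus.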

Installation

CRAN version:

To install taxonbridge from CRAN, type:

install.packages("taxonbridge")
library(taxonbridge)

Note that the version on CRAN might not reflect the most recent changes made to the development version of taxonbridge.

Development version:

You can install the development version of taxonbridge with devtools:

install.packages(c("devtools", "rmarkdown"))
devtools::install_github("MoultDB/taxonbridge", build_vignettes = TRUE)
library(taxonbridge)

taxonbridge can also be updated, re-installed or overwritten with either of the preceding installation options.

Available methods and how to use them

See the taxonbridge documentation and workflow.

Examples

This is a basic example which uses a function from each of the taxonbridge package's four main function categories to load and manipulate sample data:

library(taxonbridge)
plot_mdb(prepare_comparable_rank_dist(get_validity(get_status(load_sample()), valid = TRUE)))
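The nested call above can be easier to follow when unrolled into one step per line. The sketch below uses only the functions and arguments shown in the example; the intermediate variable names are illustrative, and the comments paraphrase the one-line function descriptions rather than documenting exact behavior. The guard simply skips the example if taxonbridge is not installed.

```r
# TRUE only if the package is available on this system.
pkg_available <- requireNamespace("taxonbridge", quietly = TRUE)

if (pkg_available) {
  library(taxonbridge)

  sample_data <- load_sample()                          # sample of merged GBIF and NCBI taxonomies
  by_status   <- get_status(sample_data)                # filter by GBIF taxonomic status (defaults)
  valid       <- get_validity(by_status, valid = TRUE)  # validate entries, keeping valid = TRUE
  rank_dist   <- prepare_comparable_rank_dist(valid)    # comparable NCBI and GBIF ranks

  plot_mdb(rank_dist)                                   # plot the prepared data
}
```

Keeping the intermediate objects also makes it possible to inspect each stage, for example with str(valid), before plotting.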

Want to try more than a sample? Download a larger dataset and load it as follows:

library(taxonbridge)
load_population("path/to/downloaded/dataset")
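Since load_population() depends on a file you have downloaded yourself, it can help to check the path first. The wrapper below is a hypothetical convenience function, not part of taxonbridge, and the path is the same placeholder used above.

```r
# Placeholder path; replace with the location of the downloaded dataset.
dataset_path <- "path/to/downloaded/dataset"

# Hypothetical helper: only call load_population() when the file exists.
load_population_if_present <- function(path) {
  if (!file.exists(path)) {
    message("No file found at '", path, "'; download the dataset first.")
    return(invisible(NULL))
  }
  taxonbridge::load_population(path)
}

population <- load_population_if_present(dataset_path)
```

With the placeholder path left unchanged, the helper prints a message and returns NULL instead of raising an error.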

You can also prepare a dataset yourself. This requires external data and software available at the following links:

Global Biodiversity Information Facility (GBIF) backbone taxonomy (use download_gbif() and note the location of the file Taxon.tsv).

National Center for Biotechnology Information (NCBI) taxonomy (use download_ncbi() and parse the downloaded files with Taxonkit according to its guidelines, or use download_ncbi(taxonkitpath = "/path/to/taxonkit") to carry out parsing automatically if Taxonkit is installed on your system):

library(taxonbridge)
custom_taxonomy <- load_taxonomies(download_gbif(), download_ncbi(taxonkitpath = "/path/to/taxonkit"))
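If Taxonkit is on your PATH, its location can be looked up instead of typed by hand. The sketch below is one way to do that with base R's Sys.which(). The wrapper function build_custom_taxonomy() is hypothetical, and it assumes that taxonkitpath accepts the full path to the executable, as the "/path/to/taxonkit" placeholder suggests.

```r
# Full path to the taxonkit executable, or "" if it is not on the PATH.
taxonkit_path <- unname(Sys.which("taxonkit"))

# Hypothetical wrapper around the calls shown above.
build_custom_taxonomy <- function(taxonkit = taxonkit_path) {
  if (!nzchar(taxonkit)) {
    stop("Taxonkit was not found; install it or pass its path explicitly.")
  }
  taxonbridge::load_taxonomies(
    taxonbridge::download_gbif(),
    taxonbridge::download_ncbi(taxonkitpath = taxonkit)
  )
}

# Not run automatically, because it downloads both taxonomies:
# custom_taxonomy <- build_custom_taxonomy()
```

Failing early when Taxonkit is missing avoids starting the downloads only to stop at the parsing step.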

Read the load_taxonomies() function documentation for instructions on how to load a dataset of your own.

See the workflow and vignette for more ideas on what to do with loaded data in taxonbridge.

Version: 1.0.5
License: CC0

Maintainer: Werner Veldsman
Last Published: March 24th, 2022

Functions in taxonbridge (1.0.5)

prepare_comparable_rank_dist: Get comparable NCBI and GBIF taxonomic ranks
prepare_rank_dist: Get all NCBI and GBIF taxonomic ranks
load_sample: Load a sample of previously merged GBIF and NCBI taxonomies
load_population: Load previously merged GBIF and NCBI taxonomies
load_taxonomies: Load and merge GBIF and NCBI taxonomic data
plot_mdb: Generic for plot_mdb methods
dedupe: Remove duplicate scientific names in a taxonomy
term_conversion: Convert GBIF terms to NCBI terms
download_ncbi: Download the NCBI taxonomy
get_taxa: A helper function to filter columns on GBIF taxa names
fuzzy_search: Match misspelled or partial scientific names
download_gbif: Download the GBIF backbone taxonomy
annotate: Annotate a custom taxonomy
get_validity: Validate entries of a merged taxonomy
get_inconsistencies: Detect candidate inconsistencies and ambiguity
get_lineages: Get entries that have lineage information for both the GBIF and NCBI data
get_status: Filter a custom taxonomy by GBIF taxonomic status/synonym