contax.trim: The ConTax data set

Description

The trimmed version of the ConTax data set.

Usage

data(contax.trim)

Details

contax.trim is a data.frame object containing 38 781 full-length 16S rRNA sequences. It is the trimmed version of the full data set (see below). Large taxa (those with many sequences) have been trimmed as described in Vinje et al. (2016) to obtain a data set with a more even representation of the prokaryotic taxonomy.

contax.full is the full consensus taxonomy data set as described in Vinje et al. (2016). It is too large for CRAN and is therefore available in a separate package, microcontax.data. See the example below for how to obtain contax.full.

The Header of every sequence starts with a unique tag, in this case the text "ConTax" and some integer. This is followed by a token describing the origin of the sequence. It is typically

"Intersection=SRG"

meaning the sequence is found in all three data repositories: Silva, RDP and Greengenes. The intersection can also be SR, SG or RG if the sequence was found in only two of the repositories. The taxonomy information for each sequence is found in the third token. It follows a commonly used format:

"k__<...>;p__<...>;c__<...>;o__<...>;f__<...>;g__<...>;"

where <...> is some proper text. The letters, followed by a double underscore, refer to the taxonomic levels Domain (Kingdom), Phylum, Class, Order, Family and Genus. Here is an example of a proper string:

"k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;"

As long as this format is used, the taxonomy information can be extracted with the supplied extractor functions getDomain, getPhylum, ..., getGenus.
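
To illustrate how the taxonomy format can be read, here is a minimal sketch in base R. It is not the implementation behind getDomain, ..., getGenus. The helper extract_level is a hypothetical name introduced only for this illustration, and the sketch assumes that each level appears as "<letter>__<text>;" exactly as in the example string above.

```r
# Taxonomy string copied from the example above
tax <- "k__Bacteria;p__Firmicutes;c__Bacilli;o__Bacillales;f__Staphylococcaceae;g__Staphylococcus;"

# Hypothetical helper (not part of microcontax): return the text that
# follows "<prefix>__" and runs up to the next semicolon, or NA if the
# level is not present in the string
extract_level <- function(x, prefix) {
  pattern <- paste0(".*", prefix, "__([^;]*);.*")
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

# Collect all six levels into a named character vector
prefixes <- c(Domain = "k", Phylum = "p", Class = "c",
              Order = "o", Family = "f", Genus = "g")
levels.found <- sapply(prefixes, function(p) extract_level(tax, p))
print(levels.found)
```

Because the pattern only looks for the level prefix and the closing semicolon, the same helper can be applied to a whole Header string, whatever precedes the taxonomy token.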

Examples

## Not run:
# Load the trimmed data set and inspect its dimensions
data(contax.trim)
dim(contax.trim)

# Write to FASTA-file
writeFasta(contax.trim, out.file = "ConTax_trim.fasta")

# Install microcontax.data with the BIG contax.full data set
if (!requireNamespace("microcontax.data", quietly = TRUE)) {
  install.packages("microcontax.data")
}
# Load data
data("contax.full", package = "microcontax.data")
## End(Not run)
