DECIPHER (version 2.0.2)

IdLengths: Determine the Number of Bases, Nonbases, and Width of Each Sequence

Description

Counts the number of bases (A, C, G, T) and ambiguities/degeneracies in each sequence.

Usage

IdLengths(dbFile, tblName = "Seqs", identifier = "", type = "DNAStringSet", add2tbl = FALSE, batchSize = 10000, processors = 1, verbose = TRUE)

Arguments

dbFile
A SQLite connection object or a character string specifying the path to the database file.
tblName
Character string specifying the table where the sequences are located.
identifier
Optional character string used to narrow the search results to those matching a specific identifier. If "" then all identifiers are selected.
type
The type of XStringSet being processed. This should be (an abbreviation of) one of "DNAStringSet" or "RNAStringSet".
add2tbl
Logical or a character string specifying the table name in which to add the result.
batchSize
Integer specifying the number of sequences to process at a time.
processors
The number of processors to use, or NULL to automatically detect and use all available processors.
verbose
Logical indicating whether to display progress.

Value

A data.frame with the number of bases ("A", "C", "G", or "T"), nonbases, and width of each sequence. The width is defined as the sum of bases and nonbases in each sequence. The row.names of the data.frame correspond to the "row_names" in the tblName of the dbFile.
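
The relationship between the three returned quantities can be illustrated with a small base R sketch. This is not DECIPHER's implementation: count_lengths() is a hypothetical helper, the column names are assumed for illustration, and it assumes that every character other than A, C, G, or T (ambiguity codes, gaps, and so on) counts as a nonbase.

```r
# Hypothetical helper, not part of DECIPHER: tallies bases and nonbases
# for a (named) character vector of DNA sequences.
count_lengths <- function(seqs) {
  # Split each sequence into single characters
  chars <- strsplit(toupper(seqs), "", fixed = TRUE)
  # Assumption: only A, C, G, and T count as bases
  bases <- vapply(chars,
                  function(x) sum(x %in% c("A", "C", "G", "T")),
                  integer(1),
                  USE.NAMES = FALSE)
  width <- unname(nchar(seqs))
  # width is the sum of bases and nonbases, so nonbases = width - bases
  data.frame(bases = bases,
             nonbases = width - bases,
             width = width,
             row.names = names(seqs))
}

seqs <- c(seq1 = "ACGTN", seq2 = "AC-GTRY")
res <- count_lengths(seqs)
print(res)
```

In this sketch, seq1 has four bases and one nonbase (the N), and seq2 has four bases and three nonbases (the gap and the two ambiguity codes), so bases plus nonbases equals the width in every row.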

See Also

Add2DB

Examples

db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
l <- IdLengths(db)
head(l)
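
A variation on the example above, passing an open SQLite connection instead of a file path and setting a few of the documented arguments explicitly. It is a sketch under assumptions: it presumes that the DECIPHER, DBI, and RSQLite packages are installed (the block is skipped otherwise) and that opening the connection with DBI::dbConnect() and RSQLite::SQLite() yields a connection object that IdLengths accepts. The batchSize value is arbitrary.

```r
lens <- NULL  # remains NULL if the required packages are unavailable

if (requireNamespace("DECIPHER", quietly = TRUE) &&
    requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")

  # Open the database once and pass the connection rather than the path
  dbConn <- DBI::dbConnect(RSQLite::SQLite(), db)

  lens <- DECIPHER::IdLengths(dbConn,
                              tblName = "Seqs",
                              identifier = "",     # "" selects all identifiers
                              add2tbl = FALSE,     # do not write results to the database
                              batchSize = 50,      # arbitrary value for illustration
                              verbose = FALSE)     # suppress progress output

  DBI::dbDisconnect(dbConn)
  print(head(lens))
}
```

Leaving add2tbl at FALSE keeps the bundled example database unmodified; the counts are only returned as a data.frame.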
