IdLengths: Determine the Number of Bases, Nonbases, and Width of Each Sequence

Description

Counts the number of bases (A, C, G, T) and ambiguities/degeneracies in each sequence.

Usage

IdLengths(dbFile, tblName = "Seqs", identifier = "", type = "DNAStringSet", add2tbl = FALSE, batchSize = 10000, processors = 1, verbose = TRUE)

Arguments

dbFile

A SQLite connection object or a character string specifying the path to the database file.

tblName

Character string specifying the table where the sequences are located.

identifier

Optional character string used to narrow the search results to those matching a specific identifier. If "" (the default), all identifiers are selected.

type

The type of XStringSet being processed. This should be (an abbreviation of) one of "DNAStringSet" or "RNAStringSet".

add2tbl

Logical or a character string specifying the table name in which to add the result.

batchSize

Integer specifying the number of sequences to process at a time.

processors

The number of processors to use, or NULL to automatically detect and use all available processors.

verbose

Logical indicating whether to display progress.

Value

A data.frame with the number of bases ("A", "C", "G", or "T"), nonbases, and width of each sequence. The width is defined as the sum of bases and nonbases in each sequence. The row.names of the data.frame correspond to the "row_names" in the tblName of the dbFile.
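
The relationship between the three returned quantities can be illustrated with a small base R sketch. This is not the package's implementation: the helper `count_lengths` and the toy sequences are hypothetical, and the sketch simply treats every character other than A, C, G, or T as a nonbase so that width equals bases plus nonbases.

```r
# Hypothetical toy sequences (not taken from any DECIPHER database)
seqs <- c(seq1 = "ACGTACGT", seq2 = "ACGTNRYA")

# Illustrative helper, not part of DECIPHER: counts A/C/G/T as bases
# and every other character as a nonbase
count_lengths <- function(x) {
  chars <- strsplit(x, "", fixed = TRUE)
  bases <- vapply(chars,
                  function(s) sum(s %in% c("A", "C", "G", "T")),
                  integer(1))
  width <- nchar(x)
  data.frame(bases = bases,
             nonbases = width - bases,
             width = width,
             row.names = names(x))
}

res <- count_lengths(seqs)
print(res)
```

Here `seq2` contains the ambiguity codes N, R, and Y, so it has 5 bases and 3 nonbases for a width of 8, while `seq1` has 8 bases and no nonbases.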

Examples

db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
l <- IdLengths(db)
head(l)
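
A possible follow-up check on the returned data.frame is sketched here. It assumes that DECIPHER and RSQLite are installed and that the result columns are named `bases`, `nonbases`, and `width`, which the Value section suggests but does not state explicitly.

```r
# Sketch only: runs when DECIPHER and its SQLite backend are available
if (requireNamespace("DECIPHER", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")
  l <- DECIPHER::IdLengths(db, verbose = FALSE)

  # Assumed column names: bases, nonbases, width
  # Width is documented as the sum of bases and nonbases
  print(all(l$width == l$bases + l$nonbases))

  # Show the sequences with the most ambiguous positions first
  print(head(l[order(-l$nonbases), ]))
}
```

Sorting by `nonbases` is one way to spot sequences with many ambiguous positions before further analysis.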
