DECIPHER (version 2.0.2)

FormGroups: Forms Groups By Rank

Description

Agglomerates sequences into groups within a certain size range based on taxonomic rank.

Usage

FormGroups(dbFile, tblName = "Seqs", goalSize = 1000, minGroupSize = 500, maxGroupSize = 10000, add2tbl = FALSE, verbose = TRUE)

Arguments

dbFile
A SQLite connection object or a character string specifying the path to the database file.
tblName
Character string specifying the table where the rank information is located.
goalSize
Number of sequences required in each group to stop adding more sequences.
minGroupSize
Minimum number of sequences in each group required to stop trying to recombine with a larger group.
maxGroupSize
Maximum number of sequences in each group allowed to continue agglomeration.
add2tbl
Logical or a character string specifying the table name in which to add the result.
verbose
Logical indicating whether to print database queries and other information.

Value

A data.frame with the rank and the corresponding group identifier as identifier. The origin gives the rank preceding the identifier. If add2tbl is not FALSE then the "identifier" and "origin" columns are updated in dbFile.

Details

FormGroups uses the "rank" field in the tblName table of dbFile to group sequences with similar taxonomic rank. Rank information must be present in tblName, such as that created by default when importing sequences from a GenBank formatted file. The rank information must not contain repeated taxonomic names belonging to different lineages.

Beginning with the least common ranks, the algorithm agglomerates groups with similar ranks until the goalSize is reached. If the group size is below minGroupSize then further agglomeration is attempted with a larger group. If additional agglomeration results in a group larger than maxGroupSize then the agglomeration is undone so that the group is smaller.
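
The interplay between minGroupSize and maxGroupSize can be illustrated with a small base R sketch. This is not DECIPHER's implementation: the helper toy_groups, the semicolon-delimited lineage strings, and the sequence counts are all made up for illustration, and the sketch ignores goalSize entirely. It only shows the general idea of moving groups that are too small up to their parent taxon, and leaving them alone when the merged group would exceed the maximum.

```r
# Made-up lineages, one per sequence (illustrative data only).
lineages <- c(
  rep("Bacteria;Proteobacteria;Escherichia", 6),
  rep("Bacteria;Proteobacteria;Salmonella", 2),
  rep("Bacteria;Proteobacteria;Vibrio", 1),
  rep("Bacteria;Firmicutes;Bacillus", 2),
  rep("Bacteria;Firmicutes;Listeria", 1)
)

# Hypothetical helper (not part of DECIPHER): returns one group label
# per sequence after merging undersized groups into their parent taxon.
toy_groups <- function(lineages, min_size, max_size) {
  taxa <- strsplit(lineages, ";", fixed = TRUE)
  # Truncate one lineage to its first d taxonomic levels.
  cut_lineage <- function(x, d) paste(x[seq_len(d)], collapse = ";")
  depth <- lengths(taxa)  # current depth of each sequence's group
  group <- mapply(cut_lineage, taxa, depth, USE.NAMES = FALSE)
  # Work from the deepest level upward, stopping above the root.
  for (d in rev(seq_len(max(depth))[-1])) {
    sizes <- table(group)
    small <- depth == d & as.vector(sizes[group]) < min_size
    if (!any(small)) next
    parent <- mapply(cut_lineage, taxa[small], d - 1, USE.NAMES = FALSE)
    # Size of each would-be merged group: the small groups moving up
    # plus any sequences already labeled with the parent taxon.
    moving <- table(parent)
    existing <- table(factor(group[!small], levels = names(moving)))
    total <- moving + existing
    # Accept a merge only if it stays within max_size; otherwise the
    # small groups keep their current labels.
    ok <- as.vector(total[parent]) <= max_size
    idx <- which(small)[ok]
    group[idx] <- parent[ok]
    depth[idx] <- d - 1
  }
  group
}

g_toy <- toy_groups(lineages, min_size = 3, max_size = 5)
print(table(g_toy))
```

With min_size = 3 and max_size = 5, the two small Proteobacteria genera and the two small Firmicutes genera are each merged into their parent taxon, while the genus with six sequences is left as its own group. Lowering max_size to 2 rejects both merges, so every sequence keeps its genus-level label.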

See Also

IdentifyByRank

Examples

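library(DECIPHER)
# Path to the example SQLite database shipped with DECIPHER
db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
# Group sequences with a goal of 10 per group (minimum 5, maximum 20)
g <- FormGroups(db, goalSize=10, minGroupSize=5, maxGroupSize=20)
head(g)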