DECIPHER (version 2.0.2)

Seqs2DB: Add Sequences from Text File to Database

Description

Adds sequences to a database.

Usage

Seqs2DB(seqs, type, dbFile, identifier, tblName = "Seqs", chunkSize = 1e7, replaceTbl = FALSE, fields = c(accession = "ACCESSION", rank = "ORGANISM"), processors = 1, verbose = TRUE, ...)

Arguments

seqs
A connection object or a character string specifying the file path to the file containing the sequences, an XStringSet object if type is XStringSet, or a QualityScaledXStringSet object if type is QualityScaledXStringSet. Files compressed with gzip, bzip2, xz, or lzma are automatically detected and decompressed during import. Full URLs (e.g., beginning with "http://" or "ftp://") pointing to uncompressed or gzip-compressed text files can also be used.
type
The type of the sequences (seqs) being imported. This should be (an unambiguous abbreviation of) one of "FASTA", "FASTQ", "GenBank", "XStringSet", or "QualityScaledXStringSet".
dbFile
A SQLite connection object or a character string specifying the path to the database file. If the dbFile does not exist then a new database is created at this location.
identifier
Character string specifying the "id" to give the imported sequences in the database.
tblName
Character string specifying the table in which to add the sequences.
chunkSize
Number of characters to read at a time.
replaceTbl
Logical. If FALSE (the default) then the sequences are appended to any already existing in the table. If TRUE then any sequences already in the table are overwritten.
fields
Named character vector specifying which fields of a "GenBank" formatted file to import as text columns in the database (not applicable to other types). The names of the vector give the column names, and its values give the GenBank fields. The default imports the "ACCESSION" field as a column named "accession" and the "ORGANISM" field as a column named "rank". Other uppercase fields, such as "LOCUS" or "VERSION", can be specified in a similar manner. Note that the "DEFINITION" field is automatically imported as a column named "description" in the database.
processors
The number of processors to use, or NULL to automatically detect and use all available processors.
verbose
Logical indicating whether to display each query as it is sent to the database.
...
Further arguments to be passed directly to Codec.
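The fields argument pairs database column names (the vector's names) with GenBank keywords (its values). The base R sketch below makes that mapping concrete by extracting the requested keywords from a single made-up GenBank record. It is only an illustration, not how Seqs2DB parses files internally: extract_fields() is a hypothetical helper, and it assumes that a keyword line starts (possibly indented, as ORGANISM is) with the keyword and that following lines indented by 12 spaces continue the value.

```r
# Hypothetical helper, not part of DECIPHER: extract the GenBank keywords
# listed in a Seqs2DB-style `fields` vector from one record's lines.
# names(fields) are the column names; the values are the GenBank keywords.
extract_fields <- function(record_lines, fields) {
  values <- rep(NA_character_, length(fields))
  names(values) <- names(fields)
  for (i in seq_along(fields)) {
    # A keyword line starts with the keyword (ORGANISM is indented by
    # two spaces under SOURCE), followed by spaces and then the value.
    pattern <- paste0("^ *", fields[[i]], " +")
    hit <- grep(pattern, record_lines)
    if (length(hit) == 0) next  # keyword absent: leave NA
    j <- hit[1]
    value <- sub(pattern, "", record_lines[j])
    # Assumption: following lines indented by 12 spaces continue the value.
    while (j < length(record_lines) && grepl("^ {12}", record_lines[j + 1])) {
      j <- j + 1
      value <- paste(value, trimws(record_lines[j]))
    }
    values[i] <- value
  }
  values
}

# A made-up record, for illustration only
record <- c(
  "LOCUS       XX123456   1500 bp   DNA   linear   BCT",
  "DEFINITION  Example bacterium 16S rRNA gene.",
  "ACCESSION   XX123456",
  "VERSION     XX123456.1",
  "SOURCE      Example bacterium",
  "  ORGANISM  Example bacterium",
  paste0(strrep(" ", 12), "Bacteria; Firmicutes."),
  "REFERENCE   1"
)

extract_fields(record, c(accession = "ACCESSION", rank = "ORGANISM"))
# returns c(accession = "XX123456", rank = "Example bacterium Bacteria; Firmicutes.")
```

Adding another keyword, such as c(version = "VERSION"), simply adds another named element, mirroring how extra fields become extra columns.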

Value

The total number of sequences in the database table is returned after import.

Details

Sequences are imported into the database in chunks whose size is set by chunkSize (the number of characters read at a time). The imported sequences can then be retrieved by searching the database for the identifier provided. Sequences are added to the database verbatim, so that no sequence information is lost when they are later exported from the database. The sequence (record) names are stored in a column named "description" in the database.
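To illustrate what importing in chunks involves, the following base R sketch reads a FASTA file a fixed number of characters at a time, holds back the last record of each chunk in case it continues into the next one, and returns each record's name in a description column alongside its sequence. It is a hypothetical sketch, not DECIPHER's implementation: read_fasta_chunked() and parse_fasta_text() are illustrative names, the file is assumed to be uncompressed ASCII FASTA, and records are collected in a data frame rather than written to a SQLite database.

```r
# Hypothetical sketch, not DECIPHER's implementation. Assumes an
# uncompressed FASTA file containing only ASCII characters.

# Turn text holding complete FASTA records into a data frame.
parse_fasta_text <- function(text) {
  lines <- strsplit(text, "\r?\n")[[1]]
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record_id <- cumsum(is_header)              # record each line belongs to
  headers <- substring(lines[is_header], 2)   # record name without ">"
  seq_lines <- split(lines[!is_header],
                     factor(record_id[!is_header], levels = seq_along(headers)))
  seqs <- vapply(seq_lines, paste, character(1), collapse = "")
  data.frame(description = headers, sequence = unname(seqs),
             stringsAsFactors = FALSE)
}

# Read `chunk_size` characters at a time; the last record in each chunk
# may be incomplete, so it is carried over and parsed with the next chunk.
read_fasta_chunked <- function(path, chunk_size = 1e7) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  leftover <- ""
  pieces <- list()
  repeat {
    chunk <- readChar(con, chunk_size, useBytes = TRUE)
    at_eof <- length(chunk) == 0 || !nzchar(chunk)
    buffer <- if (at_eof) leftover else paste0(leftover, chunk)
    if (at_eof) {
      complete <- buffer  # at end of file, everything left is complete
    } else {
      # Cut just before the last line that starts a new record ("\n>").
      cut <- max(gregexpr("\n>", buffer, fixed = TRUE)[[1]])
      if (cut < 0) {      # no record boundary yet: keep reading
        leftover <- buffer
        next
      }
      complete <- substr(buffer, 1, cut)
      leftover <- substr(buffer, cut + 1, nchar(buffer))
    }
    if (nzchar(complete)) {
      pieces[[length(pieces) + 1]] <- parse_fasta_text(complete)
    }
    if (at_eof) break
  }
  do.call(rbind, pieces)
}
```

Because the final record of every chunk is carried over, the returned records should not depend on chunk_size; a smaller value only means more reads and less text held in memory at once (unless a single record is larger than a chunk).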

See Also

BrowseDB, SearchDB, DB2Seqs

Examples

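library(DECIPHER) # DECIPHER depends on RSQLite, which provides SQLite()
# path to an example GenBank file shipped with DECIPHER
gen <- system.file("extdata", "Bacteria_175seqs.gen", package="DECIPHER")
dbConn <- dbConnect(SQLite(), ":memory:") # temporary in-memory database
Seqs2DB(gen, "GenBank", dbConn, "Bacteria") # import under the identifier "Bacteria"
BrowseDB(dbConn) # view the database table in a web browser
dna <- SearchDB(dbConn, nameBy="description") # retrieve sequences named by description
dbDisconnect(dbConn) # close the connection, discarding the in-memory database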
