DECIPHER (version 2.0.2)

DB2Seqs: Export Database Sequences to a FASTA or FASTQ File

Description

Exports the sequence records stored in a database to a FASTA or FASTQ formatted file.

Usage

DB2Seqs(file,
        dbFile,
        tblName = "Seqs",
        identifier = "",
        type = "BStringSet",
        limit = -1,
        replaceChar = "-",
        nameBy = "description",
        orderBy = "row_names",
        removeGaps = "none",
        append = FALSE,
        width = 80,
        compress = FALSE,
        chunkSize = 1e5,
        sep = "::",
        clause = "",
        verbose = TRUE)

Arguments

file
Character string giving the location where the file should be written.
dbFile
A SQLite connection object or a character string specifying the path to the database file.
tblName
Character string specifying the table from which to extract the data.
identifier
Optional character string used to narrow the search results to those matching a specific identifier. If "" then all identifiers are selected.
type
The type of XStringSet (sequences) to export to a FASTA formatted file or QualityScaledXStringSet to export to a FASTQ formatted file. This should be (an unambiguous abbreviation of) one of "DNAStringSet", "RNAStringSet", "AAStringSet", "BStringSet", "QualityScaledDNAStringSet", "QualityScaledRNAStringSet", "QualityScaledAAStringSet", or "QualityScaledBStringSet". (See details section below.)
limit
Maximum number of results to export. The default (-1) does not limit the number of results.
replaceChar
Optional character used to replace any characters of the sequence that are not present in the XStringSet's alphabet. Not applicable if type=="BStringSet". (See details section below.)
nameBy
Character string giving the column name(s) for identifying each sequence record. If more than one column name is provided, the information in each column is concatenated, separated by sep, in the order specified.
orderBy
Character string giving the column name for sorting the results. Defaults to the order of entries in the database. Optionally can be followed by " ASC" or " DESC" to specify ascending (the default) or descending order.
removeGaps
Determines how gaps ("-" or "." characters) are removed in the sequences. This should be (an unambiguous abbreviation of) one of "none", "all" or "common".
append
Logical indicating whether to append the output to the existing file.
width
Integer specifying the maximum number of characters per line of sequence. Not applicable when exporting to a FASTQ formatted file.
compress
Logical specifying whether to compress the output file using gzip compression.
chunkSize
Number of sequences to write to the file at a time. Cannot be less than the total number of sequences if removeGaps is "common".
sep
Character string providing the separator between fields in each sequence's name, by default pairs of colons ("::").
clause
An optional character string to append to the query as part of a "where clause".
verbose
Logical indicating whether to display status.

Value

Writes a FASTA or FASTQ formatted file containing the sequence records in the database. Returns the number of sequence records written to the file.

Details

Sequences are exported into either a FASTA or FASTQ file as determined by the type of sequences. If type is an XStringSet then sequences are exported to FASTA format. Quality information for QualityScaledXStringSets is interpreted as PhredQuality scores before export to FASTQ format.

If type is "BStringSet" (the default) then sequences are exported to a FASTA file exactly as they were when imported. If type is "DNAStringSet" then all U's are converted to T's before export, and vice versa if type is "RNAStringSet". All remaining characters not in the XStringSet's alphabet are converted to replaceChar.
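
The conversion rule described for type = "DNAStringSet" can be illustrated with a small base R sketch. This is not DECIPHER's internal code: the helper name to_dna is hypothetical, and the set of allowed characters is an assumption (uppercase IUPAC nucleotide codes plus the "-", "+" and "." characters).

```r
# Illustrative only: mimics the documented rule for type = "DNAStringSet".
# `to_dna` is a hypothetical helper, not part of DECIPHER or Biostrings.
to_dna <- function(x, replaceChar = "-") {
  # Step 1: every U becomes a T.
  x <- chartr("U", "T", x)
  # Step 2: anything outside the assumed DNA alphabet becomes replaceChar.
  # Assumed alphabet: uppercase IUPAC codes plus "-", "+" and ".".
  gsub("[^ACGTMRWSYKVHDBN.+-]", replaceChar, x)
}

to_dna("ACGUXN-")   # "ACGT-N-": U -> T, X is not in the alphabet
```

With type = "BStringSet" neither step would apply, which is why replaceChar has no effect for that type.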

Examples

db <- system.file("extdata", "Bacteria_175seqs.sqlite", package="DECIPHER")
tf <- tempfile()
DB2Seqs(tf, db, limit=10)
file.show(tf) # press 'q' to exit
unlink(tf)
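
A further sketch, using only arguments listed in the Usage section, exports a few records as DNA with gaps removed and reads them back to check the result. It assumes that DECIPHER is installed together with Biostrings (which provides readDNAStringSet) and that the bundled example database holds nucleotide sequences.

```r
library(DECIPHER)  # attaches Biostrings as a dependency

db <- system.file("extdata", "Bacteria_175seqs.sqlite", package = "DECIPHER")
tf <- tempfile(fileext = ".fas")

# Export five records as DNA, stripping all gap characters.
n <- DB2Seqs(tf, db,
             type = "DNAStringSet",
             limit = 5,
             removeGaps = "all",
             verbose = FALSE)
n  # number of sequence records written, as reported by DB2Seqs

# Read the file back to confirm that it parses as FASTA.
dna <- readDNAStringSet(tf)
length(dna)
unlink(tf)
```

Choosing type = "DNAStringSet" rather than the default "BStringSet" means non-alphabet characters are replaced by replaceChar on export, so the file can be read back directly as a DNAStringSet.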
