readFastq reads all FASTQ-formatted files in a directory
dirPath whose file names match pattern pattern,
returning a compact internal representation of the sequences and
quality scores in the files. Methods read all files into a single R
object; a typical use is to restrict input to a single FASTQ file.
writeFastq writes an object to a single file, using
mode="w" (the default) to create a new file or mode="a" to
append to an existing file. Attempting to write to an existing file
with mode="w" results in an error.
readFastq(dirPath, pattern=character(0), ...)
## S4 method for signature 'character'
readFastq(dirPath, pattern=character(0), ..., withIds=TRUE)
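writeFastq(object, file, mode="w", full=FALSE, compress=TRUE, ...)

dirPath: The directory containing the FASTQ files to be read.
pattern: The (grep-style) pattern describing file
names to be read. The default (character(0)) results in
(attempted) input of all files in the directory.
object: The object to be written in fastq format. For
methods, use showMethods(object, where=getNamespace("ShortRead")).
file: The path of the file to be written.
mode: Either "w" to create a new file or "a" to append to an existing file.
full: logical(1) indicating whether the identifier line is repeated (full=TRUE) or omitted (full=FALSE) on the
third line of the fastq record.
compress: logical(1) indicating whether the output file is gz-compressed. The default is TRUE.
withIds: logical(1) indicating whether identifiers should
be read from the fastq file.
...: Additional arguments. In particular, qualityType and
filter:
  qualityType: Representation to be used for quality scores. One of
  Auto (choose Illumina base 64 encoding
  SFastqQuality if all characters are ASCII-encoded as
  greater than 58 (:) and some characters are greater than 74
  (J)), FastqQuality (Phred-like base 33 encoding), or
  SFastqQuality (Illumina base 64 encoding).
  filter: An object of class srFilter, used to
  filter objects of class ShortReadQ at
  input.

readFastq returns a single R object (e.g.,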
ShortReadQ) containing sequences and qualities
contained in all files in dirPath matching
pattern. There is no guarantee of order in which files are
read.
writeFastq is invoked primarily for its side effect, creating
or appending to file file. The function returns, invisibly, the
length of object, and hence the number of records written.

The fastq format is not precisely defined. The basic definition used here parses the following four lines as a single record:
@HWI-EAS88_1_1_1_1001_499
GGACTTTGTAGGATACCCTCGCTTTCCTTCTCCTGT
+HWI-EAS88_1_1_1_1001_499
]]]]]]]]]]]]Y]Y]]]]]]]]]]]]VCHVMPLAS
The first and third lines are identifiers preceded by a specific
character (the identifiers are identical, in the case of Solexa). The
second line is an upper-case sequence of nucleotides. The parser
recognizes the IUPAC-standard alphabet (hence ambiguous nucleotides),
coercing . to - to represent missing values. The final
line is an ASCII-encoded representation of quality scores, with one
ASCII character per nucleotide.
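The four-line record layout described above can be sketched in base R. This is only an illustration of the layout, not the ShortRead parser: parse_fastq_lines is a hypothetical helper, the input record is made up, and the checks (leading @ and +, equal sequence and quality lengths) are assumptions drawn from the description.

```r
# Hypothetical helper (not part of ShortRead): group FASTQ text lines
# into four-line records and apply basic consistency checks.
parse_fastq_lines <- function(lines) {
  stopifnot(length(lines) %% 4 == 0)
  rec <- matrix(lines, ncol = 4, byrow = TRUE)
  stopifnot(
    all(startsWith(rec[, 1], "@")),          # line 1: identifier
    all(startsWith(rec[, 3], "+")),          # line 3: identifier (may be empty)
    all(nchar(rec[, 2]) == nchar(rec[, 4]))  # one quality character per base
  )
  data.frame(
    id = substring(rec[, 1], 2),             # drop the leading "@"
    sread = chartr(".", "-", rec[, 2]),      # coerce "." to "-"
    quality = rec[, 4],
    stringsAsFactors = FALSE
  )
}

# A made-up record, used only to exercise the helper
lines <- c("@read1", "ACGT.N", "+read1", "hhhh]]")
rec <- parse_fastq_lines(lines)
rec
```

The helper keeps sequences and qualities as plain character vectors; readFastq instead returns a compact ShortReadQ object.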
The encoding implicit in Solexa-derived fastq files is that each
character code corresponds to a score equal to the ASCII character
value minus 64 (e.g., ASCII @ is decimal 64, and corresponds to
a Solexa quality score of 0). This is different from BioPerl, for
instance, which recovers quality scores by subtracting 33 from the
ASCII character value (so that, for instance, !, with decimal
value 33, encodes value 0).
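The offset arithmetic can be checked directly in base R. The sketch below is illustrative only: decode_quality and guess_offset are hypothetical helpers, not ShortRead functions, and guess_offset simply restates the rule given for qualityType Auto (all character codes greater than 58 and some greater than 74 imply base 64 encoding).

```r
# Hypothetical helper: convert a quality string to integer scores
# by subtracting an ASCII offset (64 for Solexa, 33 for BioPerl).
decode_quality <- function(qual, offset) {
  utf8ToInt(qual) - offset
}

# Hypothetical helper: restate the 'Auto' rule described for qualityType.
guess_offset <- function(quals) {
  codes <- utf8ToInt(paste(quals, collapse = ""))
  if (all(codes > 58L) && any(codes > 74L)) 64L else 33L
}

solexa <- decode_quality("@AB", offset = 64L)    # "@" is decimal 64
bioperl <- decode_quality("!\"#", offset = 33L)  # "!" is decimal 33
solexa_offset <- guess_offset("]]]]Y]VCHV")
phred_offset <- guess_offset("IIII5+!")
```

Both calls to decode_quality yield the scores 0, 1, 2, which shows that the same score is written with different characters under the two conventions.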
The BioPerl description of fastq asserts that the first character of
line 4 is a !, but the current parser does not support this
convention.
writeFastq creates files following the specification outlined
above, using the IUPAC-standard alphabet (hence, sequences containing
. when read will be represented by - when written).
The IUPAC alphabet in Biostrings.
http://www.bioperl.org/wiki/FASTQ_sequence_format for the BioPerl definition of fastq.
Solexa documentation `Data analysis - documentation : Pipeline output and visualisation'.
library(ShortRead)
showMethods(readFastq)
showMethods(writeFastq)
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
sread(rfq)
id(rfq)
quality(rfq)
## SolexaPath method 'knows' where FASTQ files are placed
rfq1 <- readFastq(sp, pattern="s_1_sequence.txt")
rfq1
file <- tempfile()
writeFastq(rfq, file)
readLines(file, 8)