readFastq reads all FASTQ-formatted files in a directory dirPath whose file name matches pattern pattern, returning a compact internal representation of the sequences and quality scores in the files. Methods read all files into a single R object; a typical use is to restrict input to a single FASTQ file.
writeFastq writes an object to a single file, using mode="w" (the default) to create a new file or mode="a" to append to an existing file. Attempting to write to an existing file with mode="w" results in an error.
readFastq(dirPath, pattern=character(0), ...)
## S4 method for signature 'character'
readFastq(dirPath, pattern=character(0), ..., withIds=TRUE)
writeFastq(object, file, mode="w", full=FALSE, compress=TRUE, ...)
pattern: The (grep-style) pattern describing file names to be read. The default (character(0)) results in (attempted) input of all files in the directory.
object: An object to be written in fastq format. For methods, use showMethods(object, where=getNamespace("ShortRead")).
full: Whether the identifier line is repeated (full=TRUE) or omitted (full=FALSE) on the third line of the fastq record.
compress: Whether the output file is compressed. The default is TRUE.
...: Additional arguments, in particular qualityType and filter:
  qualityType: Representation of quality scores; one of Auto (choose Illumina base 64 encoding SFastqQuality if all characters are ASCII-encoded as greater than 58 (:) and some characters are greater than 74 (J)), FastqQuality (Phred-like base 33 encoding), or SFastqQuality (Illumina base 64 encoding).
  filter: An object of class srFilter, used to filter objects of class ShortReadQ at input.
withIds: logical(1) indicating whether identifiers should
be read from the fastq file.
readFastq returns a single R object (e.g., ShortReadQ) containing the sequences and qualities from all files in dirPath matching pattern. The order in which files are read is not guaranteed.
writeFastq is invoked primarily for its side effect, creating or appending to file file. The function invisibly returns the length of object, and hence the number of records written.
The fastq format is not precisely defined. The basic definition used here parses the following four lines as a single record:
@HWI-EAS88_1_1_1_1001_499
GGACTTTGTAGGATACCCTCGCTTTCCTTCTCCTGT
+HWI-EAS88_1_1_1_1001_499
]]]]]]]]]]]]Y]Y]]]]]]]]]]]]VCHVMPLAS
The first and third lines are identifiers, each preceded by a specific character (@ on the first line, + on the third; the identifiers are identical, in the case of Solexa). The second line is an upper-case sequence of nucleotides. The parser recognizes the IUPAC-standard alphabet (hence ambiguous nucleotides), coercing . to - to represent missing values. The final line is an ASCII-encoded representation of quality scores, with one ASCII character per nucleotide.
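The record layout described above can be sketched in base R. This is only an illustration of the four-line structure, not the parser used by ShortRead; the function parse_record and the sample record are hypothetical.

```r
# Hypothetical four-line record, made up for illustration
record <- c("@read_1", "ACGT.NAC", "+read_1", "hhhhYhhV")

# Illustrative helper (an assumption, not part of ShortRead)
parse_record <- function(lines) {
    # a record is exactly four lines; lines 1 and 3 carry the identifier
    stopifnot(length(lines) == 4L,
              startsWith(lines[1], "@"),
              startsWith(lines[3], "+"))
    # '.' is coerced to '-' to represent missing values
    sequence <- chartr(".", "-", lines[2])
    quality <- lines[4]
    # one quality character per nucleotide
    stopifnot(nchar(sequence) == nchar(quality))
    list(id = substring(lines[1], 2L), sread = sequence, quality = quality)
}

rec <- parse_record(record)
```

The length check mirrors the requirement of one ASCII quality character per nucleotide; a record that violates it stops with an error in this sketch.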
The encoding implicit in Solexa-derived fastq files is that each character code corresponds to a score equal to the ASCII character value minus 64 (e.g., ASCII @ is decimal 64, and corresponds to a Solexa quality score of 0). This differs from BioPerl, for instance, which recovers quality scores by subtracting 33 from the ASCII character value (so that, for instance, !, with decimal value 33, encodes value 0).
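The two offsets can be compared with a small base R sketch. The helper decode_quality is hypothetical and is not how ShortRead stores or converts qualities; it only applies the subtraction described above.

```r
# Illustrative helper (an assumption, not part of ShortRead):
# convert an ASCII-encoded quality string to integer scores
decode_quality <- function(qual, offset) {
    utf8ToInt(qual) - offset
}

# Solexa-style: score = ASCII value - 64, so '@' (64) decodes to 0
solexa <- decode_quality("@Ah", offset = 64L)

# BioPerl-style: score = ASCII value - 33, so '!' (33) decodes to 0
phred <- decode_quality("!5I", offset = 33L)
```

Decoding a file with the wrong offset shifts every score by 31, which is why the choice of quality representation matters at input.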
The BioPerl description of fastq asserts that the first character of line 4 is a !, but the current parser does not support this convention.
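The Auto rule given for qualityType (all characters greater than 58, and some greater than 74) can be written out as a base R sketch. The function guess_quality_type is hypothetical, and falling back to FastqQuality when the rule is not met is an assumption made for this illustration rather than documented ShortRead behavior.

```r
# Illustrative helper (an assumption, not part of ShortRead)
guess_quality_type <- function(quals) {
    # pool the ASCII codes of all quality strings
    codes <- utf8ToInt(paste(quals, collapse = ""))
    # rule from the text: all codes > 58 (':') and some codes > 74 ('J')
    if (all(codes > 58L) && any(codes > 74L)) {
        "SFastqQuality"   # Illumina base 64 encoding
    } else {
        "FastqQuality"    # assumed fallback: Phred-like base 33 encoding
    }
}

type_a <- guess_quality_type("hhhhYhhV")       # high codes only
type_b <- guess_quality_type("!5I")            # contains codes <= 58
type_c <- guess_quality_type(c(";;;", "<<<"))  # > 58 but never > 74
```

Because the rule is a heuristic over observed characters, a small or unusual input may be ambiguous; passing qualityType explicitly avoids relying on it.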
writeFastq creates files following the specification outlined above, using the IUPAC-standard alphabet (hence, sequences containing . when read will be represented by - when written).
The IUPAC alphabet in Biostrings.
http://www.bioperl.org/wiki/FASTQ_sequence_format for the BioPerl definition of fastq.
Solexa documentation `Data analysis - documentation : Pipeline output and visualisation'.
library(ShortRead)
showMethods(readFastq)
showMethods(writeFastq)
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
sread(rfq)
id(rfq)
quality(rfq)
## SolexaPath method 'knows' where FASTQ files are placed
rfq1 <- readFastq(sp, pattern="s_1_sequence.txt")
rfq1
file <- tempfile()
writeFastq(rfq, file)
readLines(file, 8)