These generic functions remove leading or trailing nucleotides or qualities. trimTails and trimTailw trim low-quality nucleotides from the right end of reads using a sliding window (trimTailw) or a tally of (successive) nucleotides falling at or below a quality threshold (trimTails). trimEnds takes an alphabet of characters to remove from either the left or right end.
## S4 methods for 'ShortReadQ', 'FastqQuality', or 'SFastqQuality'
trimTailw(object, k, a, halfwidth, ..., ranges=FALSE)
trimTails(object, k, a, successive=FALSE, ..., ranges=FALSE)
trimEnds(object, a, left=TRUE, right=TRUE, relation=c("<=", "=="), ..., ranges=FALSE)
"trimTailw"(object, k, a, halfwidth, ..., alphabet, ranges=FALSE)
"trimTails"(object, k, a, successive=FALSE, ..., alphabet, ranges=FALSE)
"trimTailw"(object, k, a, halfwidth, ..., destinations, ranges=FALSE)
"trimTails"(object, k, a, successive=FALSE, ..., destinations, ranges=FALSE)
"trimEnds"(object, a, left=TRUE, right=TRUE, relation=c("<=", "=="), ..., destinations, ranges=FALSE)

object: An object (e.g., ShortReadQ and derived classes; see below to discover these methods) or character vector of fastq file(s) to be trimmed.

k: integer(1) describing the number of failing letters required to trigger trimming.

a: For trimTails and trimTailw, a character(1) with nchar(a) == 1L giving the letter at or below which a nucleotide is marked as failing. For trimEnds, a character() with all nchar() == 1L giving the letters at or below which a nucleotide or quality score is marked for removal.

successive: logical(1) indicating whether failures can occur anywhere in the sequence, or must be successive. If successive=FALSE, then the k'th failed letter and subsequent letters are removed. If successive=TRUE, the first succession of k failed letters and subsequent letters are removed.

left, right: logical(1) indicating whether trimming is from the left or right ends.

relation: character(1) selected from the argument values, i.e., "<=" or "==", indicating whether all letters at or below a in alphabet(object) are to be removed, or only exact matches.

destinations: For object of type character(), an equal-length vector of destination files. Files must not already exist.

alphabet: character() (ordered low to high) letters on which the quality scale is measured. Usually supplied internally (the user does not need to specify it). If missing, then set to ASCII characters 0-127.

ranges: logical(1) indicating whether the trimmed object, or only the ranges satisfying the trimming condition, is returned.

The return value is an instance of class(object) trimmed to contain only those nucleotides satisfying the trim criterion or, if ranges=TRUE, an IRanges instance defining the ranges that would trim object.

trimTailw starts at the left-most nucleotide, tabulating the number of cycles in a window of 2 * halfwidth + 1 surrounding the current nucleotide with quality scores that fall at or below a. The read is trimmed at the first nucleotide for which this number >= k. The quality of the first or last nucleotide is used to represent portions of the window that extend beyond the sequence.
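
As a rough illustration of the window rule described above, here is a minimal base R sketch for a single quality string. It is not ShortRead's implementation: the function name window_trim_width is hypothetical, letters are compared by ASCII code (matching the default alphabet), and it assumes that the nucleotide triggering the trim is itself removed along with everything to its right.

```r
## Hypothetical helper (not part of ShortRead): number of leading letters
## kept from one quality string under a sliding-window rule.
window_trim_width <- function(qual, k, a, halfwidth) {
    q <- utf8ToInt(qual)
    n <- length(q)
    if (n == 0L) return(0L)
    ## TRUE where the quality letter is at or below the threshold letter
    fail <- q <= utf8ToInt(a)
    ## Extend both ends with the first / last value so that every window
    ## has the full 2 * halfwidth + 1 positions
    padded <- c(rep(fail[1], halfwidth), fail, rep(fail[n], halfwidth))
    for (i in seq_len(n)) {
        ## Window centred on position i of the original string
        if (sum(padded[i:(i + 2L * halfwidth)]) >= k)
            return(i - 1L)   # assumption: position i is trimmed too
    }
    n                        # no window reached k failures; keep everything
}

qual <- "IIIIIIII##"
keep <- window_trim_width(qual, k = 2, a = "#", halfwidth = 1)
substr(qual, 1, keep)        # "IIIIIIII"
```

The padding step mirrors the statement that the first or last quality stands in for window positions that fall outside the sequence.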

trimTails starts at the left-most nucleotide and accumulates cycles for which the quality score is at or below a. The read is trimmed at the first location where this number >= k. With successive=TRUE, failing qualities must occur in strict succession.
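
The tally rule can be sketched in base R along the same lines. Again this is only an illustration under stated assumptions, not the package's code: tally_trim_width is a hypothetical name, letters are compared by ASCII code, and the kept width follows the argument description (with successive=FALSE the k'th failed letter and everything after it go; with successive=TRUE the whole run of k failures goes).

```r
## Hypothetical helper (not part of ShortRead): number of leading letters
## kept from one quality string under a running-tally rule.
tally_trim_width <- function(qual, k, a, successive = FALSE) {
    fail <- utf8ToInt(qual) <= utf8ToInt(a)
    count <- 0L
    for (i in seq_along(fail)) {
        if (fail[i]) {
            count <- count + 1L
            if (count >= k) {
                ## successive: drop the entire run of k failures;
                ## otherwise: drop from the k'th failure onward
                return(if (successive) i - k else i - 1L)
            }
        } else if (successive) {
            count <- 0L      # a passing letter breaks the succession
        }
    }
    length(fail)             # fewer than k failures; keep everything
}

qual <- "I#I#I##I"
tally_trim_width(qual, k = 2, a = "#")                     # 3
tally_trim_width(qual, k = 2, a = "#", successive = TRUE)  # 5
```

In the example, the second failure overall is at position 4, while the first run of two adjacent failures starts at position 6, which is why the two modes keep different widths.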

trimEnds examines the left, right, or both ends of object, marking for removal letters that correspond to a and relation. The trimEnds,ShortReadQ-method trims based on quality.
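
To make the roles of a, left, right, and relation concrete, the following base R sketch computes the start and end of the retained range for one string. It is a simplified stand-in rather than the ShortRead method: end_trim_range is a hypothetical name, ordering is by ASCII code, and for relation "<=" with several letters in a it assumes the highest letter acts as the threshold.

```r
## Hypothetical helper (not part of ShortRead): start / end of the range
## that remains after stripping matching letters from the ends of x.
end_trim_range <- function(x, a, left = TRUE, right = TRUE,
                           relation = c("<=", "==")) {
    relation <- match.arg(relation)
    codes <- utf8ToInt(x)
    targets <- vapply(a, utf8ToInt, integer(1))
    if (relation == "<=") {
        drop <- codes <= max(targets)   # assumption: highest letter is the threshold
    } else {
        drop <- codes %in% targets      # exact matches only
    }
    first <- 1L
    last <- length(codes)
    if (left) {
        while (first <= last && drop[first]) first <- first + 1L
    }
    if (right) {
        while (last >= first && drop[last]) last <- last - 1L
    }
    c(start = first, end = last)
}

x <- "NNACGTNN"
r <- end_trim_range(x, "N", relation = "==")
substr(x, r[["start"]], r[["end"]])     # "ACGT"
```

Returning a range instead of the trimmed string parallels ranges=TRUE, where the caller applies the range to another object.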

ShortReadQ methods operate on quality scores; use sread() and the ranges argument to trim based on nucleotide (see examples).

character methods transform one or several fastq files to new fastq files, applying trim operations based on quality scores; use filterFastq with your own filter argument to filter on nucleotides.

showMethods(trimTails)
sp <- SolexaPath(system.file('extdata', package='ShortRead'))
rfq <- readFastq(analysisPath(sp), pattern="s_1_sequence.txt")
## remove leading / trailing quality scores <= 'I'
trimEnds(rfq, "I")
## remove leading / trailing 'N's
rng <- trimEnds(sread(rfq), "N", relation="==", ranges=TRUE)
narrow(rfq, start(rng), end(rng))
## remove leading / trailing 'G's or 'C's (nucleotides, so go via sread())
rng <- trimEnds(sread(rfq), c("G", "C"), relation="==", ranges=TRUE)
narrow(rfq, start(rng), end(rng))