seqTools (version 1.6.0)

trimFastq: Performs sequence removal, trimming (fixed and quality-based) and nucleotide masking on FASTQ files.

Description

FASTQ files sometimes need to be preprocessed before alignment. Three mechanisms are used for this: discarding whole reads, trimming sequences, and masking nucleotides. This function performs all three in one step. All reads with insufficient phred scores are discarded. Reads can be trimmed at each end, by a trim of fixed size and by a trim based on quality thresholds.

Usage

trimFastq(infile, outfile="keep.fq.gz", discard="disc.fq.gz", qualDiscard=0, qualMask=0, fixTrimLeft=0, fixTrimRight=0, qualTrimLeft=0, qualTrimRight=0, qualMaskValue=78, minSeqLen=0)

Arguments

infile
character. Input FASTQ file. Only one infile is allowed per function call.
outfile
character. Output FASTQ file.
discard
character. Output file in which discarded reads are written.
qualDiscard
numeric. All reads which contain one or more phred scores < qualDiscard will be discarded.
qualMask
numeric. All nucleotides for which phred score < qualMask will be overwritten with qualMaskValue.
fixTrimLeft
numeric. Prefix of this size will be trimmed.
fixTrimRight
numeric. Suffix of this size will be trimmed.
qualMaskValue
numeric. ASCII code of the replacement character for masked nucleotides (the default, 78, is the code for "N").
qualTrimLeft
numeric. Prefix where all phred scores are < qualTrimLeft will be trimmed.
qualTrimRight
numeric. Suffix where all phred scores are < qualTrimRight will be trimmed.
minSeqLen
numeric. All reads where sequence length after (fixed and quality-based) trimming is < minSeqLen will be discarded.
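
The quality arguments are easier to read with a concrete read in hand. The following is a minimal base R sketch, not the seqTools implementation. It assumes Phred+33 encoded quality strings (ASCII code minus 33), which is an assumption about the input rather than something stated on this page, and the read itself is made up for illustration. It applies the discard test and the masking step independently to show what each threshold compares against.

```r
# Illustration only: base R, not the seqTools implementation.
# Assumption: qualities are Phred+33 encoded (ASCII code minus 33).
read_seq  <- "ACGTACGT"   # made-up read
read_qual <- "II#5II+I"   # made-up quality string

# Decode each quality character to a phred score.
phred <- utf8ToInt(read_qual) - 33L

# qualDiscard: a read is discarded if any phred score is below the threshold.
qualDiscard  <- 10
discard_read <- any(phred < qualDiscard)

# qualMask / qualMaskValue: positions with phred < qualMask are overwritten
# with the character whose ASCII code is qualMaskValue (78 is "N").
qualMask      <- 15
qualMaskValue <- 78
bases <- strsplit(read_seq, "")[[1]]
bases[phred < qualMask] <- intToUtf8(qualMaskValue)
masked_seq <- paste(bases, collapse = "")
```

Here the third and seventh positions fall below qualMask and are replaced, and the read would also fail the qualDiscard test because of the third position.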

Value

are written to output and to discard

Details

The function divides the input file into two outputs: The output file (contains the accepted reads) and the discard file (contains the excluded reads). After trim operations, the function checks for remaining read length. When the read length is smaller than minSeqLen, the read will be discarded.
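
The trim-then-check logic described above can be sketched in base R as follows. This is an illustrative sketch only: `trim_sketch` is a hypothetical helper, not part of seqTools, and the order used here (fixed trim first, then quality trim) is an assumption, since this page does not state the order in which the two trims are applied.

```r
# Hypothetical helper for illustration; not part of seqTools.
# Assumption: the fixed trim is applied before the quality trim.
trim_sketch <- function(phred, fixTrimLeft = 0, fixTrimRight = 0,
                        qualTrimLeft = 0, qualTrimRight = 0, minSeqLen = 0) {
  n     <- length(phred)
  first <- fixTrimLeft + 1     # first position kept after the fixed trim
  last  <- n - fixTrimRight    # last position kept after the fixed trim

  # Quality trim: move inward while the terminal score is below the threshold.
  while (first <= last && phred[first] < qualTrimLeft)  first <- first + 1
  while (last >= first && phred[last]  < qualTrimRight) last  <- last - 1

  len <- max(last - first + 1, 0)
  # A read shorter than minSeqLen after trimming goes to the discard file.
  list(first = first, last = last, length = len, keep = len >= minSeqLen)
}

# Made-up phred scores for a 10-nucleotide read.
phred <- c(5, 12, 30, 35, 38, 31, 29, 20, 8, 3)
res <- trim_sketch(phred, fixTrimLeft = 2, fixTrimRight = 2,
                   qualTrimLeft = 28, qualTrimRight = 30, minSeqLen = 5)
```

With these made-up scores, four nucleotides remain after trimming, so the read falls below minSeqLen = 5 and would be routed to the discard file under the assumptions of this sketch.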

References

Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Research 1998; 8(3): 186-194.

Examples

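# Locate the example data shipped with seqTools and work from that directory.
basedir <- system.file("extdata", package="seqTools")
setwd(basedir)
# Accepted reads go to the default outfile ("keep.fq.gz"),
# excluded reads to the default discard file ("disc.fq.gz").
trimFastq("sim.fq.gz", qualDiscard=10, qualMask=15, fixTrimLeft=2,
    fixTrimRight=2, qualTrimLeft=28, qualTrimRight=30, minSeqLen=5)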