R453Plus1Toolbox (version 1.22.0)

complexity.dust: Sequence Complexity Using The DUST Algorithm

Description

This function computes a complexity score for each input sequence using the DUST algorithm and plots the scores as a histogram.

Usage

complexity.dust(object, xlab="Complexity score (0=high, 100=low)",
    ylab="Number of sequences", xlim=c(0, 100), col="firebrick1",
    breaks=100, ...)

Arguments

object
An object of class DNAStringSet, ShortRead or SFFContainer.
xlab
The X axis label.
ylab
The Y axis label.
xlim
The limits of the X axis.
col
The plotting color.
breaks
The number of breaks in the histogram (see ‘hist’).
...
Arguments to be passed to methods, such as graphical parameters (see ‘par’).

Value

A numeric vector containing the complexity score for each sequence.
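
As an illustration of the call and its return value, here is a minimal usage sketch. It assumes the R453Plus1Toolbox and Biostrings packages are installed, and the three sequences and their names are made up for this example. The call is guarded so the snippet is skipped when the packages are unavailable.

```r
# Made-up example sequences (not from the package documentation).
seqs <- c(
  homopolymer = "TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT",
  dinucleotide = "TATATATATATATATATATATATATATATATATATATATA",
  mixed = "ACGTTGCAAGCTTAGGCTAACGTTCAGGATCCATGCAAGT"
)

# Only run the package call when both packages are available.
if (requireNamespace("R453Plus1Toolbox", quietly = TRUE) &&
    requireNamespace("Biostrings", quietly = TRUE)) {
  reads <- Biostrings::DNAStringSet(seqs)

  # Draws the histogram and returns one score per sequence.
  scores <- R453Plus1Toolbox::complexity.dust(reads, breaks = 50)

  # Flag sequences above the low-complexity threshold given in Details.
  low <- scores > 7
  print(names(reads)[low])
}
```

Because the result is an ordinary numeric vector, it can be used directly for filtering or summarising reads.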

Details

The complexity score is based on how often each trinucleotide occurs in a sequence and is scaled between 0 and 100, where higher scores indicate lower complexity. A homopolymer repeat (e.g. TTTTTTTTTT) scores 100, a dinucleotide repeat (e.g. TATATATATA) scores around 49, and a trinucleotide repeat (e.g. TAGTAGTAG) scores around 32. Sequences with a score above 7 can be considered low-complexity.
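
The scoring idea can be sketched in base R. This is a simplified illustration, not the package's implementation: the helper name `dust_score_sketch` is hypothetical, and it scores the whole sequence as a single window, whereas real implementations may use sliding windows and treat short sequences differently. The scaling is chosen on the assumption that a homopolymer should score exactly 100.

```r
# Hypothetical helper, not part of R453Plus1Toolbox.
# Scores one sequence from its overlapping trinucleotide counts.
dust_score_sketch <- function(seq) {
  seq <- toupper(seq)
  n_tri <- nchar(seq) - 2L             # number of overlapping trinucleotides
  if (n_tri < 2L) return(NA_real_)     # too short to score

  starts <- seq_len(n_tri)
  trinucs <- substring(seq, starts, starts + 2L)
  counts <- as.numeric(table(trinucs)) # occurrences of each distinct trinucleotide

  # Sum of c * (c - 1) over all trinucleotides, scaled so that a sequence
  # with a single repeated trinucleotide (a homopolymer) scores 100.
  100 * sum(counts * (counts - 1)) / (n_tri * (n_tri - 1))
}

# 64-base repeats of the three kinds described above.
examples <- c(
  homopolymer = strrep("T", 64),
  dinucleotide = strrep("TA", 32),
  trinucleotide = substr(strrep("TAG", 22), 1, 64)
)

scores <- sapply(examples, dust_score_sketch)
print(round(scores))  # 100, 49 and 32 for these 64-base sequences

# Apply the low-complexity threshold from the text.
low_complexity <- scores > 7
```

Under these assumptions the sketch gives the quoted values only for sequences of this length; shorter repeats score somewhat differently because the counts are not yet dominated by the repeat pattern.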

References

Schmieder R. and Edwards R. (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics, 27(6):863-864.