R453Plus1Toolbox (version 1.22.0)

complexity.entropy: Sequence Complexity Using The Shannon-Wiener Algorithm

Description

This function evaluates the complexity of each sequence using the Shannon-Wiener algorithm and plots the distribution of the resulting scores as a histogram.

Usage

complexity.entropy(object, xlab="Complexity score (0=low, 100=high)", ylab="Number of sequences", xlim=c(0, 100), col="firebrick1", breaks=100, ...)

Arguments

object
An object of class DNAStringSet, ShortRead or SFFContainer.
xlab
The X axis label.
ylab
The Y axis label.
xlim
The limits of the X axis.
col
The plotting color.
breaks
The number of breaks in the histogram (see ‘hist’).
...
Arguments to be passed to methods, such as graphical parameters (see ‘par’).

Value

A numeric vector containing the complexity score for each sequence.
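
A call might look like the following minimal sketch. The read sequences and the names reads, seqs, scores and low_complexity are made up for illustration, and the cutoff of 70 is the one suggested in the Details section. The call is guarded so that the snippet also runs when the Bioconductor packages are not installed.

```r
# Made-up example reads (hypothetical data, for illustration only).
reads <- c(
  homopolymer  = strrep("T", 60),
  dinucleotide = strrep("TA", 30),
  mixed        = "ACGTTGCAAGCTTAGGCTAACGTCGATCGGATTCAGCTAGGCTTAACGGATCCGATAGCTA"
)

# Only run the package call if R453Plus1Toolbox and Biostrings are available.
if (requireNamespace("R453Plus1Toolbox", quietly = TRUE) &&
    requireNamespace("Biostrings", quietly = TRUE)) {
  library(R453Plus1Toolbox)

  # complexity.entropy accepts a DNAStringSet (see Arguments).
  seqs <- Biostrings::DNAStringSet(reads)

  # Per the Value section, one complexity score is returned per sequence.
  scores <- complexity.entropy(seqs, breaks = 50)

  # Flag reads below the low-complexity cutoff mentioned in Details.
  low_complexity <- scores < 70
  print(data.frame(score = scores, low_complexity = low_complexity))
}
```

The returned vector can then be used to subset the input, for example seqs[!low_complexity], before downstream analysis.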

Details

The entropy approach evaluates the entropy of the trinucleotides in a sequence. The entropy values are scaled from 0 to 100; lower values indicate lower complexity. A homopolymer repeat (e.g. TTTTTTTTTT) has an entropy value of 0, a dinucleotide repeat (e.g. TATATATATA) has a value of around 16, and a trinucleotide repeat (e.g. TAGTAGTAG) has a value of around 26. Sequences with scores below 70 can be considered low-complexity.
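
To make the scoring concrete, here is a base R sketch of one formula that is consistent with the example values above. It is an assumption, not the package's code: the helper trinucleotide_entropy is hypothetical, it treats the whole input as a single window, and it takes the logarithm to base min(number of trinucleotides, 64). The package's actual implementation may differ, for instance in how long reads are split into windows.

```r
# Hypothetical helper, NOT part of R453Plus1Toolbox.
# Scaled (0 to 100) Shannon entropy of the overlapping trinucleotides
# in a single sequence, treated as one window.
trinucleotide_entropy <- function(seq) {
  seq <- toupper(seq)
  n_words <- nchar(seq) - 2L               # number of overlapping trinucleotides
  if (n_words < 2L) return(0)              # too short to carry any entropy

  starts <- seq_len(n_words)
  words  <- substring(seq, starts, starts + 2L)

  # Relative frequency of each observed trinucleotide.
  p <- as.vector(table(words)) / n_words

  # Assumed log base: at most 4^3 = 64 distinct trinucleotides can occur.
  k <- min(n_words, 64L)

  # H = -sum(p_i * log_k(p_i)), scaled to the range 0 to 100.
  -100 * sum(p * log(p, base = k))
}

# 64-nt inputs reproduce the values quoted in the text.
trinucleotide_entropy(strrep("T", 64))                     # 0
trinucleotide_entropy(strrep("TA", 32))                    # about 16.8
trinucleotide_entropy(substr(strrep("TAG", 22), 1, 64))    # about 26.6
```

Under this assumed formula the score depends on the input length: the 10-nt example TATATATATA on its own scores about 33 rather than 16, because the logarithm base shrinks with the number of trinucleotides.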

References

Schmieder R. and Edwards R. (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics, 27(6):863-864.