qrqc (version 1.26.0)

kmerKLPlot-methods: Plot K-L Divergence Components for a Subset of k-mers to Inspect for Contamination

Description

kmerKLPlot calls calcKL, which calculates the Kullback-Leibler (K-L) divergence between the k-mer distribution at each position and the k-mer distribution across all positions. kmerKLPlot then plots each k-mer's contribution to the total K-L divergence as stacked bars, for a subset of the k-mers. Since there are 4^k possible k-mers for a given k, plotting all of them often makes the plot hard to interpret; however, one can set n.kmers to a number greater than the number of possible k-mers to force kmerKLPlot to plot the entire K-L divergence, with every term (one per k-mer) in the sum.

If x is a list, the K-L k-mer plots are faceted by sample; this allows comparison to a FASTA file of random reads.

Please note that what is plotted is not the total K-L divergence, but rather the K-L divergence calculated on a subset of the sample space (the top n.kmers k-mers selected).
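The calculation described above can be sketched in base R on made-up counts. This is only an illustration under stated assumptions, not the code used by calcKL or kmerKLPlot: the count matrix is simulated, k is set to 2 to keep the example small, and top.kl is a hypothetical helper whose ranking rule (largest summed contribution) may differ from the one qrqc uses.

```r
# Toy k-mer counts: rows are k-mers (k = 2, so 4^2 = 16 possible k-mers),
# columns are read positions. All values are simulated for illustration.
set.seed(1)
bases <- c("A", "C", "G", "T")
kmers <- as.vector(outer(bases, bases, paste0))
counts <- matrix(rpois(16 * 5, lambda = 50) + 1, nrow = 16,
                 dimnames = list(kmers, paste0("pos", 1:5)))

# Inflate one k-mer at the first position to mimic contamination
counts["AC", "pos1"] <- counts["AC", "pos1"] + 400

# p: k-mer distribution at each position (columns sum to 1)
# q: k-mer distribution across all positions
p <- sweep(counts, 2, colSums(counts), "/")
q <- rowSums(counts) / sum(counts)

# Each k-mer's term in the K-L sum, in bits: p * log2(p / q).
# q is recycled down the rows, so row i is always divided by q[i].
kl.terms <- p * log2(p / q)

# Hypothetical helper: keep the n.kmers rows with the largest summed term
top.kl <- function(kl.terms, n.kmers) {
  n <- min(n.kmers, nrow(kl.terms))
  keep <- order(rowSums(kl.terms), decreasing = TRUE)[seq_len(n)]
  kl.terms[keep, , drop = FALSE]
}

total      <- colSums(kl.terms)                    # full K-L divergence per position
subset.sum <- colSums(top.kl(kl.terms, 3))         # divergence over a 3 k-mer subset
all.sum    <- colSums(top.kl(kl.terms, 4^2 + 1))   # n.kmers > 4^k keeps every term
```

Here subset.sum generally differs from total because it covers only three of the sixteen terms, while all.sum matches total because asking for more k-mers than exist keeps every term in the sum.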

Usage

kmerKLPlot(x, n.kmers=20)

Arguments

x
an S4 object of a class that inherits from SequenceSummary (as returned by readSeqFile), or a named list of objects that inherit from SequenceSummary.
n.kmers
an integer value indicating the number of top k-mers to include.

Methods

signature(x = "SequenceSummary")
kmerKLPlot will plot the K-L divergence for a subset of k-mers for a single object that inherits from SequenceSummary.
signature(x = "list")
kmerKLPlot will plot the K-L divergence for a subset of k-mers for each of the objects that inherit from SequenceSummary in the list and display them in a series of panels.
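As a rough illustration of how one S4 generic can have separate methods for a single object and for a named list, here is a minimal sketch using only the methods package. The classes ToySummary and ToyFASTQSummary and the generic describePlot are hypothetical stand-ins invented for this example; they are not the qrqc definitions, and the real methods draw plots rather than return strings.

```r
library(methods)

# Hypothetical stand-ins for SequenceSummary and a class inheriting from it
setClass("ToySummary", representation(label = "character"))
setClass("ToyFASTQSummary", contains = "ToySummary")

# Hypothetical generic with the same argument layout as kmerKLPlot
setGeneric("describePlot",
           function(x, n.kmers = 20) standardGeneric("describePlot"))

# Single object: one plot. Subclasses dispatch here through inheritance.
setMethod("describePlot", signature(x = "ToySummary"),
          function(x, n.kmers = 20) {
            sprintf("one panel: %s (top %d k-mers)", x@label, n.kmers)
          })

# Named list: one panel per element, labelled by the element's name
setMethod("describePlot", signature(x = "list"),
          function(x, n.kmers = 20) {
            if (is.null(names(x)) || any(names(x) == ""))
              stop("every list element must be named")
            vapply(names(x),
                   function(nm) sprintf("panel '%s': %s", nm, x[[nm]]@label),
                   character(1))
          })

a <- new("ToyFASTQSummary", label = "reads A")
single <- describePlot(a)
panels <- describePlot(list(first = a, second = a))
```

In this sketch the list names become the panel labels, which shows why a list argument needs names.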

See Also

getKmer, calcKL, kmerEntropyPlot

Examples

  library(qrqc)

  ## Load a somewhat contaminated FASTQ file
  s.fastq <- readSeqFile(system.file('extdata', 'test.fastq',
    package='qrqc'), hash.prop=1)

  ## Load a really contaminated FASTQ file
  s.contam.fastq <- readSeqFile(system.file('extdata',
    'test-contam.fastq', package='qrqc'), hash.prop=1)

  ## Load a random (equal base frequency) FASTA file
  s.random.fasta <- readSeqFile(system.file('extdata',
    'random.fasta', package='qrqc'), type="fasta", hash.prop=1)

  ## Make K-L divergence plot - shows slight 5'-end bias. Note units
  ## (bits)
  suppressWarnings(kmerKLPlot(s.fastq))

  ## Plot multiple K-L divergence plots
  suppressWarnings(kmerKLPlot(list("highly contaminated"=s.contam.fastq,
    "less contaminated"=s.fastq, "random"=s.random.fasta)))
