getCounts: Gets counts from alignment data from a set of genome segments.

Description

A function for extracting count data from an alignmentData object given a set of segments defined on the genome.

Usage

getCounts(segments, aD, preFiltered = FALSE, adjustMultireads = TRUE, useChunk = FALSE, cl)

Arguments

segments

A GRanges object which defines a set of segments for which counts are required.

aD

An alignmentData object.

preFiltered

If FALSE (the default), the function internally cleans the segment data before counting. If the segments are already clean, setting this to TRUE omits these steps and may save computational time. See Details.

adjustMultireads

If working with methylation data, this option toggles an adjustment for reads that align to multiple locations on the genome. Defaults to TRUE.

useChunk

If all segments lie within defined 'chunks' of the alignmentData object, setting this to TRUE increases speed. If it is set to TRUE when some segments do not lie within chunks, counts may be inaccurate. Defaults to FALSE.

cl

A SNOW cluster object, or NULL. See Details.

Value

If 'as.matrix', a matrix in which each column corresponds to a library in the alignmentData object 'aD' and each row to the segment defined by the corresponding row in 'segments'. Otherwise an equivalent DataFrame object.

Details

The function extracts count data from the alignmentData object 'aD' given a set of segments. The non-trivial aspect of this function is that a tag which matches to multiple places within a single segment (and thus appears multiple times in the alignmentData object) should be counted only once for that segment.
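The counting rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the data frames, the `countOnce` helper, and the rule that an alignment belongs to a segment only when it lies entirely inside it are all assumptions made for illustration.

```r
# Hypothetical toy data: one row per alignment. A tag that aligns to several
# places appears on several rows, carrying the same count each time.
alignments <- data.frame(
  tag   = c("AAC", "AAC", "GGT", "TTA", "AAC"),
  start = c(10, 25, 30, 120, 200),
  end   = c(30, 45, 50, 140, 220),
  count = c(5, 5, 2, 7, 5)   # count of the tag in a single library
)

# Hypothetical segments on the same chromosome.
segments <- data.frame(start = c(1, 100), end = c(60, 250))

# Sum tag counts for one segment, counting each distinct tag at most once.
# Assumption: an alignment is "in" a segment if it is fully contained in it.
countOnce <- function(segStart, segEnd, aln) {
  inside <- aln$start >= segStart & aln$end <= segEnd
  hits <- aln[inside, , drop = FALSE]
  hits <- hits[!duplicated(hits$tag), , drop = FALSE]
  sum(hits$count)
}

segCounts <- mapply(countOnce, segments$start, segments$end,
                    MoreArgs = list(aln = alignments))

# For comparison, a naive sum that counts every alignment row.
naiveCounts <- mapply(function(s, e) {
  sum(alignments$count[alignments$start >= s & alignments$end <= e])
}, segments$start, segments$end)

print(segCounts)    # 7 12
print(naiveCounts)  # 12 12
```

In the first toy segment the tag "AAC" aligns twice, so the naive sum double-counts it, while the deduplicated sum does not.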

If preFiltered = FALSE then the function allows for missing (NA) data in the segments, unordered segments and duplicated segments. If the segment list has no missing data, is already ordered, and contains no duplications, then computational time can be saved by setting preFiltered = TRUE.

A cluster object (package: snow) is recommended for parallelisation of this function when using large data sets. Passing NULL as 'cl' will cause the function to run in non-parallel mode.

In general, this function will probably not be called directly by the user, as the processAD function includes a call to getCounts as part of the standard processing of an alignmentData object into a segData object.
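The three conditions under which preFiltered = TRUE is appropriate (no missing data, ordered, no duplicates) can be checked before calling the function. The following base R sketch does this on a plain data frame of segment coordinates; the `canSkipFiltering` helper is hypothetical, and the ordering used here (by chromosome, then start, then end) is an assumption about what "ordered" means rather than something taken from the package.

```r
# Hypothetical segment table; the last row duplicates the first, and the
# rows are not in coordinate order.
segs <- data.frame(
  chr   = c("Chr1", "Chr1", "Chr2", "Chr1"),
  start = c(100, 1, 50, 100),
  end   = c(3000, 40, 900, 3000)
)

# TRUE only if the table has no NAs, is already sorted, and has no duplicates.
canSkipFiltering <- function(s) {
  noMissing    <- !any(is.na(s))
  isOrdered    <- identical(order(s$chr, s$start, s$end), seq_len(nrow(s)))
  noDuplicates <- !any(duplicated(s))
  noMissing && isOrdered && noDuplicates
}

# Clean the table by hand: drop incomplete rows, drop duplicates, then sort.
cleaned <- unique(segs[complete.cases(segs), ])
cleaned <- cleaned[order(cleaned$chr, cleaned$start, cleaned$end), ]

print(canSkipFiltering(segs))     # FALSE
print(canSkipFiltering(cleaned))  # TRUE
```

If such a check passes, the equivalent GRanges object could reasonably be passed with preFiltered = TRUE; otherwise leaving the default lets the function do the cleaning itself.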

Examples


# Load the package that provides readGeneric and getCounts.

library(segmentSeq)

# Define the chromosome lengths for the genome of interest.

chrlens <- c(2e6, 1e6)

# Define the files containing sample information.

datadir <- system.file("extdata", package = "segmentSeq")
libfiles <- c("SL9.txt", "SL10.txt", "SL26.txt", "SL32.txt")

# Establish the library names and replicate structure.

libnames <- c("SL9", "SL10", "SL26", "SL32")
replicates <- c(1,1,2,2)

# Process the files to produce an 'alignmentData' object.

alignData <- readGeneric(file = libfiles, dir = datadir,
                         replicates = replicates, libnames = libnames,
                         chrs = c(">Chr1", ">Chr2"), chrlens = chrlens,
                         gap = 100)

# Get count data for three arbitrarily chosen segments on chromosome 1.

getCounts(segments = GRanges(seqnames = c(">Chr1"),
          IRanges(start = c(1,100,2000), end = c(40,3000,5000))), 
          aD = alignData, cl = NULL)