heuristicSeg: A fast heuristic method for creating a genome segment map.

Description

This function uses heuristics to identify a set of loci from a segData or segMeth object. Within each replicate group, it identifies regions of the genome that satisfy the criteria for being a locus and contain no subregion that satisfies the criteria for being a null. These criteria can be defined by the user or inferred from the data.
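As a rough illustration of the idea described above, the base R sketch below segments a single toy vector of per-position scores. It is not segmentSeq's implementation. The function name toyHeuristicSeg, the use of one score per position, the mean-score locus criterion and the trimming of candidate regions to their first and last above-threshold positions are all assumptions made for this example, and it handles only one replicate group.

```r
# Hypothetical illustration only; NOT segmentSeq's implementation.
# expr:      per-position score for one replicate group (assumed input)
# threshold: score separating locus-like from null-like positions
# gap:       minimum length of a null region (cf. the 'gap' argument)
toyHeuristicSeg <- function(expr, threshold, gap) {
  above <- expr >= threshold

  # Null regions: runs of below-threshold positions at least 'gap' long
  runs <- rle(above)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  inNull <- logical(length(expr))
  for (i in which(!runs$values & runs$lengths >= gap)) {
    inNull[starts[i]:ends[i]] <- TRUE
  }

  # Candidate loci: maximal stretches that contain no null region
  cand <- rle(!inNull)
  cEnd <- cumsum(cand$lengths)
  cStart <- cEnd - cand$lengths + 1

  loci <- data.frame(start = integer(0), end = integer(0))
  for (j in which(cand$values)) {
    # Trim each candidate to its first and last above-threshold positions
    hits <- which(above[cStart[j]:cEnd[j]]) + cStart[j] - 1
    if (length(hits) == 0) next
    s <- min(hits)
    e <- max(hits)
    # Assumed locus criterion: mean score over the region meets the threshold
    if (mean(expr[s:e]) >= threshold) {
      loci <- rbind(loci, data.frame(start = s, end = e))
    }
  }
  loci
}

expr <- c(0, 0, 5, 6, 0, 7, 0, 0, 0, 0, 8, 9, 0, 0, 0, 0, 0)
toyHeuristicSeg(expr, threshold = 4, gap = 3)  # loci at 3-6 and 11-12
toyHeuristicSeg(expr, threshold = 4, gap = 1)  # loci at 3-4, 6-6 and 11-12
```

With gap = 3 the one-position dip at position 5 is too short to be a null, so it is absorbed into the first locus; with gap = 1 every below-threshold position counts as a null and that region splits into two loci.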

Usage

heuristicSeg(sD, aD, gap = 100, RKPM = 1000, prop = 0.2, locCutoff = 0.99,
             subRegion = NULL, largeness = 1e8, getLikes = TRUE, verbose = TRUE,
             tempDir = NULL, cl = NULL, recoverFromTemp = FALSE)

Arguments

aD

An alignmentData or methData object.

sD

A segData or segMeth object derived from the `aD' object.

gap

What is the minimum length of a null region?

RKPM

For analysis of a segData object, what RKPM (reads per kilobase per million reads) distinguishes between a locus and a null region?

prop

For analysis of a segMeth object, what proportion of methylated cytosines distinguishes between a locus and a null region? Defaults to 0.2.

locCutoff

For analysis of a segMeth object, with what likelihood must the proportion of methylated cytosines exceed the `prop' option? Defaults to 0.99.

subRegion

A 'data.frame' object defining the subregions of the genome to be segmented. If NULL (default), the whole genome is segmented.

largeness

The maximum size for a split analysis.

getLikes

Should posterior likelihoods for the new segmented genome (loci and nulls) be assessed?

verbose

Should the function be verbose? Defaults to TRUE.

tempDir

A directory for storing temporary files produced during the segmentation.

cl

A snow cluster object, or NULL (the default). See Details.

recoverFromTemp

If TRUE, will attempt to recover the position saved in 'tempDir'. Defaults to FALSE. See Details.
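The RKPM, prop and locCutoff arguments set the thresholds separating loci from nulls. The sketch below shows the arithmetic behind them under stated assumptions: rkpm() applies the textbook reads-per-kilobase-per-million formula to raw counts (segmentSeq may normalise library sizes differently), and probAboveProp() is a hypothetical stand-in for the methylation criterion that assumes a Beta posterior from a uniform prior. The package's own likelihood model is not reproduced here, and both function names are illustrative.

```r
# Reads per kilobase of region per million library reads (textbook formula;
# segmentSeq may scale library sizes differently).
rkpm <- function(counts, width, libsize) {
  counts / (width / 1e3) / (libsize / 1e6)
}

# Hypothetical methylation criterion: posterior probability that the true
# proportion of methylated cytosines exceeds 'prop', assuming a uniform
# Beta(1, 1) prior, i.e. a Beta(meth + 1, unmeth + 1) posterior.
probAboveProp <- function(meth, unmeth, prop) {
  pbeta(prop, meth + 1, unmeth + 1, lower.tail = FALSE)
}

# 250 reads on a 500 bp region in a 2-million-read library:
rkpm(250, 500, 2e6)                  # 250, below the default RKPM = 1000
# 1500 reads on a 1 kb region in a 1-million-read library:
rkpm(1500, 1000, 1e6)                # 1500, above the default RKPM = 1000

# 40 of 50 cytosines methylated vs. 2 of 20, with prop = 0.2, locCutoff = 0.99:
probAboveProp(40, 10, 0.2) >= 0.99   # TRUE
probAboveProp(2, 18, 0.2) >= 0.99    # FALSE
```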

Value

A lociData object, containing count information on all the segments discovered.

Details

A 'cluster' object (package: snow) may be used to parallelise parts of this function when examining large data sets. Passing NULL to the 'cl' argument causes the function to run in non-parallel mode.
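A minimal sketch of setting up a cluster, assuming that a cluster created with makeCluster() from R's bundled parallel package (which incorporates snow's cluster interface) is acceptable wherever a snow cluster is expected. The heuristicSeg() call is commented out because it needs the sD and alignData objects built in the Examples section.

```r
library(parallel)

# Two local worker processes; choose a size suited to your machine.
cl <- makeCluster(2)

# segD <- heuristicSeg(sD = sD, aD = alignData, cl = cl)

# Release the workers when finished.
stopCluster(cl)

# For non-parallel mode, pass cl = NULL instead.
```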

If recoverFromTemp = TRUE, the function will attempt to recover from a crash using the temporary files in tempDir. At present, the function assumes you know what you are doing and performs no check that these files are suitable for the specified recovery. Use with caution.
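Because no checking is performed, a cautious workflow keeps a dedicated tempDir per analysis and only sets recoverFromTemp = TRUE when re-running with the same sD, aD and settings. The sketch below assumes a hypothetical helper, hasTempFiles(), which only confirms that files exist in the directory; it cannot tell whether they suit the recovery. The directory name is also illustrative.

```r
# Persistent location (illustrative): a new R session gets a fresh tempdir(),
# so files written there would not be found by the recovery run.
tmp <- file.path(getwd(), "heuristicSeg_tmp")
dir.create(tmp, showWarnings = FALSE, recursive = TRUE)

# Hypothetical helper: TRUE if 'dir' exists and contains any files.
# It does not check that those files are suitable for recovery.
hasTempFiles <- function(dir) {
  dir.exists(dir) && length(list.files(dir)) > 0
}

# First attempt:
# segD <- heuristicSeg(sD = sD, aD = alignData, tempDir = tmp)

# After a crash, re-run with the same sD, aD and settings:
# if (hasTempFiles(tmp)) {
#   segD <- heuristicSeg(sD = sD, aD = alignData, tempDir = tmp,
#                        recoverFromTemp = TRUE)
# }
```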

References

Hardcastle, T.J., Kelly, K.A. and Baulcombe, D.C. (2011). Identifying small RNA loci from high-throughput sequencing data. In press.

Examples


# Load the segmentSeq package.

library(segmentSeq)

# Define the chromosome lengths for the genome of interest.

chrlens <- c(2e6, 1e6)

# Define the files containing sample information.

datadir <- system.file("extdata", package = "segmentSeq")
libfiles <- c("SL9.txt", "SL10.txt", "SL26.txt", "SL32.txt")

# Establish the library names and replicate structure.

libnames <- c("SL9", "SL10", "SL26", "SL32")
replicates <- c(1,1,2,2)

# Process the files to produce an `alignmentData' object.

alignData <- readGeneric(file = libfiles, dir = datadir, replicates = replicates,
                         libnames = libnames, chrs = c(">Chr1", ">Chr2"),
                         chrlens = chrlens)

# Process the alignmentData object to produce a `segData' object.

sD <- processAD(alignData, gap = 100, cl = NULL)

# Use the segData object to produce a segmentation of the genome.

segD <- heuristicSeg(sD = sD, aD = alignData,
                     subRegion = data.frame(chr = ">Chr1", start = 1, end = 1e5),
                     cl = NULL)
