
segmentSeq (version 2.6.0)

heuristicSeg: A (fast) heuristic method for creation of a genome segment map.

Description

This method uses heuristics to identify a set of loci from a segData or segMeth object. Within each replicate group, it identifies regions of the genome that satisfy the criteria for being a locus and contain no subregion that satisfies the criteria for being a null. These criteria can be defined by the user or inferred from the data.
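The idea that loci are separated by nulls at least `gap` bases long can be illustrated with a toy, single-sample sketch. The function `toyLoci` below is hypothetical and is not part of segmentSeq. It treats each read as a single position and starts a new candidate region wherever two consecutive read positions are more than `gap` bases apart. heuristicSeg itself works on segData or segMeth objects, considers replicate groups, and applies the RKPM or methylation thresholds described under Arguments.

```r
# Hypothetical helper (not part of segmentSeq): group read positions into
# candidate regions, starting a new region wherever two consecutive
# positions are more than `gap` bases apart (a toy stand-in for a null).
toyLoci <- function(pos, gap = 100) {
  pos <- sort(pos)
  # TRUE for the first read and for every jump larger than `gap`
  newRegion <- c(TRUE, diff(pos) > gap)
  # Running count of region starts gives each read its region number
  regionId <- cumsum(newRegion)
  data.frame(start  = as.vector(tapply(pos, regionId, min)),
             end    = as.vector(tapply(pos, regionId, max)),
             nReads = tabulate(regionId))
}

toyLoci(c(10, 50, 120, 500, 530, 2000), gap = 100)
```

Here the reads at 10, 50 and 120 form one region, the reads at 500 and 530 form a second, and the isolated read at 2000 forms a third.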

Usage

heuristicSeg(sD, aD, gap = 100, RKPM = 1000, prop = 0.2, locCutoff = 0.99, subRegion = NULL, largeness = 1e8, getLikes = TRUE, verbose = TRUE, tempDir = NULL, cl = NULL, recoverFromTemp = FALSE)

Arguments

aD
An alignmentData or methData object.
sD
A segData or segMeth object derived from the `aD' object.
gap
What is the minimum length of a null region?
RKPM
For analysis of a segData object, what RKPM (reads per kilobase per million reads) distinguishes between a locus and a null region?
prop
For analysis of a segMeth object, what proportion of methylated cytosines distinguishes between a locus and a null region? Defaults to 0.2.
locCutoff
For analysis of a segMeth object, with what likelihood must the proportion of methylated cytosines exceed the `prop' option? Defaults to 0.99.
subRegion
A 'data.frame' object defining the subregions of the genome to be segmented. If NULL (default), the whole genome is segmented.
largeness
The maximum size for a split analysis.
getLikes
Should posterior likelihoods for the new segmented genome (loci and nulls) be assessed?
verbose
Should the function be verbose? Defaults to TRUE.
tempDir
A directory for storing temporary files produced during the segmentation.
cl
A SNOW cluster object, or NULL. Defaults to NULL. See Details.
recoverFromTemp
If TRUE, will attempt to recover the position saved in 'tempDir'. Defaults to FALSE. See Details.
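To make the units of these thresholds concrete, the sketch below defines two hypothetical helpers that are not part of segmentSeq. `rkpm` computes reads per kilobase per million reads, assuming the library size is the total read count of the sample. `probAbove` illustrates one possible reading of the prop and locCutoff criterion; it assumes a uniform Beta(1, 1) prior on the methylation proportion, a modelling choice made for this example and not necessarily the one heuristicSeg uses.

```r
# Hypothetical helpers (not part of segmentSeq) illustrating the units of the
# RKPM, prop and locCutoff thresholds; the package's internals may differ.

# Reads per kilobase per million reads for a region of `width` bases holding
# `count` reads, in a library of `libsize` reads in total.
rkpm <- function(count, width, libsize) {
  count / (width / 1e3) / (libsize / 1e6)
}

# Posterior probability that the true methylation proportion exceeds `prop`,
# given `m` methylated cytosines out of `n`, under an assumed uniform
# Beta(1, 1) prior (posterior is Beta(m + 1, n - m + 1)).
probAbove <- function(m, n, prop = 0.2) {
  pbeta(prop, m + 1, n - m + 1, lower.tail = FALSE)
}

rkpm(50, 500, 1e7)          # 50 / 0.5 / 10 = 10 RKPM
probAbove(15, 20) > 0.99    # TRUE: clearly above prop = 0.2
probAbove(2, 10) > 0.99     # FALSE: the probability is about 0.62
```

A region scoring 10 RKPM falls well below the default RKPM = 1000, and only the first methylation example clears the default locCutoff = 0.99.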

Value

A lociData object, containing count information on all the segments discovered.

Details

A 'cluster' object (package: snow) may be passed to 'cl' to parallelise parts of this function when examining large data sets. Passing NULL to 'cl' causes the function to run in non-parallel mode.

If recoverFromTemp = TRUE, the function will attempt to resume a crashed run from the temporary files in tempDir. At present, the function assumes you know what you are doing and performs no checks that these files are suitable for the specified recovery. Use with caution.

References

Hardcastle T.J., Kelly K.A. and Baulcombe D.C. (2011). Identifying small RNA loci from high-throughput sequencing data. In press.

See Also

classifySeg: an alternative approach to this problem that uses empirical Bayes methods to classify segments.
plotGenome: a function for plotting the alignment of tags to the genome, together with the segments defined by this function.
baySeq: a package for discovering differential expression in lociData objects.

Examples

# Load the package.

library(segmentSeq)

# Define the chromosome lengths for the genome of interest.

chrlens <- c(2e6, 1e6)

# Define the files containing sample information.

datadir <- system.file("extdata", package = "segmentSeq")
libfiles <- c("SL9.txt", "SL10.txt", "SL26.txt", "SL32.txt")

# Establish the library names and replicate structure.

libnames <- c("SL9", "SL10", "SL26", "SL32")
replicates <- c(1, 1, 2, 2)

# Process the files to produce an `alignmentData' object.

alignData <- readGeneric(file = libfiles, dir = datadir,
                         replicates = replicates, libnames = libnames,
                         chrs = c(">Chr1", ">Chr2"), chrlens = chrlens)

# Process the alignmentData object to produce a `segData' object.

sD <- processAD(alignData, gap = 100, cl = NULL)

# Use the segData object to produce a segmentation of the genome,
# restricted here to the first 100,000 bases of Chr1.

segD <- heuristicSeg(sD = sD, aD = alignData,
                     subRegion = data.frame(chr = ">Chr1", start = 1, end = 1e5),
                     cl = NULL)
