regionFinder: Find non-zero regions in vector

Description

Find regions for which a numeric vector is above (or below) predefined thresholds.

Usage

regionFinder(x, chr, pos, cluster = NULL, y = x, summary = mean,
             ind = seq(along = x), order = TRUE, oneTable = TRUE,
             maxGap = 300, cutoff = quantile(abs(x), 0.99),
             assumeSorted = FALSE, verbose = TRUE)

Arguments

x

A numeric vector.

chr

A character vector with the chromosome of each location.

pos

A numeric vector representing the genomic location.

cluster

The clusters of locations that are to be analyzed together. In the case of microarrays, the clusters are often supplied by the manufacturer. If they are not available, the function clusterMaker can be used.

y

A numeric vector with the same length as x containing the values to be averaged for the region summary. See Details for more.

summary

The function to be used to construct a summary of the y values for each region.

ind

An optional vector specifying a subset of observations to be used when finding regions.

order

If TRUE, the resulting tables are ordered by the area of each region. Area is defined as the absolute value of the summarized y times the number of features in the region.

oneTable

If TRUE, only one results table is returned. Otherwise, two tables are returned: one for the regions with positive values and one for the regions with negative values.

maxGap

If cluster is not provided, this number is used to define clusters via the clusterMaker function.

cutoff

This argument is passed to getSegments. It represents the upper (and optionally the lower) cutoff for x.

assumeSorted

This argument is passed to getSegments and clusterMaker.

verbose

Should the function be verbose?
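
The role of maxGap in defining clusters can be illustrated with a minimal base R sketch. This is not the clusterMaker implementation; gap_clusters is a hypothetical helper that assumes the input is already sorted by chromosome and position, and that a new cluster starts whenever the chromosome changes or the distance to the previous location exceeds maxGap.

```r
# Hypothetical helper, not part of bumphunter: assign cluster ids by
# starting a new cluster at every chromosome change or whenever the gap
# between consecutive positions is larger than maxGap.
# Assumption: chr and pos are already sorted by chromosome and position.
gap_clusters <- function(chr, pos, maxGap = 300) {
  n <- length(pos)
  if (n == 0) return(integer(0))
  new_cluster <- c(TRUE, chr[-1] != chr[-n] | diff(pos) > maxGap)
  cumsum(new_cluster)
}

chr <- c("chr1", "chr1", "chr1", "chr1", "chr2", "chr2")
pos <- c(100, 250, 900, 1000, 1050, 1200)
gap_clusters(chr, pos)
```

In this toy input the first cluster ends where the gap from 250 to 900 exceeds 300, and another cluster starts at the switch to chr2 even though the positions there are close together.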

Value

If oneTable is FALSE, two tables are returned; otherwise, one table is returned. Each row of a table is a region, and the columns hold information about that region.
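
The area used for ordering (see the order argument) can be worked through on a toy table. The sketch below is an illustration only: the data frame and its values are made up, and it assumes that regions with the largest area come first.

```r
# Toy region table with made-up values (not output from regionFinder).
regions <- data.frame(
  value = c(0.9, -1.5, 1.1),  # summarized y for each region
  L     = c(20, 4, 10)        # number of features in each region
)

# Area = |summarized y| * number of features in the region.
regions$area <- abs(regions$value) * regions$L

# Assumption: largest area first.
regions <- regions[order(regions$area, decreasing = TRUE), ]
regions
```

The region with the largest absolute value (-1.5) ends up last here because it covers only four features, which shows how the area balances effect size against region length.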

Details

This function is used in the final steps of bumphunter. While bumphunter also performs steps such as regression and permutation, regionFinder simply finds regions that are above a certain threshold (using getSegments) and summarizes them. The regions are found based on x, and the summarized values are based on y (which by default equals x). The summary is used for the ranking, so one might, for example, use t-statistics to find regions but summarize them using effect sizes.
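
The split between finding regions on x and summarizing them on y can be sketched in base R. This is a simplified illustration, not the regionFinder or getSegments code: find_regions_sketch is a hypothetical function that handles a single cluster, uses only an upper cutoff, and assumes x contains no missing values. The column names are chosen for illustration.

```r
# Hypothetical, simplified sketch: find runs of consecutive values of x
# above the cutoff, then summarize y (not x) within each run.
find_regions_sketch <- function(x, y = x, cutoff, summary = mean) {
  runs  <- rle(x > cutoff)             # consecutive runs above/below cutoff
  end   <- cumsum(runs$lengths)        # last index of each run
  start <- end - runs$lengths + 1      # first index of each run
  keep  <- which(runs$values)          # only runs above the cutoff
  value <- vapply(keep, function(i) summary(y[start[i]:end[i]]), numeric(1))
  L     <- runs$lengths[keep]
  data.frame(indexStart = start[keep], indexEnd = end[keep],
             value = value, L = L, area = abs(value) * L)
}

# Made-up data: t-like statistics locate the regions,
# effect sizes are used to summarize them.
t_stat <- c(0.1, 2.5, 3.0, 2.2, 0.3, -0.2, 4.1, 3.9, 0.0)
effect <- c(0.0, 0.8, 1.0, 0.6, 0.1,  0.0, 0.5, 0.7, 0.0)
find_regions_sketch(t_stat, y = effect, cutoff = 2)
```

Here the second region has the larger test statistics, but the first region gets the larger summarized value and area because the summary is computed from the effect sizes.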

Examples

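library(bumphunter)
# Simulate a noisy sine wave over 1000 positions on two chromosomes
x <- seq_len(1000)
y <- sin(8*pi*x/1000) + rnorm(1000, 0, 0.2)
chr <- rep(c(1,2), each=length(x)/2)
# Find regions where y passes the cutoff; x is used as the position vector
tab <- regionFinder(y, chr, x, cutoff=0.8)
# Show only the regions that span more than 10 features
print(tab[tab$L>10,])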