
methylKit (version 0.99.2)

clusterSamples: Hierarchical Clustering using methylation data

Description

Hierarchical Clustering using methylation data

The function clusters samples using the hclust function and one of several distance metrics derived from the percent methylation per base or per region for each sample.

Usage

clusterSamples(.Object, dist="correlation", method="ward",
               sd.filter=TRUE, sd.threshold=0.5,
               filterByQuantile=TRUE, plot=TRUE, chunk.size)

# S4 method for methylBase
clusterSamples(.Object, dist, method, sd.filter, sd.threshold,
               filterByQuantile, plot)

# S4 method for methylBaseDB
clusterSamples(.Object, dist = "correlation", method = "ward",
               sd.filter = TRUE, sd.threshold = 0.5,
               filterByQuantile = TRUE, plot = TRUE, chunk.size = 1e+06)

Arguments

.Object

a methylBase or methylBaseDB object

dist

the distance measure to be used. This must be one of "correlation", "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous abbreviation can be given. (default:"correlation")

method

the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward", "single", "complete", "average", "mcquitty", "median" or "centroid". (default:"ward")

sd.filter

If TRUE, the bases/regions with low variation will be discarded prior to clustering (default:TRUE)

sd.threshold

A numeric value. If filterByQuantile is TRUE, features whose standard deviation is less than the quantile denoted by sd.threshold are removed. If filterByQuantile is FALSE, features whose standard deviation is less than the value of sd.threshold are removed. (default:0.5)

filterByQuantile

A logical value determining whether sd.threshold is interpreted as a quantile of all standard deviation values from bases/regions (the default) or as an absolute value

plot

a logical value indicating whether to plot hierarchical clustering. (default:TRUE)

chunk.size

Number of rows to be taken as a chunk for processing the methylBaseDB objects, default: 1e6
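
The interplay of sd.filter, sd.threshold, filterByQuantile and dist="correlation" can be sketched in base R. This is only an illustration under stated assumptions, not methylKit's internal code: the matrix pm, the helper filter_by_sd and the simulated values are hypothetical, the correlation distance is assumed to be 1 minus the Pearson correlation between samples, and "ward.D" is base R's hclust spelling of Ward's method, used here as a stand-in for method="ward".

```r
# Illustrative base-R sketch; NOT methylKit's internal implementation.
set.seed(1)
n_bases <- 200

# Hypothetical percent-methylation matrix: bases in rows, samples in columns.
# Only the first 100 bases differ between the "test" and "ctrl" groups.
base_level <- runif(n_bases, min = 40, max = 60)
shift <- c(sample(c(-20, 20), 100, replace = TRUE), rep(0, 100))
pm <- cbind(
  test1 = base_level + shift + rnorm(n_bases, sd = 2),
  test2 = base_level + shift + rnorm(n_bases, sd = 2),
  ctrl1 = base_level - shift + rnorm(n_bases, sd = 2),
  ctrl2 = base_level - shift + rnorm(n_bases, sd = 2)
)

# Hypothetical helper: drop rows whose standard deviation is below the cutoff.
# The cutoff is a quantile of all row SDs (filterByQuantile = TRUE)
# or the absolute value of sd.threshold (filterByQuantile = FALSE).
filter_by_sd <- function(mat, sd.threshold = 0.5, filterByQuantile = TRUE) {
  sds <- apply(mat, 1, sd)
  cutoff <- if (filterByQuantile) quantile(sds, sd.threshold) else sd.threshold
  mat[sds >= cutoff, , drop = FALSE]
}

filtered <- filter_by_sd(pm, sd.threshold = 0.5, filterByQuantile = TRUE)

# Assumed correlation distance between samples: 1 - Pearson correlation.
d <- as.dist(1 - cor(filtered))

# Agglomerate with Ward's method as spelled in base R.
hc <- hclust(d, method = "ward.D")
```

With filterByQuantile = TRUE and sd.threshold = 0.5, roughly the less variable half of the rows is dropped; with filterByQuantile = FALSE, the same threshold would instead remove only rows whose standard deviation is below 0.5 percentage points.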

Value

a tree object of a hierarchical cluster analysis using a set of dissimilarities for the n objects being clustered.
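
Assuming the returned tree is an object of class "hclust" (the description matches the value of base R's hclust, but this is an assumption), it can be inspected and cut with the usual base R tools. The sketch below builds a stand-in tree from a small hypothetical matrix pm rather than calling clusterSamples, so that it runs without methylKit.

```r
# Stand-in for the returned tree, built with base R only.
# Hypothetical percent-methylation values: 5 bases x 4 samples.
pm <- cbind(
  test1 = c(90, 10, 80, 20, 70),
  test2 = c(85, 15, 75, 25, 65),
  ctrl1 = c(10, 90, 20, 80, 30),
  ctrl2 = c(15, 85, 25, 75, 35)
)
hc <- hclust(as.dist(1 - cor(pm)), method = "ward.D")

# Sample labels stored in the tree.
hc$labels

# Assign each sample to one of two clusters.
groups <- cutree(hc, k = 2)
groups

# Convert to a dendrogram for further manipulation or custom plotting.
dend <- as.dendrogram(hc)
```

If the assumption holds, the same calls (cutree, as.dendrogram, plot) should apply to the object returned by clusterSamples, for example when plot=FALSE is used and the tree is drawn later.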

Details

The chunk.size parameter is only used with methylBaseDB objects, which are stored as flat-file databases and read chunk by chunk so that large objects can be processed. By default chunk.size is set to 1M rows (1e6), which should work for most systems. If you encounter memory problems, reduce chunk.size; if you have plenty of memory available, you can increase it.
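
The general idea of chunk-wise processing can be sketched in base R. This is a generic illustration, not how methylKit reads its flat-file databases: an in-memory matrix pm stands in for the file, and the small chunk.size value is hypothetical. Per-row statistics such as the standard deviation depend only on their own row, so computing them chunk by chunk gives the same result as computing them in one pass while holding fewer rows in memory at a time.

```r
# Generic chunked-processing sketch; an in-memory matrix stands in for a file.
set.seed(42)
pm <- matrix(runif(1000 * 4, min = 0, max = 100), ncol = 4)

chunk.size <- 300  # hypothetical small value for illustration

# First row index of each chunk.
starts <- seq(1, nrow(pm), by = chunk.size)

# Compute row-wise standard deviations one chunk at a time.
sds_chunked <- unlist(lapply(starts, function(s) {
  rows <- s:min(s + chunk.size - 1, nrow(pm))
  apply(pm[rows, , drop = FALSE], 1, sd)
}))

# Same statistic computed on the whole matrix at once.
sds_full <- apply(pm, 1, sd)

# The two approaches agree.
all.equal(sds_chunked, sds_full)
```

A smaller chunk.size lowers peak memory use at the cost of more read steps; a larger one does the opposite.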

Examples

# NOT RUN {
data(methylKit)

clusterSamples(methylBase.obj, dist="correlation", method="ward", plot=TRUE)
# }
