Hierarchical Clustering using methylation data
The function clusters samples using the hclust function and various distance metrics derived from percent methylation per base or per region for each sample.
clusterSamples(.Object, dist="correlation", method="ward",
  sd.filter=TRUE, sd.threshold=0.5,
  filterByQuantile=TRUE, plot=TRUE, chunk.size)

# S4 method for methylBase
clusterSamples(.Object, dist, method, sd.filter,
  sd.threshold, filterByQuantile, plot)

# S4 method for methylBaseDB
clusterSamples(.Object, dist = "correlation",
  method = "ward", sd.filter = TRUE, sd.threshold = 0.5,
  filterByQuantile = TRUE, plot = TRUE, chunk.size = 1e+06)
.Object
a methylBase or methylBaseDB object
dist
the distance measure to be used. This must be one of "correlation", "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". Any unambiguous abbreviation can be given. (default: "correlation")
method
the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward", "single", "complete", "average", "mcquitty", "median" or "centroid". (default: "ward")
sd.filter
If TRUE, the bases/regions with low variation will be discarded prior to clustering. (default: TRUE)
sd.threshold
A numeric value. If filterByQuantile is TRUE, features whose standard deviation is less than the quantile denoted by sd.threshold will be removed. If filterByQuantile is FALSE, features whose standard deviation is less than the value of sd.threshold will be removed. (default: 0.5)
filterByQuantile
A logical determining whether sd.threshold is to be interpreted as a quantile of all standard deviation values from bases/regions (the default), or as an absolute value.
plot
a logical value indicating whether to plot the hierarchical clustering. (default: TRUE)
chunk.size
Number of rows to be taken as a chunk for processing the methylBaseDB objects. (default: 1e6)
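The interaction between sd.filter, sd.threshold and filterByQuantile can be illustrated with a minimal base R sketch. It is not methylKit's internal code: the helper filter_by_sd is hypothetical, the percent-methylation matrix is random toy data, and the "complete" agglomeration method is chosen only for the illustration. The sketch assumes that the "correlation" distance amounts to one minus the Pearson correlation between samples.

```r
# Toy percent-methylation matrix: rows are bases/regions, columns are samples.
# All values are made up for illustration.
set.seed(1)
pm <- matrix(runif(200 * 4, min = 0, max = 100), nrow = 200,
             dimnames = list(NULL, c("test1", "test2", "ctrl1", "ctrl2")))

# Hypothetical helper (not part of methylKit): drop low-variation rows.
filter_by_sd <- function(mat, sd.threshold = 0.5, filterByQuantile = TRUE) {
  sds <- apply(mat, 1, sd)                   # per-row standard deviation
  cutoff <- if (filterByQuantile) {
    quantile(sds, probs = sd.threshold)      # threshold read as a quantile
  } else {
    sd.threshold                             # threshold read as an absolute SD
  }
  mat[sds >= cutoff, , drop = FALSE]         # rows below the cutoff are removed
}

kept <- filter_by_sd(pm, sd.threshold = 0.5, filterByQuantile = TRUE)

# Assumed form of the "correlation" distance: 1 - Pearson correlation
# between samples, followed by hierarchical clustering.
d <- as.dist(1 - cor(kept, method = "pearson"))
hc <- hclust(d, method = "complete")
```

With filterByQuantile = TRUE and sd.threshold = 0.5, roughly the less variable half of the rows is dropped; with filterByQuantile = FALSE, the same number would instead be compared directly against each row's standard deviation.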
a tree object of a hierarchical cluster analysis using a set of dissimilarities for the n objects being clustered.
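As a rough sketch of what can be done with such a tree, the example below builds a stand-in with base R's hclust on toy data and then inspects it. It assumes the returned object behaves like a standard hclust result; the matrix, sample names and the choice of two clusters are illustrative only.

```r
# Stand-in for the returned tree: a base R hclust object built from toy data.
set.seed(42)
pm <- matrix(runif(50 * 4, min = 0, max = 100), nrow = 50,
             dimnames = list(NULL, c("test1", "test2", "ctrl1", "ctrl2")))
hc <- hclust(dist(t(pm), method = "euclidean"), method = "average")

groups <- cutree(hc, k = 2)           # assign each sample to one of two clusters
dend <- as.dendrogram(hc)             # convert for dendrogram-based plotting
ordered_labels <- hc$labels[hc$order] # sample names in left-to-right plot order
```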
The parameter chunk.size is only used when working with methylBaseDB objects, which are stored as flat file databases and read chunk by chunk so that large objects can be processed. By default, chunk.size is set to 1M rows, which should work for most systems. If you encounter memory problems, lower chunk.size; if you have a large amount of memory available, you can raise it.
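To make the idea of chunk-wise processing concrete, here is a minimal base R sketch that computes a between-sample correlation matrix while only ever holding one chunk of rows at a time. It is an illustration of the general technique, not a description of how methylKit handles methylBaseDB objects: the in-memory matrix stands in for a file on disk, and the chunk size of 250 rows is chosen only to keep the toy example small.

```r
# Toy data standing in for a large on-disk table (rows = bases, cols = samples).
set.seed(7)
pm <- matrix(runif(1000 * 3, min = 0, max = 100), nrow = 1000,
             dimnames = list(NULL, c("s1", "s2", "s3")))
chunk.size <- 250   # illustrative; far smaller than the documented 1e6 default

n <- 0                                   # rows seen so far
sums <- numeric(ncol(pm))                # running column sums
cross <- matrix(0, ncol(pm), ncol(pm))   # running sums of cross-products

for (start in seq(1, nrow(pm), by = chunk.size)) {
  end <- min(start + chunk.size - 1, nrow(pm))
  chunk <- pm[start:end, , drop = FALSE] # stands in for one chunk read from disk
  n <- n + nrow(chunk)
  sums <- sums + colSums(chunk)
  cross <- cross + crossprod(chunk)
}

# Rebuild the covariance and correlation matrices from the running totals.
covmat <- (cross - tcrossprod(sums) / n) / (n - 1)
cormat <- cov2cor(covmat)
```

A smaller chunk size lowers peak memory use at the cost of more iterations, which is the trade-off the chunk.size argument exposes.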
# NOT RUN {
data(methylKit)
clusterSamples(methylBase.obj, dist="correlation", method="ward", plot=TRUE)
# }