hopach(data, dmat = NULL, d = "cosangle", clusters = "best", K = 15,
kmax = 9, khigh = 9, coll = "seq", newmed = "medsil", mss = "med",
impr = 0, initord = "co", ord = "own", verbose=FALSE)
hdist object of pairwise distances between all genes (arrays). All values must be numeric, and missing values are not allowed. If NULL, this matrix is computed using the metric specified by d. If a matrix is provided, the user is responsible for ensuring that the metric used agrees with d.
distancematrix() and distancevector().
impr, then the collapse is not performed.
If TRUE then verbose output is printed.
data that are the 'k' cluster medoids, i.e. profiles (or centroids) for each cluster.
'sizes' is a vector containing the 'k' cluster sizes.
'labels' is a vector containing the main cluster labels for every variable. Each label consists of one digit per level of the tree (up to the level identified as the main clusters). The digit (1-9) indicates which child cluster the variable was in at that level. For example, '124' means the first (leftmost in the tree) cluster in level 1, the second child of cluster '1' in level 2, and the fourth child of cluster '12' in level 3. These can be mapped to the numbers 1:k for simplicity, though the tree structure and the relationships amongst the clusters are then lost, e.g. 1211 is closer to 1212 than to 1221.
'order' is a vector containing the ordering of variables within the main clusters. The clusters are ordered deterministically as the tree is built. The elements within each of the main clusters are ordered with the method determined by the value of ord: "own" (relative to own medoid), "neighbor" (relative to next medoid to the right), or "co" (maximize correlation ordering).
'medoids' is a matrix containing the labels and corresponding medoids for each internal node and leaf of the tree. The number of digits in the label indicates the level for that node. The medoid refers to a row of data.
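The digit-per-level label scheme described above can be unpacked with base R alone. The following is a minimal sketch, assuming labels are stored as numbers (or strings) made only of the digits 1-9. The `labels` values are made up, and the helpers `label_digits` and `shared_depth` are hypothetical names used for illustration; they are not part of hopach.

```r
# Made-up main-cluster labels, one per variable (illustration only).
labels <- c(1211, 1212, 1221, 1211, 1212)

# Hypothetical helper: split each label into its per-level digits,
# e.g. 1211 -> 1, 2, 1, 1.
label_digits <- function(x) {
  lapply(strsplit(as.character(x), ""), as.integer)
}

# Hypothetical helper: number of leading tree levels two labels share.
# A larger value means the two clusters split apart deeper in the tree.
shared_depth <- function(a, b) {
  da <- label_digits(a)[[1]]
  db <- label_digits(b)[[1]]
  n <- min(length(da), length(db))
  same <- da[seq_len(n)] == db[seq_len(n)]
  if (all(same)) n else which(!same)[1] - 1
}

shared_depth(1211, 1212)  # these two agree on the first three levels
shared_depth(1211, 1221)  # these two agree on the first two levels only

# Map labels to 1:k for convenience; the tree structure is no longer visible.
simple_ids <- match(labels, sort(unique(labels)))
```

Comparing shared prefixes is one simple way to see why 1211 sits closer to 1212 than to 1221; the remapped `simple_ids` carry no such information.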
The Median (or Mean) Split Silhouette (MSS) criterion is used by HOPACH to (i) determine the optimal number of children at each node, (ii) decide which pairs of clusters to collapse at each level, and (iii) identify the first level of the tree with maximally homogeneous clusters. In each case, the goal is to minimize MSS, which is a measure of cluster heterogeneity described in http://www.bepress.com/ucbbiostat/paper107/.
In hopach versions <2.0.0, these functions returned the square root of the usual distance for d="cosangle", d="abscosangle", d="cor", and d="abscor". Typically, this transformation makes the dissimilarity correspond more closely with the norm. In order to agree with the dist function, the square root is no longer used in versions >=2.0.0. See ?distancematrix().
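The effect of the square root can be illustrated in base R. This is a sketch under the assumption that the "usual" cosine-angle distance is one minus the cosine of the angle between two profiles; consult ?distancematrix for the definition hopach actually uses. The helper `cosangle_dist` is hypothetical and not part of hopach.

```r
# Hypothetical helper, not part of hopach: 1 - cos(angle) between two vectors.
cosangle_dist <- function(x, y) {
  1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

x <- c(1, 0)
y <- c(1, 1)

d_new <- cosangle_dist(x, y)  # scale assumed for versions >= 2.0.0
d_old <- sqrt(d_new)          # square-root scale assumed for versions < 2.0.0

# A value on the older scale converts to the newer one by squaring.
all.equal(d_old^2, d_new)

# Why the square root tracks the norm: for unit-length vectors,
# the Euclidean distance equals sqrt(2 * (1 - cos(angle))).
ux <- x / sqrt(sum(x^2))
uy <- y / sqrt(sum(y^2))
all.equal(sqrt(sum((ux - uy)^2)), sqrt(2 * d_new))
```

If results from a pre-2.0.0 run need to be compared with a newer one, squaring the old distances (or taking the square root of the new ones) puts them on the same scale for these four metrics.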
van der Laan, M.J. and Pollard, K.S. A new algorithm for hybrid hierarchical clustering with visualization and the bootstrap. Journal of Statistical Planning and Inference, 2003, 117, pp. 275-303.
http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/hopach.pdf
http://www.bepress.com/ucbbiostat/paper107/
http://www.stat.berkeley.edu/~laan/Research/Research_subpages/Papers/jsmpaper.pdf
Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
distancematrix, labelstomss, boothopach, pam, makeoutput
#25 variables from two groups with 3 observations per variable
mydata<-rbind(
  cbind(rnorm(10,0,0.5),rnorm(10,0,0.5),rnorm(10,0,0.5)),
  cbind(rnorm(15,5,0.5),rnorm(15,5,0.5),rnorm(15,5,0.5))
)
dimnames(mydata)<-list(paste("Var",1:25,sep=""),paste("Exp",1:3,sep=""))
mydist<-distancematrix(mydata,d="cosangle") #compute the distance matrix.
#clusters and final tree
clustresult<-hopach(mydata,dmat=mydist)
clustresult$clustering$k #number of clusters.
dimnames(mydata)[[1]][clustresult$clustering$medoids] #medoids of clusters.
table(clustresult$clustering$labels) #equal to clustresult$clustering$sizes.
#faster, sometimes fewer clusters
greedyresult<-hopach(mydata,clusters="greedy",dmat=mydist)
#only get the final ordering (no partitioning into clusters)
orderonly<-hopach(mydata,clusters="none",dmat=mydist)
#cluster the columns (rather than rows)
colresult<-hopach(t(mydata),dmat=distancematrix(t(mydata),d="euclid"))