xcluster: Hierarchical clustering

Description

Performs a hierarchical cluster analysis on a set of dissimilarities (this function launches an external program, Xcluster).

Usage

xcluster(data,distance="euclidean",clean=FALSE,tmp.in="tmp.txt",tmp.out="tmp.gtr")

Arguments

data

a matrix (or data frame) which provides the data to analyze

distance

The distance measure used with Xcluster. This must be one of "euclidean", "pearson" or "notcenteredpearson". Any unambiguous substring can be given.

clean

a logical value indicating whether you want the true distances (clean=FALSE) or a clean dendrogram (clean=TRUE)

tmp.in, tmp.out

temporary files for Xcluster

Value

merge: an $n-1$ by 2 matrix. Row $i$ of merge describes the merging of clusters at step $i$ of the clustering. If an element $j$ in the row is negative, then observation $-j$ was merged at this stage. If $j$ is positive then the merge was with the cluster formed at the (earlier) stage $j$ of the algorithm. Thus negative entries in merge indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.
height: a set of $n-1$ non-decreasing real values. The clustering height: that is, the value of the criterion associated with the clustering method for the particular agglomeration.
order: a vector giving the permutation of the original observations suitable for plotting, in the sense that a cluster plot using this ordering and matrix merge will not have crossings of the branches.
labels: labels for each of the objects being clustered.
call: the call which produced the result.
method: the cluster method that has been used.
dist.method: the distance that has been used to create d (only returned if the distance object has a "method" attribute).
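
The components listed above carry the same names as those of an object returned by stats::hclust. As a rough illustration of how merge, height, order and labels look, the sketch below uses hclust as a stand-in on four made-up points. This is an assumption made so the example runs without the Xcluster program; the values xcluster produces will differ.

```r
# Stand-in for illustration only: stats::hclust instead of xcluster,
# applied to four hypothetical points on a line
pts <- matrix(c(0, 1, 5, 7), ncol = 1,
              dimnames = list(c("a", "b", "c", "d"), NULL))
h <- hclust(dist(pts), method = "average")

print(h$merge)   # (n-1) x 2 matrix: negative = single observation,
                 # positive = cluster formed at that earlier step
print(h$height)  # one height per merge step
print(h$order)   # ordering of the observations for plotting
print(h$labels)  # labels taken from the row names of the input
```

A row of merge with two negative entries joins two single observations, while a row with two positive entries joins two clusters built at earlier steps.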

Details

Available distance measures are (written for two vectors $x$ and $y$):

Euclidean: Usual square distance between the two vectors (2 norm).
Pearson: $1 - cor(x,y)$
Pearson not centered: $1 - \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \, \sum_i y_i^2}}$
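
The three measures can be checked by hand in base R. The sketch below only illustrates the formulas on two made-up vectors (x and y are hypothetical); it is not the code that Xcluster itself runs.

```r
# Two hypothetical vectors, chosen only for illustration
x <- c(1, 2, 3, 4)
y <- c(2, 4, 5, 9)

# Euclidean distance, taken here as the 2-norm of the difference
d_euclid <- sqrt(sum((x - y)^2))

# Pearson distance: 1 minus the usual (centered) correlation
d_pearson <- 1 - cor(x, y)

# Not-centered Pearson: same ratio, without subtracting the means
d_uncentered <- 1 - sum(x * y) / sqrt(sum(x^2) * sum(y^2))

print(c(euclidean = d_euclid,
        pearson = d_pearson,
        notcenteredpearson = d_uncentered))
```

The two Pearson variants differ only in whether the vectors are centered on their means before the ratio is taken.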

Xcluster does not use the usual agglomerative methods (single, average, complete). Instead, it takes the distance between two groups to be the distance between their barycenters.

This causes a problem for data of this kind:

A	0	0
B	0	1
C	0.9	0.5

That is, for a triangle in $R^2$, the distance between A and B is larger than the distance between the group {A,B} and C (with Euclidean distance).

In that case it can be useful to set clean=TRUE, which means that A and B must not be considered a group without C.
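
The arithmetic behind this example can be sketched in base R. The coordinates A = (0, 0), B = (0, 1) and C = (0.9, 0.5) are an assumed reading of the example, and the barycenter rule is written out by hand rather than obtained from Xcluster.

```r
# Assumed coordinates for the three points of the example
pts <- rbind(A = c(0, 0), B = c(0, 1), C = c(0.9, 0.5))

# Pairwise Euclidean distances: A and B form the closest pair
d <- as.matrix(dist(pts))
d_AB <- d["A", "B"]

# Barycenter of the group {A, B}, then its distance to C
bary_AB <- colMeans(pts[c("A", "B"), ])
d_AB_C <- sqrt(sum((bary_AB - pts["C", ])^2))

# The second merge happens at a lower height than the first one
print(c(first_merge = d_AB, second_merge = d_AB_C))
```

Because the barycenter of {A,B} lies closer to C than A lies to B, the heights are not monotone, which is the situation clean=TRUE is meant to address.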

References

Antoine Lucas and Sylvain Jasson, Using amap and ctc Packages for Huge Clustering, R News, 2006, vol. 6, issue 5, pages 58-60.

Examples

# Create data
set.seed(1)
m <- matrix(rep(1, 3*24), ncol = 3)
m[9:16, 3] <- 3 ; m[17:24, ] <- 3    # create 3 groups
m <- m + rnorm(24*3, 0, 0.5)         # add noise
m <- floor(10*m)/10                  # keep one decimal digit


# Once the Xcluster program is installed:
#
#h <- xcluster(m)
#
#plot(h)
