dissplot: Dissimilarity Plot

Description

Visualizes a dissimilarity matrix using seriation and matrix shading, following the method developed by Hahsler and Hornik (2011). Entries with lower dissimilarities (higher similarity) are plotted darker. Such a plot can be used to uncover hidden structure in the data.

The plot can also be used to visualize cluster quality (see Ling 1973). Objects belonging to the same cluster are displayed in consecutive order. The placement of clusters and the within-cluster order are obtained by a seriation algorithm which tries to place large similarities/small dissimilarities close to the diagonal. Compact clusters are visible as dark squares (low dissimilarity) on the diagonal of the plot. Additionally, a silhouette plot (Rousseeuw 1987) is added. This visualization is similar to CLUSION (see Strehl and Ghosh 2003) but allows arbitrary seriation algorithms to be used.
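The effect of placing small dissimilarities close to the diagonal can be sketched in base R. This is an illustration only, under stated assumptions: the leaf order of hclust() serves as a stand-in for a seriation algorithm (it is not ARSA and not necessarily what dissplot() does internally), and neighbour_mean() is a hypothetical helper introduced here.

```r
set.seed(1)

# Shuffle the rows so the original ordering carries no structure.
x <- as.matrix(iris[sample(nrow(iris)), -5])
d <- dist(x)
m <- as.matrix(d)

# Stand-in ordering (assumption): leaf order of an average-linkage dendrogram.
o <- hclust(d, method = "average")$order

# Hypothetical helper: mean dissimilarity between neighbouring objects,
# i.e. the mean of the first off-diagonal of the matrix.
neighbour_mean <- function(m) {
  idx <- seq_len(nrow(m) - 1)
  mean(m[cbind(idx, idx + 1)])
}

before <- neighbour_mean(m)        # shuffled order
after  <- neighbour_mean(m[o, o])  # reordered

# Shade the reordered matrix; low dissimilarities are drawn dark.
# Columns are reversed so the diagonal runs from top left to bottom right.
image(m[o, o][, nrow(m):1], col = grey.colors(64, start = 0, end = 1),
      axes = FALSE)
```

After reordering, neighbouring objects are on average much closer to each other than in the shuffled order, which is what produces the dark blocks along the diagonal.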

Usage

dissplot(x, labels = NULL, method = "ARSA", 
  control = NULL, options = NULL, ...)

Arguments

x

an object of class dist.

labels

NULL or an integer vector of the same length as rows/columns in x indicating the cluster membership for each object in x as consecutive integers starting with one. The labels are used to reorder the
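Cluster identifiers from other sources are often not consecutive integers starting with one. A minimal base R sketch of recoding them into the form described above follows; as_cluster_labels() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: map arbitrary cluster identifiers to 1, 2, ..., k.
as_cluster_labels <- function(raw) as.integer(factor(raw))

raw <- c(10, 30, 10, 20, 30)   # e.g. non-consecutive cluster ids
labels <- as_cluster_labels(raw)

# Check the documented requirement: consecutive integers starting with one.
stopifnot(identical(sort(unique(labels)), seq_len(max(labels))))
```

The recoded vector has the same length as the input, so it can be passed as labels as long as raw has one entry per object in x.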

method

a list with up to three elements or a single character string. Use a single character string to apply the same algorithm to reorder the clusters (inter cluster seriation) as well as the objects within each cluster (intra cluster seriation).

control

a list of control options passed on to the seriation algorithm. In case of two different seriation algorithms, control can contain a list of two named elements (inter_cluster and intra_cluster

options

a list with options for plotting the matrix. The examples below set the elements main, col, zlim and newpage.

...

further arguments are added to options.

Value

An invisible object of class cluster_proximity_matrix with the following elements:

order

NULL or integer vector giving the order used to plot x.

cluster_order

NULL or integer vector giving the order of the clusters as plotted.

method

vector of character strings indicating the seriation methods used for plotting x.

k

NULL or integer scalar giving the number of clusters generated.

description

a data.frame containing information (label, size, average intra-cluster dissimilarity and the average silhouette) for the clusters as displayed in the plot (from top/left to bottom/right).

This object can be used for plotting via plot(x, options = NULL, ...), where x is the object and options contains a list with plotting options (see above).
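A short sketch of inspecting the returned object and replotting it. It assumes the seriation package is installed and that the installed version matches the interface documented on this page; the labels come from base R's hclust() and cutree() purely for illustration.

```r
library("seriation")

d <- dist(iris[-5])

# Illustrative labels: complete-linkage clustering cut into 3 groups (1..3).
labels <- unname(cutree(hclust(d), k = 3))

# dissplot() returns its result invisibly, so assign it to keep it.
res <- dissplot(d, labels)

res$order          # order used to plot the objects
res$cluster_order  # order of the clusters as plotted
res$k              # number of clusters
res$description    # per-cluster summary as a data.frame

# Replot from the stored object without rerunning the seriation.
plot(res, options = list(main = "Replotted from stored object"))
```

Keeping the result avoids repeating the seriation step when only the plotting options change.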

References

Hahsler, M. and Hornik, K. (2011): Dissimilarity plots: A visual exploration tool for partitional clustering. Journal of Computational and Graphical Statistics, 20(2), 335--354.

Ling, R.F. (1973): A computer generated aid for cluster analysis. Communications of the ACM, 16(6), 355--361.

Rousseeuw, P.J. (1987): Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20(1), 53--65.

Strehl, A. and Ghosh, J. (2003): Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15(2), 208--230.

Examples


library("seriation")
data("iris")
d <- dist(iris[-5])

## plot original matrix
res <- dissplot(d, method = NA)

## plot reordered matrix using the nearest insertion algorithm (from tsp)
res <- dissplot(d, method = "TSP",
    options = list(main = "Seriation (TSP)"))

## cluster with pam (we know iris has 3 clusters)
library("cluster")
l <- pam(d, 3, cluster.only = TRUE)

## we use a grid layout to place several plots on a page
library("grid")
grid.newpage()
pushViewport(viewport(layout=grid.layout(nrow = 2, ncol = 2), 
    gp = gpar(fontsize = 8)))
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))

## visualize the clustering (using ARSA between clusters and MDS within)
res <- dissplot(d, l, method = list(inter = "ARSA", intra = "MDS"),  
    options = list(main = "PAM + Seriation - standard", 
    newpage = FALSE))

popViewport()
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))

## more visualization options. Note that we reuse the reordered object res!
## color: use 10 shades red-blue
plot(res, options = list(main = "PAM + Seriation", 
    col= bluered(10, bias=.5), newpage = FALSE))

popViewport()
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 1))

## threshold (using zlim) and cubic scale to highlight differences
plot(res, options = list(main = "PAM + Seriation - threshold", 
    zlim = c(0, 1.5), col = greys(100, power = 2), newpage = FALSE))

popViewport()
pushViewport(viewport(layout.pos.row = 2, layout.pos.col = 2))

## use custom (logistic) scale
plot(res, options = list(main = "PAM + Seriation - logistic scale",
    col = hcl(c = 0, l = (plogis(seq(10, 0, length = 100),
        location = 2, scale = 1/2, log = FALSE)) * 100),
    newpage = FALSE))

popViewport(2)

## the reordered_cluster_dissimilarity_matrix object
res 
names(res)
