plot: Plot Clustering Results

Description

Functions for Visualizing Clustering Results

Usage

# S4 method for APResult,missing
plot(x, y, type=c("netsim", "dpsim", "expref"),
    xlab="# Iterations", ylab="Similarity", ...)
# S4 method for ExClust,matrix
plot(x, y, connect=TRUE, xlab="", ylab="",
labels=NA, limitNo=15, ...)
# S4 method for ExClust,data.frame
plot(x, y, connect=TRUE, xlab="",
ylab="", labels=NA, limitNo=15, ...)
# S4 method for AggExResult,missing
plot(x, y, main="Cluster dendrogram",
    xlab="", ylab="", ticks=4, digits=2, base=0.05, showSamples=FALSE,
    horiz=FALSE, ...)
# S4 method for AggExResult,matrix
plot(x, y, k=NA, h=NA, ...)
# S4 method for AggExResult,data.frame
plot(x, y, k=NA, h=NA, ...)

Value

see details above

Arguments

x: a clustering result object of class APResult, ExClust, or AggExResult
y: a matrix or data frame (see details below)
type: a string or array of strings indicating which performance measures should be plotted; valid values are "netsim", "dpsim", and "expref" which can be used in any combination or order; all other strings are ignored (for the meaning see APResult)
xlab, ylab: labels for axes of 2D plots; ignored if y has more than two columns
labels: names used for variables in scatter plot matrix (displayed if y has more than two columns). If NA (default), column names are used. If no column names are available, labels such as x[, 2] are displayed.
limitNo: if the number of columns/features in y is too large, problems may occur when attempting to plot a scatter plot matrix. To avoid problems, the plot method throws an error if the number of columns exceeds limitNo. For special applications, users can increase the value (15 by default). If limitNo is set to NA or any other non-numeric value, the limit is ignored entirely. Please note that attempting to plot scatter plot matrices with too many features may corrupt the graphics device. So users are making changes at their own risk. If plotting of many features is necessary, make sure that the graphics device is large enough to accommodate the plot (e.g. by using a sufficiently large graphics file device).
connect: used only if clustering is plotted on original data, ignored otherwise. If connect is TRUE, lines are drawn connecting exemplars with their cluster members.
main: title of plot
ticks: number of ticks used for the axis on the left side of the plot (applies to dendrogram plots only, see below)
digits: number of digits used for the axis tickmarks on the left side of the plot (applies to dendrogram plots only, see below)
base: fraction of height used for the very first join; defaults to 0.05, i.e. the first join appears at 5% of the total height of the dendrogram.
showSamples: if TRUE, a complete cluster hierarchy is shown, otherwise, in case that x is a hierarchy of clusters, the dendrogram of clusters is shown. For backward compatibility, the default is FALSE.
horiz: if TRUE, the dendrogram is plotted horizontally (analogous to plot.dendrogram). The default is FALSE.
k: level to be selected when plotting a single clustering level of cluster hierarchy (i.e. the number of clusters; see cutree-methods)
h: cut-off to be used when plotting a single clustering level of cluster hierarchy (see cutree-methods)
...: all other arguments are passed to the plotting command that are used internally, plot or heatmap.

Author

Ulrich Bodenhofer, Andreas Kothmeier & Johannes Palme apcluster@bioinf.jku.at

Details

If plot is called for an APResult object without specifying the second argument y, a plot is created that displays graphs of performance measures over execution time of the affinity propagation run. This only works if apcluster was called with details=TRUE.

If plot is called for an APResult object along with a matrix or data frame as argument y, then the dimensions of the matrix determine the behavior of plot:

If the matrix y has two columns, y is interpreted as the original data set. Then a plot of the clustering result superimposed on the original data set is created. Each cluster is displayed in a different color. The exemplar of each cluster is highlighted by a black square. If connect is TRUE, lines connecting the cluster members to their exemplars are drawn. This variant of plot does not return any value.
If y has more than two columns, clustering results are superimposed in a sort of scatter plot matrix. The variant that y is interpreted as similarity matrix if it is quadratic has been removed in version 1.3.2. Use heatmap instead.
If y has only one column, an error is displayed.

If plot is called for an ExClust object along with a matrix or data frame as argument y, then plot behaves exactly the same as described in the previous paragraph.

If plot is called for an AggExResult object without specifying the second argument y, then a dendrogram plot is drawn. This variant returns an invisible dendrogram object. The showSamples argument determines whether a complete dendrogram or a dendrogram of clusters is plotted (see above). If the option horiz=TRUE is used, the dendrogram is rotated. Note that, in this case, the margin to the right of the plot may not be wide enough to accommodate long cluster/sample labels. In such a case, the figure margins have to be widened before plot is called.

If plot is called for an AggExResult object along with a matrix or data frame y, y is again interpreted as original data set. If one of the two arguments k or h is present, a clustering is cut out from the cluster hierarchy using cutree and this clustering is displayed with the original data set as described above. This variant of plot returns an invisible ExClust object containing the extracted clustering.

References

http://www.bioinf.jku.at/software/apcluster/

Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: tools:::Rd_expr_doi("10.1093/bioinformatics/btr406").

Examples

Run this code

## create two Gaussian clouds
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- rbind(cl1, cl2)

## run affinity propagation
apres <- apcluster(negDistMat(r=2), x, q=0.7, details=TRUE)

## plot information about clustering run
plot(apres)

## plot clustering result
plot(apres, x)

## perform agglomerative clustering of affinity propagation clusters
aggres1 <- aggExCluster(x=apres)

## show dendrograms
plot(aggres1)
plot(aggres1, showSamples=TRUE)

## show clustering result for 4 clusters
plot(aggres1, x, k=4)

## perform agglomerative clustering of whole data set
aggres2 <- aggExCluster(negDistMat(r=2), x)

## show dendrogram
plot(aggres2)

## show heatmap along with dendrogram
heatmap(aggres2)

## show clustering result for 2 clusters
plot(aggres2, x, k=2)

## cluster iris data set
data(iris)
apIris <- apcluster(negDistMat(r=2), iris, q=0)
plot(apIris, iris)

Run the code above in your browser using DataLab