cim: Clustered Image Maps (CIMs) ("heat maps")

Description

This function generates color-coded Clustered Image Maps (CIMs) ("heat maps") to represent "high-dimensional" data sets.

Usage

## S3 method for class 'default':
cim(mat, breaks, col = jet.colors, 
    distfun = dist, hclustfun = hclust,
    dendrogram = c("both", "row", "column", "none"),
    labRow = NULL, labCol = NULL,
    ColSideColors = NULL, RowSideColors = NULL,		 
    symkey = TRUE, keysize = 1, zoom = FALSE, 
    main = NULL, xlab = NULL, ylab = NULL, 
    cexRow = min(1, 0.2 + 1/log10(nr)), 
    cexCol = min(1, 0.2 + 1/log10(nc)), 
    margins = c(5, 5), lhei = NULL, lwid = NULL, ...)
			
## S3 method for class 'rcc':
cim(object, comp = 1, X.names = NULL, Y.names = NULL, ...)

## S3 method for class 'spls':
cim(object, comp = 1, X.names = NULL, Y.names = NULL, 
    keep.var = TRUE, ...)

## S3 method for class 'pls':
cim(object, comp = 1, X.names = NULL, Y.names = NULL, ...)

Arguments

mat

numeric matrix of values to be plotted.

object

object of class inheriting from "rcc", "pls" or "spls".

comp

a single positive integer or a vector of positive integers. The components used to represent the association between the data sets. Defaults to comp = 1.

X.names, Y.names

character vectors containing the names of the $X$- and $Y$-variables.

keep.var

boolean. If TRUE, only the variables with non-zero loadings (as selected by spls) are plotted. Defaults to TRUE.

distfun

function used to compute the distance (dissimilarity) between both rows and columns. Defaults to dist.
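
As an illustration of what a custom distfun could look like, the sketch below defines a correlation-based dissimilarity in base R. The name cor.dist and the toy matrix are hypothetical and not part of the package. The only assumption is that distfun receives a numeric matrix and returns an object of class "dist", as dist does.

```r
# Hypothetical correlation-based dissimilarity between the rows of a matrix.
# Like dist(), it takes a numeric matrix and returns an object of class "dist".
cor.dist <- function(x) as.dist(1 - cor(t(x)))

set.seed(1)
mat <- matrix(rnorm(40), nrow = 8, ncol = 5)  # toy data, 8 rows x 5 columns

d.rows <- cor.dist(mat)      # dissimilarities between rows
d.cols <- cor.dist(t(mat))   # dissimilarities between columns

# The result can be fed to hclust() exactly like the output of dist()
hc <- hclust(d.rows)
```

Such a function would then be supplied through the distfun argument shown in the Usage section, for example distfun = cor.dist.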

breaks

(optional) either a numeric vector indicating the splitting points for binning mat into colors, or an integer number of break points to be used, in which case the break points will be spaced equally between min(mat) an

col

a character string specifying the color function to use: terrain.colors, topo.colors, rainbow

hclustfun

function used to compute the hierarchical clustering for both rows and columns. Defaults to hclust. Should take as argument a result of distfun and return an object to which

dendrogram

character string indicating whether to draw "none", "row", "column" or "both" dendrograms. Defaults to "both".

labRow

character vector of row labels to use. Defaults to rownames(mat).

labCol

character vector of column labels to use. Defaults to colnames(mat).

ColSideColors

(optional) character vector of length ncol(mat) containing the color names for a horizontal side bar that may be used to annotate the columns of mat.

RowSideColors

(optional) character vector of length nrow(mat) containing the color names for a vertical side bar that may be used to annotate the rows of mat.

symkey

boolean indicating whether the color key should be made symmetric about 0. Defaults to TRUE.

keysize

positive numeric value indicating the size of the color key.

zoom

logical. Whether to enable interactive zooming. See Details.

main, xlab, ylab

main, $x$- and $y$-axis titles; defaults to none.

cexRow, cexCol

positive numbers, used as cex.axis for the row or column axis labeling. The defaults currently depend only on the number of rows or columns, respectively.
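
To make the default concrete, the sketch below evaluates the expression from the Usage section, min(1, 0.2 + 1/log10(n)), for a few matrix sizes. The helper name default.cex is hypothetical and only serves this illustration.

```r
# Default label size as given in the Usage section: min(1, 0.2 + 1/log10(n)),
# where n is the number of rows (cexRow) or columns (cexCol).
# default.cex is a hypothetical helper, not a function of the package.
default.cex <- function(n) min(1, 0.2 + 1 / log10(n))

cex.10   <- default.cex(10)    # 1 (capped at 1)
cex.100  <- default.cex(100)   # 0.7
cex.1000 <- default.cex(1000)  # about 0.53
```

Labels therefore shrink as the matrix grows, and a fixed value can be passed to cexRow or cexCol when the default becomes too small to read.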

margins

numeric vector of length two containing the margins (see par(mar)) for column and row names respectively.

lhei, lwid

arguments passed to layout to divide the device up into two rows and two columns, with the row-heights lhei and the column-widths lwid.

...

arguments passed to cim.default.

Value

A list containing the following components:

simMat

the similarity matrix used by cim.

rowInd

row index permutation vector as returned by order.dendrogram.

colInd

column index permutation vector as returned by order.dendrogram.

ddr, ddc

objects of class "dendrogram" which describe the row and column trees produced by cim.

labRow, labCol

character vectors with the row and column labels used.

Details

The default method produces a single-matrix Clustered Image Map: a two-dimensional visualization of a real-valued matrix (basically image(t(mat))) with a dendrogram added to the left side and to the top. The rows and columns are reordered according to a hierarchical clustering method to reveal patterns. By default, rows and columns are clustered with the complete linkage method and the Euclidean distance.

In the rcc method, the matrix mat is built so that element $(j,k)$ is the scalar product between the two vectors of dimension length(comp) that represent the variables $X_j$ and $Y_k$ on the axes defined by $Z_i$ with $i$ in comp, where $Z_i$ is the equiangular vector between the $i$-th $X$ and $Y$ canonical variates.

In the spls method, if object$mode is regression, element $(j,k)$ of the similarity matrix mat is the scalar product between the two vectors of dimension length(comp) that represent the variables $X_j$ and $Y_k$ on the axes defined by $U_i$ with $i$ in comp, where $U_i$ is the $i$-th $X$ variate. If object$mode is canonical, then $X_j$ and $Y_k$ are represented on the axes defined by $U_i$ and $V_i$ respectively.

For visualization of "high-dimensional" data sets, an interactive zooming tool is provided. Setting zoom = TRUE opens separate devices, one for the CIM and one for the zoomed region, and starts an interactive 'zoom' process: click two points in the image map region with the first mouse button. A rectangle is then drawn around the selected region, and that region is displayed enlarged in the zoom device. The process can be repeated to zoom in on other regions of interest. The zoom process is terminated by clicking the second button and selecting 'Stop' from the menu, or from the 'Stop' menu on the graphics window.
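
The reordering performed by the default method, as described at the start of this section, can be sketched with base R alone. This is a minimal illustration on hypothetical toy data, not the package's implementation: it omits the dendrogram panels, the color key and the side bars, and the variable names are chosen only to echo the components listed under Value.

```r
set.seed(1)
# Toy matrix: 10 rows x 6 columns (hypothetical data)
mat <- matrix(rnorm(60), nrow = 10, ncol = 6,
              dimnames = list(paste0("r", 1:10), paste0("c", 1:6)))

# Cluster rows and columns with Euclidean distance and complete linkage,
# which are the defaults of dist() and hclust()
ddr <- as.dendrogram(hclust(dist(mat)))
ddc <- as.dendrogram(hclust(dist(t(mat))))

# Permutations that place similar rows / columns next to each other
rowInd <- order.dendrogram(ddr)
colInd <- order.dendrogram(ddc)

# Reordered matrix, drawn as a plain image without dendrograms or color key
mat.ord <- mat[rowInd, colInd]
image(t(mat.ord), axes = FALSE)
```

Plotting ddr and ddc next to the image is what turns this reordered heat map into a clustered image map.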

References

Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the USA 95, 14863-14868.

Weinstein, J. N., Myers, T. G., O'Connor, P. M., Friend, S. H., Fornace Jr., A. J., Kohn, K. W., Fojo, T., Bates, S. E., Rubinstein, L. V., Anderson, N. L., Buolamwini, J. K., van Osdol, W. W., Monks, A. P., Scudiero, D. A., Sausville, E. A., Zaharevitz, D. W., Bunow, B., Viswanadhan, V. N., Johnson, G. S., Wittes, R. E. and Paull, K. D. (1997). An information-intensive approach to the molecular pharmacology of cancer. Science 275, 343-349.

Examples

## default method
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene

cim(cor(X, Y), dendrogram = "none")

## CIM representation for objects of class 'rcc'
nutri.res <- rcc(X, Y, ncomp = 3, lambda1 = 0.064, lambda2 = 0.008)

dends <- cim(nutri.res, comp = 1:3, xlab = "genes", 
             ylab = "lipids", margins = c(5, 6))

op <- par(mar = c(5, 4, 4, 4), cex = 0.8)			 
plot(dends$ddr, axes = FALSE, horiz = TRUE)
par(op)

## interactive 'zoom' 
cim(nutri.res, comp = 1:3, zoom = TRUE)
## select a region with the mouse to display it enlarged in the zoom device

## CIM representation for objects of class 'spls'
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic

toxicity.spls <- spls(X, Y, ncomp = 3, keepX = c(50, 50, 50), 
                      keepY = c(10, 10, 10))

cim(toxicity.spls, comp = 1:3)
