Learn R Programming

apcluster (version 1.4.11)

apcluster: Affinity Propagation

Description

Runs affinity propagation clustering

Usage

# S4 method for matrix,missing
apcluster(s, x, p=NA, q=NA, maxits=1000,
    convits=100, lam=0.9, includeSim=FALSE, details=FALSE,
    nonoise=FALSE, seed=NA)
# S4 method for dgTMatrix,missing
apcluster(s, x, p=NA, q=NA, maxits=1000,
    convits=100, lam=0.9, includeSim=FALSE, details=FALSE,
    nonoise=FALSE, seed=NA)
# S4 method for sparseMatrix,missing
apcluster(s, x, ...)
# S4 method for Matrix,missing
apcluster(s, x, ...)
# S4 method for character,ANY
apcluster(s, x, p=NA, q=NA, maxits=1000,
    convits=100, lam=0.9, includeSim=TRUE, details=FALSE,
    nonoise=FALSE, seed=NA, ...)
# S4 method for function,ANY
apcluster(s, x, p=NA, q=NA, maxits=1000,
    convits=100, lam=0.9, includeSim=TRUE, details=FALSE,
    nonoise=FALSE, seed=NA, ...)

Value

Upon successful completion, the function returns an

APResult object.

Arguments

s

an \(l\times l\) similarity matrix or a similarity function either specified as the name of a package-provided similarity function as character string or a user provided function object. s may also be a sparse matrix according to the Matrix package. Internally, apcluster uses the dgTMatrix class; all other sparse matrices are cast to this class (if possible, otherwise the function quits with an error). If s is any other object of class Matrix, s is cast to a regular matrix internally (if possible, otherwise the function quits with an error).

x

input data to be clustered; if x is a matrix or data frame, rows are interpreted as samples and columns are interpreted as features; apart from matrices or data frames, x may be any other structured data type that contains multiple data items - provided that an appropriate length function is available that returns the number of items

p

input preference; can be a vector that specifies individual preferences for each data point. If scalar, the same value is used for all data points. If NA, exemplar preferences are initialized according to the distribution of non-Inf values in s. How this is done is controlled by the parameter q.

q

if p=NA, exemplar preferences are initialized according to the distribution of non-Inf values in s. If q=NA, exemplar preferences are set to the median of non-Inf values in s. If q is a value between 0 and 1, the sample quantile with threshold q is used, whereas q=0.5 again results in the median.

maxits

maximal number of iterations that should be executed

convits

the algorithm terminates if the examplars have not changed for convits iterations

lam

damping factor; should be a value in the range [0.5, 1); higher values correspond to heavy damping which may be needed if oscillations occur

includeSim

if TRUE, the similarity matrix (either computed internally or passed via the s argument) is stored to the slot sim of the returned APResult object. The default is FALSE if apcluster has been called for a similarity matrix, otherwise the default is TRUE.

details

if TRUE, more detailed information about the algorithm's progress is stored in the output object (see APResult)

nonoise

apcluster adds a small amount of noise to s to prevent degenerate cases; if TRUE, this is disabled

seed

for reproducibility, the seed of the random number generator can be set to a fixed value before adding noise (see above), if NA, the seed remains unchanged

...

for the methods with signatures character,ANY and function,ANY, all other arguments are passed to the selected similarity function as they are; for the methods with signatures Matrix,missing and sparseMatrix,missing, further arguments are passed on to the apcluster methods with signatures Matrix,missing and dgTMatrix,missing, respectively.

Author

Ulrich Bodenhofer, Andreas Kothmeier, Johannes Palme & Chrats Melkonian apcluster@bioinf.jku.at

Details

Affinity Propagation clusters data using a set of real-valued pairwise data point similarities as input. Each cluster is represented by a cluster center data point (the so-called exemplar). The method is iterative and searches for clusters maximizing an objective function called net similarity.

When called with a similarity matrix as input (which may also be a sparse matrix according to the Matrix package), the function performs AP clustering. When called with the name of a package-provided similarity function or a user-provided similarity function object and input data, the function first computes the similarity matrix before performing AP clustering. The similarity matrix is returned for later use as part of the APResult object depending on whether includeSim was set to TRUE (see argument description above).

Apart from minor adaptations and optimizations, the AP clustering functionality of the function apcluster is largely analogous to Frey's and Dueck's Matlab code (see https://psi.toronto.edu/research/affinity-propagation-clustering-by-message-passing/).

The new argument q allows for better controlling the number of clusters without knowing the distribution of similarity values. A meaningful range for the parameter p can be determined using the function preferenceRange. Alternatively, a certain fixed number of clusters may be desirable. For this purpose, the function apclusterK is available.

References

http://www.bioinf.jku.at/software/apcluster/

Frey, B. J. and Dueck, D. (2007) Clustering by passing messages between data points. Science 315, 972-976. DOI: tools:::Rd_expr_doi("10.1126/science.1136800").

Bodenhofer, U., Kothmeier, A., and Hochreiter, S. (2011) APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463-2464. DOI: tools:::Rd_expr_doi("10.1093/bioinformatics/btr406").

See Also

APResult, show-methods, plot-methods, labels-methods, preferenceRange, apclusterL-methods, apclusterK

Examples

Run this code
## create two Gaussian clouds
cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- rbind(cl1, cl2)

## compute similarity matrix and run affinity propagation 
## (p defaults to median of similarity)
apres <- apcluster(negDistMat(r=2), x, details=TRUE)

## show details of clustering results
show(apres)

## plot clustering result
plot(apres, x)

## plot heatmap
heatmap(apres)

## run affinity propagation with default preference of 10% quantile
## of similarities; this should lead to a smaller number of clusters
## reuse similarity matrix from previous run
apres <- apcluster(s=apres@sim, q=0.1)
show(apres)
plot(apres, x)

## now try the same with RBF kernel
sim <- expSimMat(x, r=2)
apres <- apcluster(s=sim, q=0.2)
show(apres)
plot(apres, x)

## create sparse similarity matrix
cl1 <- cbind(rnorm(20, 0.2, 0.05), rnorm(20, 0.8, 0.06))
cl2 <- cbind(rnorm(20, 0.7, 0.08), rnorm(20, 0.3, 0.05))
x <- rbind(cl1, cl2)

sim <- negDistMat(x, r=2)
ssim <- as.SparseSimilarityMatrix(sim, lower=-0.2)

## run apcluster() on the sparse similarity matrix
apres <- apcluster(ssim, q=0)
apres

Run the code above in your browser using DataLab