Compute silhouette information according to a given clustering in k clusters.
silhouette(x, ...)
# S3 method for default
silhouette(x, dist, dmatrix, ...)
# S3 method for partition
silhouette(x, ...)
# S3 method for clara
silhouette(x, full = FALSE, subset = NULL, ...)
sortSilhouette(object, ...)
# S3 method for silhouette
summary(object, FUN = mean, ...)
# S3 method for silhouette
plot(x, nmax.lab = 40, max.strlen = 5,
     main = NULL, sub = NULL, xlab = expression("Silhouette width "* s[i]),
     col = "gray", do.col.sort = length(col) > 1, border = 0,
     cex.names = par("cex.axis"), do.n.k = TRUE, do.clus.stat = TRUE, ...)
silhouette() returns an object, sil, of class silhouette which is an n x 3 matrix with attributes. For each observation i, sil[i,] contains the cluster to which i belongs as well as the neighbor cluster of i (the cluster, not containing i, for which the average dissimilarity between its observations and i is minimal), and the silhouette width s(i) of the observation. The colnames correspondingly are c("cluster", "neighbor", "sil_width").
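As an illustration of the returned structure, here is a minimal sketch. It assumes the cluster package is installed and reuses the ruspini data from the examples on this page. The default method is used so that the rows follow the order of the data.

```r
library(cluster)

data(ruspini)
cl  <- pam(ruspini, 4)$clustering      # integer cluster codes, one per observation
sil <- silhouette(cl, dist(ruspini))   # default method: rows are in data order

class(sil)                  # class of the returned object
dim(sil)                    # one row per observation, three columns
colnames(sil)               # "cluster", "neighbor", "sil_width"
sil[1:5, ]                  # cluster, neighbor cluster and s(i) for observations 1..5
```

The "cluster" column repeats the codes that were passed in, so it can be checked directly against cl.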
summary(sil) returns an object of class summary.silhouette, a list with components
si.summary: numerical summary of the individual silhouette widths s(i).
clus.avg.widths: numeric (rank 1) array of clusterwise means of silhouette widths where mean = FUN is used.
avg.width: the total mean FUN(s) where s are the individual silhouette widths.
clus.sizes: table of the k cluster sizes.
call: if available, the call creating sil.
Ordered: logical identical to attr(sil, "Ordered"), see below.
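The components listed above can be read off the summary object directly. The following is a small sketch, again assuming the cluster package and the ruspini data; the choice of median as an alternative FUN is only an illustration.

```r
library(cluster)

data(ruspini)
sil <- silhouette(pam(ruspini, 4))
ssi <- summary(sil)               # FUN = mean by default

ssi$clus.sizes                    # table of cluster sizes
ssi$clus.avg.widths               # clusterwise mean silhouette widths
ssi$avg.width                     # overall mean silhouette width

# Using another summary function, here the median:
ssi_med <- summary(sil, FUN = median)
ssi_med$avg.width                 # overall median silhouette width
```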
sortSilhouette(sil) orders the rows of sil as in the silhouette plot, by cluster (increasingly) and decreasing silhouette width s(i).
attr(sil, "Ordered") is a logical indicating if sil is ordered as by sortSilhouette(). In that case, rownames(sil) will contain case labels or numbers, and attr(sil, "iOrd") the ordering index vector.
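A short sketch of the ordering attributes, assuming the cluster package and the ruspini data. It starts from the default method, whose result is in data order, and then sorts it.

```r
library(cluster)

data(ruspini)
cl   <- pam(ruspini, 4)$clustering
sil  <- silhouette(cl, dist(ruspini))   # rows in data order
ssil <- sortSilhouette(sil)             # rows in silhouette-plot order

attr(sil,  "Ordered")                   # FALSE: not yet sorted
attr(ssil, "Ordered")                   # TRUE
iOrd <- attr(ssil, "iOrd")              # index vector mapping sorted rows to original rows
head(rownames(ssil))                    # case labels of the first sorted rows
```

Indexing the unsorted object with iOrd reproduces the sorted silhouette widths.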
x: an object of appropriate class; for the default method an integer vector with k different integer cluster codes or a list with such an x$clustering component. Note that silhouette statistics are only defined if 2 <= k <= n-1.
dist: a dissimilarity object inheriting from class dist or coercible to one. If not specified, dmatrix must be.
dmatrix: a symmetric dissimilarity matrix (n x n), specified instead of dist, which can be more efficient.
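full: logical or number in [0,1] specifying if a full silhouette should be computed for a clara object. When a number, say f, the silhouette values are computed for a random sample.int(n, size = f*n) of the data. This requires O((f*n)^2) memory, since the full dissimilarity (see daisy) is needed internally.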
subset: a subset from 1:n, specified instead of full to specify the indices of the observations to be used for the silhouette computations.
object: an object of class silhouette.
...: further arguments passed to and from methods.
FUN: function used to summarize silhouette widths.
nmax.lab: integer indicating the number of labels which is considered too large for single-name labeling of the silhouette plot.
max.strlen: positive integer giving the length to which strings are truncated in silhouette plot labeling.
main, sub, xlab: arguments to title; have a sensible non-NULL default here.
col, border, cex.names: arguments passed to barplot(); note that the default used to be col = heat.colors(n), border = par("fg") instead. col can also be a color vector of length k for clusterwise coloring, see also do.col.sort.
do.col.sort: logical indicating if the colors col should be sorted “along” the silhouette; this is useful for casewise or clusterwise coloring.
do.n.k: logical indicating if n and k “title text” should be written.
do.clus.stat: logical indicating if cluster size and averages should be written to the right of the silhouettes.
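To illustrate the dist and dmatrix arguments of the default method, the sketch below passes the same dissimilarities once as a dist object and once as a full symmetric matrix. It assumes the cluster package and the ruspini data; the variable names are illustrative only.

```r
library(cluster)

data(ruspini)
cl <- pam(ruspini, 4)$clustering
d  <- dist(ruspini)                      # object of class "dist"
dm <- as.matrix(d)                       # symmetric n x n matrix

s_dist <- silhouette(cl, dist = d)       # dissimilarities given as dist
s_dmat <- silhouette(cl, dmatrix = dm)   # dissimilarities given as matrix

# Same silhouette values either way; attributes such as the call differ
same <- all.equal(s_dist, s_dmat, check.attributes = FALSE)
same
```

If the n x n matrix already exists, passing it as dmatrix avoids converting it back to a dist object.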
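For each observation i, the silhouette width s(i) is defined as follows:
Put a(i) = average dissimilarity between i and all other points of the cluster to which i belongs (if i is the only observation in its cluster, s(i) := 0 without further calculations). For all other clusters C, put d(i,C) = average dissimilarity of i to all observations of C. The smallest of these d(i,C) is b(i) := min_C d(i,C), and can be seen as the dissimilarity between i and its “neighbor” cluster, i.e., the nearest one to which it does not belong. Finally,
s(i) := ( b(i) - a(i) ) / max( a(i), b(i) ).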
silhouette.default() is now based on C code donated by Romain Francois (the R version being still available as cluster:::silhouetteR).
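Observations with a large s(i) (almost 1) are very well clustered, a small s(i) (around 0) means that the observation lies between two clusters, and observations with a negative s(i) are probably placed in the wrong cluster.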
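The definition of the silhouette width (Rousseeuw, 1987) can be traced by hand. The sketch below is an illustrative base R re-implementation, not the package's C code: sil_width_one is a hypothetical helper that computes a(i), b(i) and s(i) for one observation, and the result is compared with silhouette(). It assumes the cluster package and the ruspini data.

```r
library(cluster)

data(ruspini)
cl <- pam(ruspini, 4)$clustering
dm <- as.matrix(dist(ruspini))

# Hypothetical helper: silhouette width of observation i, from the definition
sil_width_one <- function(i, cl, dm) {
  own  <- cl[i]
  same <- setdiff(which(cl == own), i)     # other members of i's own cluster
  if (length(same) == 0) return(0)         # singleton cluster: s(i) := 0
  a <- mean(dm[i, same])                   # a(i): average dissimilarity within own cluster
  others <- setdiff(unique(cl), own)
  # b(i): smallest average dissimilarity d(i, C) to any other cluster C
  b <- min(vapply(others, function(C) mean(dm[i, cl == C]), numeric(1)))
  (b - a) / max(a, b)
}

s_manual <- vapply(seq_along(cl), sil_width_one, numeric(1), cl = cl, dm = dm)
s_pkg    <- silhouette(cl, dmatrix = dm)[, "sil_width"]

all.equal(s_manual, unname(s_pkg))         # compare with the package result
```

The per-observation loop is far slower than the package implementation and is meant only to make the roles of a(i) and b(i) concrete.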
Rousseeuw, P.J. (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53-65.
Chapter 2 of Kaufman and Rousseeuw (1990); see the references in plot.agnes.
See also partition.object, plot.partition.
library(cluster)  # provides pam(), clara(), agnes(), daisy(), silhouette() and the data sets
data(ruspini)
pr4 <- pam(ruspini, 4)
str(si <- silhouette(pr4))
(ssi <- summary(si))
plot(si) # silhouette plot
plot(si, col = c("red", "green", "blue", "purple")) # with cluster-wise coloring
si2 <- silhouette(pr4$clustering, dist(ruspini, "canberra"))
summary(si2) # has small values: "canberra"'s fault
plot(si2, nmax= 80, cex.names=0.6)
op <- par(mfrow= c(3,2), oma= c(0,0, 3, 0),
          mgp= c(1.6,.8,0), mar= .1+c(4,2,2,2))
for(k in 2:6)
    plot(silhouette(pam(ruspini, k=k)), main = paste("k = ",k), do.n.k=FALSE)
mtext("PAM(Ruspini) as in Kaufman & Rousseeuw, p.101",
      outer = TRUE, font = par("font.main"), cex = par("cex.main")); frame()
## the same with cluster-wise colours:
c6 <- c("tomato", "forest green", "dark blue", "purple2", "goldenrod4", "gray20")
for(k in 2:6)
    plot(silhouette(pam(ruspini, k=k)), main = paste("k = ",k), do.n.k=FALSE,
         col = c6[1:k])
par(op)
## clara(): standard silhouette is just for the best random subset
data(xclara)
set.seed(7)
str(xc1k <- xclara[ sample(nrow(xclara), size = 1000) ,]) # rownames == indices
cl3 <- clara(xc1k, 3)
plot(silhouette(cl3)) # only of the "best" subset of 46
## The full silhouette: internally needs the full dist object
## (roughly 4 MB for these 1000 rows; the 36 MB figure applies to all 3000 rows of xclara):
sf <- silhouette(cl3, full = TRUE) ## this is the same as
s.full <- silhouette(cl3$clustering, daisy(xc1k))
stopifnot(all.equal(sf, s.full, check.attributes = FALSE, tolerance = 0))
## color dependent on original "3 groups of each 1000":
plot(sf, col = 2+ as.integer(names(cl3$clustering) ) %/% 1000,
     main ="plot(silhouette(clara(.), full = TRUE))")
## Silhouette for a hierarchical clustering:
ar <- agnes(ruspini)
si3 <- silhouette(cutree(ar, k = 5), # k = 4 gave the same as pam() above
                  daisy(ruspini))
stopifnot(is.data.frame(di3 <- as.data.frame(si3)))
plot(si3, nmax = 80, cex.names = 0.5)
## 2 groups: Agnes() wasn't too good:
si4 <- silhouette(cutree(ar, k = 2), daisy(ruspini))
plot(si4, nmax = 80, cex.names = 0.5)