find_k: Find the (estimated) number of clusters for a dendrogram using average silhouette width

Description

This function estimates the number of clusters as the value of k with the maximal average silhouette width, obtained by running pam on the cophenetic distance matrix of the dendrogram. The result is returned in the form of a pamk output.
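The selection rule described above can be sketched by hand. This is a minimal illustration using only stats and the cluster package, not the actual implementation (find_k delegates to pamk); the names hc, d, asw and best_k are made up for this example.

```r
library(cluster)  # pam()

# Build a tree and take the cophenetic distances between its leaves
hc <- hclust(dist(iris[, -5]))
d  <- cophenetic(hc)  # a "dist" object, so pam() treats it as dissimilarities

# Run pam for each candidate k and record the average silhouette width
krange <- 2:10
asw <- sapply(krange, function(k) pam(d, k = k)$silinfo$avg.width)

# The estimated number of clusters is the k with the largest average width
best_k <- krange[which.max(asw)]
best_k
```

Because the distances are cophenetic rather than the original ones, the estimate reflects the structure of the tree (and therefore the linkage method), not only the raw data.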

Usage

find_k(dend, krange = 2:min(10, (nleaves(dend) - 1)), ...)

Arguments

dend

A dendrogram (or hclust) tree object

krange

integer vector. Numbers of clusters which are to be compared by the average silhouette width criterion. Note: average silhouette width and Calinski-Harabasz can't estimate number of clusters nc=1. If 1 is included, a Duda-Hart test is applied and 1 is estimated if this is not significant.

...

passed to pamk (the current defaults criterion="asw" and usepam=TRUE cannot be changed).

Value

A pamk output. This is a list with the following components:

1) pamobject - the output of the optimal run of the pam function.
2) nc - the optimal number of clusters.
3) crit - vector of criterion values for numbers of clusters. crit[1] is the p-value of the Duda-Hart test if 1 is in krange and diss=FALSE.
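As a rough illustration of reading these components, the sketch below inspects each one. It assumes the dendextend and fpc packages are installed, and it assumes that crit is indexed by the number of clusters (so crit[k] belongs to k clusters); the name res is made up for this example.

```r
library(dendextend)  # find_k(); relies on the fpc package for pamk

dend <- as.dendrogram(hclust(dist(USArrests), method = "average"))
res  <- find_k(dend, krange = 2:6)

res$nc                           # the selected number of clusters
res$crit                         # criterion values (assumed indexed by k)
table(res$pamobject$clustering)  # cluster sizes from the selected pam run
```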

Examples

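library(dendextend)  # provides find_k(), color_branches() and %>%

# Estimate k for a complete-linkage tree of the iris measurements
dend <- iris[,-5] %>% dist %>% hclust %>% as.dendrogram
dend_k <- find_k(dend)
plot(dend_k)
plot(color_branches(dend, k = dend_k$nc))

# Silhouette plot for the selected pam clustering
library(cluster)
sil <- silhouette(dend_k$pamobject)
plot(sil)

# Same workflow with average linkage on USArrests
dend <- USArrests %>% dist %>% hclust(method = "average") %>% as.dendrogram
dend_k <- find_k(dend)
plot(dend_k)
plot(color_branches(dend, k = dend_k$nc))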
