kgs: KGS Measure for Pruning Hierarchical Clusters
Description
Computes the Kelley-Gardner-Sutcliffe penalty function for a
hierarchical cluster tree.
Usage
kgs(cluster, diss, alpha = 1, maxclust = NULL)
Arguments
cluster
object of class hclust or twins.
diss
object of class dissimilarity or dist.
alpha
weight applied to the number of clusters in the penalty.
maxclust
maximum number of clusters for which to compute the
measure.
Value
Vector of penalty values for trees of size 2:maxclust.
The names of the vector elements are the corresponding numbers of clusters.
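Because the penalty vector is named by cluster count, the suggested pruning size can be read off the name of its smallest element. The helper below, suggested_size, is hypothetical and not part of this function's interface; the input vector is illustrative only and is not real kgs output.

```r
# Hypothetical helper: return the number of clusters whose penalty is smallest,
# using the element names (cluster counts) of a named penalty vector.
suggested_size <- function(penalty) {
  as.integer(names(penalty)[which.min(penalty)])  # which.min keeps the first of any ties
}

# Illustrative values only, not produced by kgs
toy <- c("2" = 7.5, "3" = 4.2, "4" = 5.0, "5" = 6.1)
suggested_size(toy)  # returns 3
```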
Details
Kelley et al. (see reference) proposed a method that can help decide
where to prune a hierarchical cluster tree. At each level of the
tree, the spread of each cluster (the mean dissimilarity between its
members) is calculated, and these spreads are averaged across
clusters. This average spread is normalized, and alpha times the
number of clusters is then added. The number of clusters at which
this penalty function is smallest is the suggested pruning size.
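The procedure described above can be sketched in base R as follows. This is a minimal sketch written from that description, not the implementation documented here: the name kgs_sketch is hypothetical, it accepts hclust objects only (a twins object would need to be converted to hclust first), and two details are assumptions: singleton clusters, whose spread is undefined, are skipped when averaging, and the average spread is rescaled linearly onto [1, maxclust - 1] before alpha times the number of clusters is added.

```r
# Hypothetical sketch of the KGS penalty; not the package's own code.
kgs_sketch <- function(cluster, diss, alpha = 1, maxclust = NULL) {
  d <- as.matrix(diss)                  # full symmetric matrix, zero diagonal
  n <- nrow(d)
  # Assumption: cap at n - 1 so every level has at least one cluster of size >= 2
  if (is.null(maxclust)) maxclust <- n - 1
  maxclust <- min(maxclust, n - 1)
  sizes <- 2:maxclust

  avg_spread <- vapply(sizes, function(k) {
    groups <- cutree(cluster, k = k)    # cluster labels at this level of the tree
    spreads <- vapply(split(seq_len(n), groups), function(idx) {
      m <- length(idx)
      if (m < 2) return(NA_real_)       # assumption: singletons have no spread
      sum(d[idx, idx]) / (m * (m - 1))  # mean dissimilarity over member pairs
    }, numeric(1))
    mean(spreads, na.rm = TRUE)         # average over non-singleton clusters
  }, numeric(1))

  # Assumption: linear rescaling of the average spread onto [1, maxclust - 1]
  rng <- range(avg_spread)
  norm <- if (rng[2] > rng[1]) {
    (maxclust - 2) * (avg_spread - rng[1]) / (rng[2] - rng[1]) + 1
  } else {
    rep(1, length(avg_spread))
  }

  penalty <- norm + alpha * sizes       # add the weighted number of clusters
  names(penalty) <- sizes
  penalty
}

# Example on a base R dataset
d <- dist(USArrests)
hc <- hclust(d, method = "average")
p <- kgs_sketch(hc, d, maxclust = 15)
names(which.min(p))                     # suggested number of clusters
```

Each level recomputes every cluster's spread from scratch, so this sketch shares the O(n*n*maxclust) cost discussed below.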
The current implementation has complexity O(n*n*maxclust) and is
therefore very slow for large n. A first improvement would be to
recompute the spread only for the clusters that are split at each
level, rather than recomputing it for all clusters.
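One way the suggested improvement could be realized is to walk the merge matrix of an hclust object. Each merge node's within-cluster dissimilarity sum equals the sums of its two parts plus the dissimilarities between them, so every pair of objects is visited exactly once (O(n*n) overall). Moving from k - 1 to k clusters then undoes a single merge, so only that cluster and its two children change the running average. The sketch below assumes monotone merge heights (e.g. average or complete linkage) and reuses the assumption that singletons are skipped. The name avg_spread_incremental is hypothetical, and it returns only the average spread; the normalization and alpha step would be applied afterwards unchanged.

```r
# Hypothetical sketch: average spread for levels 2:maxclust in O(n^2) total,
# computing each cluster's within-cluster dissimilarity sum only once.
avg_spread_incremental <- function(cluster, diss, maxclust) {
  d <- as.matrix(diss)
  merge <- cluster$merge                # row i: the two nodes joined at step i
  n <- nrow(d)
  steps <- n - 1
  maxclust <- min(maxclust, n - 1)      # keep at least one non-singleton cluster

  members <- vector("list", steps)      # members of the node created at step i
  wsum <- numeric(steps)                # sum of dissimilarities over its member pairs
  size <- integer(steps)
  node_members <- function(x) if (x < 0) -x else members[[x]]  # negative = object
  node_wsum <- function(x) if (x < 0) 0 else wsum[x]

  for (i in seq_len(steps)) {
    a <- node_members(merge[i, 1])
    b <- node_members(merge[i, 2])
    # Reuse both parts' sums; only the cross pairs between a and b are new
    wsum[i] <- node_wsum(merge[i, 1]) + node_wsum(merge[i, 2]) + sum(d[a, b])
    members[[i]] <- c(a, b)
    size[i] <- length(a) + length(b)
  }
  spread <- wsum / choose(size, 2)      # mean dissimilarity over member pairs

  total <- spread[steps]                # one cluster: the root node
  count <- 1L
  out <- numeric(maxclust - 1)
  for (k in 2:maxclust) {
    i <- n - k + 1                      # merge undone when going to k clusters
    total <- total - spread[i]
    count <- count - 1L
    for (x in merge[i, ]) {
      if (x > 0) {                      # child with >= 2 members has a spread
        total <- total + spread[x]
        count <- count + 1L
      }
    }
    out[k - 1] <- total / count         # average over non-singleton clusters
  }
  names(out) <- 2:maxclust
  out
}
```

For example, avg_spread_incremental(hclust(dist(USArrests), "average"), dist(USArrests), 20) gives the average spread for 2 to 20 clusters.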
References
Kelley, L.A., Gardner, S.P., Sutcliffe, M.J. (1996) An automated
approach for clustering an ensemble of NMR-derived protein structures
into conformationally-related subfamilies, Protein Engineering,
9, 1063-1065.