Here is a list of the main functions in package fpc. Most other functions are auxiliary functions for these.
Computes DBSCAN density based clustering as introduced in Ester et al. (1996).
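A minimal sketch of how this entry might be used, assuming it refers to the function dbscan in fpc and that the package is installed. The synthetic data and the values of eps and MinPts are illustrative assumptions, not recommendations.

```r
# Sketch only: assumes the fpc package is installed.
library(fpc)

set.seed(1)
# Synthetic data: two well-separated Gaussian blobs in the plane
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2)
)

# eps (neighbourhood radius) and MinPts (minimum neighbourhood size)
# are illustrative choices for this toy dataset
db <- fpc::dbscan(x, eps = 0.5, MinPts = 5)

# One label per observation; label 0 marks points treated as noise
print(table(db$cluster))
```

The call is written as fpc::dbscan because other packages also export a function named dbscan.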
Mahalanobis Fixed Point Clustering, Hennig and Christlieb (2002), Hennig (2005).
Regression Fixed Point Clustering, Hennig (2003).
This fits a latent class model to data with mixed-type continuous/nominal variables. Internally, it calls a method for flexmix.
Clustering by merging components of a Gaussian mixture, see Hennig (2010).
ML-fit of a mixture of linear regression models, see DeSarbo and Cron (1988).
This computes several cluster validity statistics from a clustering and a dissimilarity matrix, including the Calinski-Harabasz index, the adjusted Rand index and other statistics explained in Gordon (1999), as well as several characterising measures such as average between-cluster and within-cluster dissimilarity and separation. See also calinhara and dudahart2 for specific indexes, and the newer version cqcluster.stats, which computes some more indexes and statistics used for computing them. There is also distrsimilarity, which computes within-cluster dissimilarity to the Gaussian and uniform distribution.
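As a hedged illustration, assuming this entry refers to cluster.stats in fpc, the sketch below computes validity statistics for a k-means partition of synthetic data. The data, the reference labelling truth and the choice of k-means are assumptions made for the example.

```r
# Sketch only: assumes the fpc package is installed.
library(fpc)

set.seed(2)
# Synthetic data: two Gaussian blobs with known group labels
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 4, sd = 0.5), ncol = 2)
)
truth <- rep(1:2, each = 30)

# Any clustering given as an integer vector would do; k-means is used here
km <- kmeans(x, centers = 2, nstart = 10)

# Validity statistics from the dissimilarity matrix and the clustering;
# alt.clustering adds comparison measures such as the adjusted Rand index
cs <- cluster.stats(d = dist(x), clustering = km$cluster,
                    alt.clustering = truth)

print(cs$ch)               # Calinski-Harabasz index
print(cs$corrected.rand)   # adjusted Rand index against 'truth'
print(cs$average.between)  # average between-cluster dissimilarity
print(cs$average.within)   # average within-cluster dissimilarity

# The same index from the data matrix directly, via calinhara
ch_direct <- calinhara(x, km$cluster)
print(ch_direct)
```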
Estimates the number of clusters by computing the prediction strength of a clustering of a dataset into different numbers of components for various clustering methods, see Tibshirani and Walther (2005). In fact, this is more flexible than what is in the original paper, because it can use point classification schemes that work better with clustering methods other than k-means.
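A possible usage sketch, assuming this entry refers to prediction.strength in fpc. The synthetic data, the range of cluster numbers and the small number of resampling divisions M are illustrative assumptions chosen to keep the run short.

```r
# Sketch only: assumes the fpc package is installed.
library(fpc)

set.seed(3)
# Synthetic data: three Gaussian blobs in the plane
x <- rbind(
  matrix(rnorm(80, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(80, mean = 3, sd = 0.4), ncol = 2),
  matrix(rnorm(80, mean = 6, sd = 0.4), ncol = 2)
)

# Prediction strength for 2 to 4 clusters, using the k-means wrapper
# kmeansCBI and centroid-based classification of new points
ps <- prediction.strength(x, Gmin = 2, Gmax = 4, M = 10,
                          clustermethod = kmeansCBI,
                          classification = "centroid")

print(ps$mean.pred)  # mean prediction strength per number of clusters
print(ps$optimalk)   # largest number of clusters above the cutoff
```

Other CBI wrappers can be passed as clustermethod, in which case a different classification scheme may be more suitable.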
Estimates the number of clusters by bootstrap stability selection, see Fang and Wang (2012). This is quite flexible regarding clustering methods and point classification schemes and also allows for dissimilarity data.
This runs many clustering methods (to be specified by the user) with many numbers of clusters on a dataset and produces standardised and comparable versions of many cluster validity indexes (see Hennig 2019, Akhanli and Hennig 2020). This is done by producing random clusterings on the given data, see stupidkcentroids and stupidknn. It allows many clusterings to be compared based on many different potentially desirable features of a clustering. print.valstat can compute an aggregated index with user-specified weights.
Sets of colours and symbols useful for cluster plotting.
Cluster-wise stability assessment of a clustering. Clusterings are performed on resampled data to see for every cluster of the original dataset how well this is reproduced. See Hennig (2007) for details.
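The following sketch shows one way this could look in practice, assuming the entry refers to clusterboot in fpc. The data, the number of bootstrap replicates B and the use of kmeansCBI with two clusters are assumptions made for illustration.

```r
# Sketch only: assumes the fpc package is installed.
library(fpc)

set.seed(4)
# Synthetic data: two Gaussian blobs in the plane
x <- rbind(
  matrix(rnorm(80, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(80, mean = 4, sd = 0.5), ncol = 2)
)

# Nonparametric bootstrap with 20 replicates; krange = 2 is passed on
# to kmeansCBI and fixes the number of clusters at 2
cb <- clusterboot(x, B = 20, bootmethod = "boot",
                  clustermethod = kmeansCBI, krange = 2,
                  seed = 5, count = FALSE)

# Mean Jaccard similarity of each original cluster to its best match
# in the resampled clusterings; values near 1 suggest a stable cluster
print(cb$bootmean)
```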
Extracts variable-wise information for every cluster in order to help with cluster interpretation.
Visualisation of a clustering or grouping in data by various linear projection methods that optimise the separation between clusters, or between a single cluster and the rest of the data, according to Hennig (2004), including classical methods such as discriminant coordinates. This calls the function discrproj, which is a bit more flexible but doesn't produce a plot itself.
Plots and diagnostics for assessing modality of Gaussian mixtures, see Ray and Lindsay (2005).
Plots to diagnose component separation in Gaussian mixtures, see Hennig (2010).
Local shape matrix, which can be used for finding clusters in connection with function ics in package ICS, see Hennig's discussion and rejoinder of Tyler et al. (2009).
This and other "CBI"-functions (see the kmeansCBI help page) are unified wrappers for various clustering methods in R. They may be useful because they do in one step what would otherwise take several steps in R (for example fitting a Gaussian mixture with noise component in package mclust).
This calls kmeans for the k-means clustering method and includes estimation of the number of clusters and finding an optimal solution from several starting points.
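A short sketch, assuming this entry refers to kmeansruns in fpc. The synthetic data, the candidate range of cluster numbers and the number of starting points are illustrative assumptions.

```r
# Sketch only: assumes the fpc package is installed.
library(fpc)

set.seed(6)
# Synthetic data: three Gaussian blobs in the plane
x <- rbind(
  matrix(rnorm(80, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(80, mean = 3, sd = 0.4), ncol = 2),
  matrix(rnorm(80, mean = 6, sd = 0.4), ncol = 2)
)

# k-means for k = 2, ..., 5 with 5 starting points per k; the number of
# clusters is chosen by the Calinski-Harabasz criterion ("ch")
kr <- kmeansruns(x, krange = 2:5, criterion = "ch", runs = 5)

print(kr$bestk)  # estimated number of clusters
print(kr$crit)   # criterion value for each candidate number of clusters
```

The returned object is the kmeans result for the selected number of clusters, so components such as kr$cluster and kr$centers are available as usual.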
This calls pam and clara for the partitioning around medoids clustering method (Kaufman and Rousseeuw, 1990) and includes two different ways of estimating the number of clusters.
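A comparable sketch for the medoid-based case, assuming this entry refers to pamk in fpc (which relies on package cluster for pam and clara). The data and the candidate range of cluster numbers are assumptions for illustration.

```r
# Sketch only: assumes the fpc and cluster packages are installed.
library(fpc)

set.seed(7)
# Synthetic data: three Gaussian blobs in the plane
x <- rbind(
  matrix(rnorm(80, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(80, mean = 3, sd = 0.4), ncol = 2),
  matrix(rnorm(80, mean = 6, sd = 0.4), ncol = 2)
)

# Partitioning around medoids for k = 2, ..., 5; the number of clusters
# is chosen by average silhouette width ("asw"); usepam = TRUE uses pam
# rather than clara
pk <- pamk(x, krange = 2:5, criterion = "asw", usepam = TRUE)

print(pk$nc)                 # estimated number of clusters
print(pk$pamobject$medoids)  # medoids of the selected solution
```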
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/
Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822
DeSarbo, W. S. and Cron, W. L. (1988) A maximum likelihood methodology for clusterwise linear regression, Journal of Classification 5, 249-282.
Ester, M., Kriegel, H.-P., Sander, J. and Xu, X. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).
Fang, Y. and Wang, J. (2012) Selection of the number of clusters via the bootstrap method. Computational Statistics and Data Analysis, 56, 468-477.
Gordon, A. D. (1999) Classification, 2nd ed. Chapman and Hall.
Hennig, C. (2003) Clusters, outliers and regression: fixed point clusters, Journal of Multivariate Analysis 86, 183-212.
Hennig, C. (2004) Asymmetric linear dimension reduction for classification. Journal of Computational and Graphical Statistics, 13, 930-945.
Hennig, C. (2005) Fuzzy and Crisp Mahalanobis Fixed Point Clusters, in Baier, D., Decker, R., and Schmidt-Thieme, L. (eds.): Data Analysis and Decision Support. Springer, Heidelberg, 47-56.
Hennig, C. (2007) Cluster-wise assessment of cluster stability. Computational Statistics and Data Analysis, 52, 258-271.
Hennig, C. (2010) Methods for merging Gaussian mixture components, Advances in Data Analysis and Classification, 4, 3-34.
Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York 1-24, https://arxiv.org/abs/1703.09282
Hennig, C. and Christlieb, N. (2002) Validating visual clusters in large datasets: Fixed point clusters of spectral features, Computational Statistics and Data Analysis 40, 723-739.
Kaufman, L. and Rousseeuw, P. J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.
Ray, S. and Lindsay, B. G. (2005) The Topography of Multivariate Normal Mixtures, Annals of Statistics, 33, 2042-2065.
Tibshirani, R. and Walther, G. (2005) Cluster Validation by Prediction Strength, Journal of Computational and Graphical Statistics, 14, 511-528.