
fpc (version 2.2-13)

fpc-package: fpc package overview


Here is a list of the main functions in package fpc. Most other functions are auxiliary functions for these.


Clustering methods


dbscan: Computes DBSCAN density-based clustering as introduced in Ester et al. (1996).


fixmahal: Mahalanobis Fixed Point Clustering, see Hennig and Christlieb (2002), Hennig (2005).


fixreg: Regression Fixed Point Clustering, see Hennig (2003).


flexmixedruns: Fits a latent class model to data with mixed-type continuous/nominal variables. Internally it calls a method for flexmix.


mergenormals: Clustering by merging components of a Gaussian mixture, see Hennig (2010).


regmix: ML-fit of a mixture of linear regression models, see DeSarbo and Cron (1988).
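
As a rough illustration of the first entry above, the following sketch runs fpc's dbscan function on simulated data. The data, the seed, and the eps and MinPts values are assumptions chosen only for this example, not recommended settings.

```r
library(fpc)

set.seed(1)
# Simulated data (an assumption for this sketch): two tight Gaussian blobs
# plus five uniformly scattered points.
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2),
  matrix(runif(10, min = -2, max = 5), ncol = 2)
)

# eps (neighbourhood radius) and MinPts (minimum neighbourhood size) are
# illustrative values; they need tuning for real data.
db <- fpc::dbscan(x, eps = 0.5, MinPts = 5)

# Cluster label 0 marks points that DBSCAN treats as noise.
table(db$cluster)
```

The call is written as fpc::dbscan because other packages also export a function named dbscan.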

Cluster validity indexes and estimation of the number of clusters


cluster.stats: Computes several cluster validity statistics from a clustering and a dissimilarity matrix, including the Calinski-Harabasz index, the adjusted Rand index and other statistics explained in Gordon (1999), as well as several characterising measures such as average between-cluster and within-cluster dissimilarity and separation. See also calinhara and dudahart2 for specific indexes, and the newer cqcluster.stats, which computes some more indexes and the statistics used for computing them. There is also distrsimilarity, which computes within-cluster dissimilarity to the Gaussian and uniform distributions.


prediction.strength: Estimates the number of clusters by computing the prediction strength of a clustering of a dataset into different numbers of components for various clustering methods, see Tibshirani and Walther (2005). This is more flexible than what is in the original paper, because it can use point classification schemes that work better with clustering methods other than k-means.


nselectboot: Estimates the number of clusters by bootstrap stability selection, see Fang and Wang (2012). This is quite flexible regarding clustering methods and point classification schemes and also allows for dissimilarity data.


clusterbenchmark: Runs many clustering methods (to be specified by the user) with many numbers of clusters on a dataset and produces standardised and comparable versions of many cluster validity indexes (see Hennig 2019, Akhanli and Hennig 2020). The standardisation is based on random clusterings produced on the given data, see stupidkcentroids and stupidknn. This makes it possible to compare many clusterings on many different potentially desirable features of a clustering. print.valstat can compute an aggregated index with user-specified weights.
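
The validity statistics and the prediction strength approach described in this section can be tried out roughly as follows. This is a minimal sketch on simulated data: the data, the seed, the "true" labels, and the small number of resampling splits (M = 10) are assumptions made to keep the example quick, not defaults of the package.

```r
library(fpc)

set.seed(2)
# Simulated data (an assumption for this sketch): three separated Gaussian blobs.
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 3, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 6, sd = 0.4), ncol = 2)
)
truth <- rep(1:3, each = 30)  # known labels, used only for the adjusted Rand index

# A k-means clustering to be evaluated.
km <- kmeans(x, centers = 3, nstart = 10)

# Validity statistics from a dissimilarity matrix and a clustering;
# alt.clustering adds comparison statistics such as the adjusted Rand index.
cs <- cluster.stats(d = dist(x), clustering = km$cluster, alt.clustering = truth)
cs$ch              # Calinski-Harabasz index
cs$avg.silwidth    # average silhouette width
cs$corrected.rand  # adjusted Rand index against the known labels

# Prediction strength for 2 to 4 clusters, using the k-means wrapper kmeansCBI.
ps <- prediction.strength(x, Gmin = 2, Gmax = 4, M = 10,
                          clustermethod = kmeansCBI,
                          classification = "centroid")
ps$mean.pred  # mean prediction strength for each number of clusters
ps$optimalk   # largest number of clusters whose strength exceeds the cutoff
```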

Cluster visualisation and validation


clucols: Sets of colours and symbols useful for cluster plotting.


clusterboot: Cluster-wise stability assessment of a clustering. Clusterings are performed on resampled data to see, for every cluster of the original dataset, how well it is reproduced. See Hennig (2007) for details.


cluster.varstats: Extracts variable-wise information for every cluster in order to help with cluster interpretation.


plotcluster: Visualisation of a clustering or grouping in data by various linear projection methods that optimise the separation between clusters, or between a single cluster and the rest of the data, according to Hennig (2004), including classical methods such as discriminant coordinates. This calls the function discrproj, which is a bit more flexible but doesn't produce a plot itself.


ridgeline.diagnosis: Plots and diagnostics for assessing modality of Gaussian mixtures, see Ray and Lindsay (2005).


weightplots: Plots to diagnose component separation in Gaussian mixtures, see Hennig (2010).


localshape: Local shape matrix, which can be used for finding clusters in connection with function ics in package ICS, see Hennig's discussion and rejoinder of Tyler et al. (2009).
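
The resampling-based stability assessment and the projection plot described in this section might be used along the following lines. This is a sketch under stated assumptions: the simulated data, the seed, the small number of bootstrap replicates (B = 20) and the choice of k-means with three clusters are all illustrative.

```r
library(fpc)

set.seed(3)
# Simulated data (an assumption for this sketch): three separated Gaussian blobs.
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 3, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 6, sd = 0.4), ncol = 2)
)

# Cluster-wise stability: k-means with 3 clusters is re-run on bootstrap
# resamples and compared with the clustering of the original data.
cb <- clusterboot(x, B = 20, bootmethod = "boot",
                  clustermethod = kmeansCBI, krange = 3,
                  seed = 3, count = FALSE)

# Mean Jaccard similarity per cluster; values near 1 suggest a stable cluster.
cb$bootmean

# Projection plot of the clustering found on the original data
# (discriminant coordinates by default).
plotcluster(x, cb$partition)
```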

Useful wrapper functions for clustering methods


kmeansCBI: This and other "CBI" functions (see the kmeansCBI help page) are unified wrappers for various clustering methods in R. They may be useful because they do in one step what would normally take several steps in R (for example, fitting a Gaussian mixture with noise component in package mclust).


kmeansruns: Calls kmeans for the k-means clustering method and includes estimation of the number of clusters and finding an optimal solution from several starting points.


pamk: Calls pam and clara for the partitioning around medoids clustering method (Kaufman and Rousseeuw, 1990) and includes two different ways of estimating the number of clusters.
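
A short sketch of how the wrapper functions in this section might be called is given below. The simulated data, the seed, the range of candidate cluster numbers (2 to 5) and the numbers of runs are assumptions made for illustration only.

```r
library(fpc)

set.seed(4)
# Simulated data (an assumption for this sketch): three separated Gaussian blobs.
x <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 3, sd = 0.4), ncol = 2),
  matrix(rnorm(60, mean = 6, sd = 0.4), ncol = 2)
)

# k-means over a range of k, choosing k by the Calinski-Harabasz criterion
# and keeping the best of several random starts for each k.
kr <- kmeansruns(x, krange = 2:5, criterion = "ch", runs = 10)
kr$bestk  # estimated number of clusters

# Partitioning around medoids over the same range, choosing k by
# average silhouette width.
pk <- pamk(x, krange = 2:5, criterion = "asw")
pk$nc     # estimated number of clusters

# Unified "CBI" interface: the partition is returned in the same slot
# whichever clustering method sits behind the wrapper.
cbi <- kmeansCBI(x, krange = 3, runs = 5)
table(cbi$partition)
```

The common return format of the CBI wrappers is what allows functions such as clusterboot to accept different clustering methods through a single clustermethod argument.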


