clustCoDa: Cluster analysis for compositional data

Description

Clustering in orthonormal coordinates or by using the Aitchison distance

Usage

clustCoDa(
  x,
  k = NULL,
  method = "Mclust",
  scale = "robust",
  transformation = "pivotCoord",
  distMethod = NULL,
  iter.max = 100,
  vals = TRUE,
  alt = NULL,
  bic = NULL,
  verbose = TRUE
)

# S3 method for class 'clustCoDa'
plot(
  x,
  y,
  ...,
  normalized = FALSE,
  which.plot = "clusterMeans",
  measure = "silwidths"
)

Value

All relevant information, such as cluster centers, cluster memberships, and cluster statistics.

Arguments

x: compositional data represented as a data.frame
k: number of clusters
method: clustering method. One of Mclust, cmeans, kmeansHartigan, cmeansUfcl, pam, clara, fanny, ward.D2, single, hclustComplete, average, mcquitty, median, centroid
scale: if orthonormal coordinates should be normalized.
transformation: coordinate representation to be used. The default, “pivotCoord”, gives isometric logratio coordinates. Can only be used when distMethod is not “Aitchison”.
distMethod: distance measure to be used. If “Aitchison”, then transformation should be “identity”.
iter.max: used only if kmeans is chosen. The maximum number of iterations allowed.
vals: if cluster validity measures should be calculated
alt: a known partitioning can be provided (for special cluster validity measures)
bic: if TRUE, the BIC criterion is evaluated for each single cluster as a validity measure
verbose: if TRUE additional print output is provided
y: the y coordinates of points in the plot, optional if x is an appropriate structure.
...: additional parameters passed through to the plot method
normalized: if TRUE, the results are normalized before plotting. Normalization is done by a z-transformation applied to each variable.
which.plot: type of plot. The default, “clusterMeans”, plots the cluster centers.
measure: cluster validity measure to be used when which.plot equals “partMeans”
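
The z-transformation mentioned for normalized can be sketched in base R as follows. This is only an illustration: the matrix centers is hypothetical data, and whether the plot method uses scale() internally is an assumption, not something stated on this page.

```r
# Hypothetical matrix of cluster centers: rows are clusters, columns are variables.
centers <- matrix(c(1, 2, 3,
                    10, 20, 60),
                  nrow = 3,
                  dimnames = list(paste0("cluster", 1:3), c("z1", "z2")))

# z-transformation per variable: subtract the column mean,
# then divide by the column standard deviation.
centers_z <- scale(centers, center = TRUE, scale = TRUE)

colMeans(centers_z)      # numerically zero for every variable
apply(centers_z, 2, sd)  # one for every variable
```

After this step every variable is on the same scale, so centers of variables with large raw values no longer dominate the plot.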

Author

Matthias Templ (accessing the basic features of hclust, Mclust, kmeans, etc. that are all written by others)

Details

The compositional data set is either represented internally in orthonormal coordinates before a cluster algorithm is applied or, depending on the choice of parameters, the Aitchison distance is used.
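
The relation between the two options can be illustrated with a minimal base-R sketch. It assumes the standard definitions: the Aitchison distance is the Euclidean distance between centered logratio coefficients, and pivot coordinates are one orthonormal coordinate system for compositions. The helpers clr, aitchison_dist, and pivot_coords are hypothetical names written for this illustration and are not part of the package; the package's own pivotCoord may use a different sign convention, which would not change distances.

```r
# Centered logratio (clr) coefficients of one composition (hypothetical helper).
clr <- function(x) log(x) - mean(log(x))

# Aitchison distance: Euclidean distance between clr coefficients.
aitchison_dist <- function(x, y) sqrt(sum((clr(x) - clr(y))^2))

# Pivot coordinates of one composition with D parts (hypothetical helper).
# Coordinate j contrasts part j with the geometric mean of parts j+1, ..., D.
pivot_coords <- function(x) {
  D <- length(x)
  sapply(seq_len(D - 1), function(j) {
    rest <- x[(j + 1):D]
    sqrt((D - j) / (D - j + 1)) * (log(x[j]) - mean(log(rest)))
  })
}

# Two made-up three-part compositions.
x <- c(0.2, 0.3, 0.5)
y <- c(0.1, 0.6, 0.3)

d_aitchison <- aitchison_dist(x, y)
d_coords <- sqrt(sum((pivot_coords(x) - pivot_coords(y))^2))

# The two distances agree up to floating-point error.
all.equal(d_aitchison, d_coords)
```

Because the distances coincide, distance-based methods should behave the same on unscaled orthonormal coordinates with Euclidean distance as on the original compositions with the Aitchison distance.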

References

Templ, M., Filzmoser, P., Reimann, C. (2008) Cluster analysis applied to regional geochemical data: Problems and possibilities. Applied Geochemistry, 23 (8), 2198-2213.

Examples

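library(robCompositions)  # provides clustCoDa() and the expenditures data
data(expenditures)
x <- expenditures
# Default method (Mclust) on robustly scaled pivot coordinates
rr <- clustCoDa(x, k = 6, scale = "robust", transformation = "pivotCoord")
# Aitchison distance on the untransformed compositions
rr2 <- clustCoDa(x, k = 6, distMethod = "Aitchison", scale = "none",
                 transformation = "identity")
# Single linkage with the Aitchison distance
rr3 <- clustCoDa(x, k = 6, distMethod = "Aitchison", method = "single",
                 transformation = "identity", scale = "none")
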
if (FALSE) {
require(reshape2)
plot(rr)
plot(rr, normalized = TRUE)
plot(rr, normalized = TRUE, which.plot = "partMeans")
}