sjc.cluster: Compute hierarchical or kmeans cluster analysis

Description

Compute a hierarchical or kmeans cluster analysis and return the group association for each observation as a vector.

Usage

sjc.cluster(data, groupcount = NULL, method = c("hclust", "kmeans"), distance = c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"), agglomeration = c("ward", "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid"), iter.max = 20, algorithm = c("Hartigan-Wong", "Lloyd", "MacQueen"))

Arguments

data

data.frame with variables that should be used for the cluster analysis.

groupcount

number of groups (clusters) used for the cluster solution. May also be a set of initial (distinct) cluster centres if method = "kmeans" (see kmeans for details on the centers argument). If groupcount = NULL and method = "kmeans", the optimal number of clusters is calculated using the gap statistic (see sjc.kgap). For method = "hclust", groupcount must be specified. The following functions may be helpful for estimating the number of clusters:

Use sjc.elbow to determine the group-count depending on the elbow-criterion.
If method = "kmeans", use sjc.kgap to determine the group-count according to the gap-statistic.
If method = "hclust" (hierarchical clustering, default), use sjc.dend to inspect different cluster group solutions.
Use sjc.grpdisc to inspect the goodness of grouping (accuracy of classification).
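The elbow criterion mentioned above can be sketched in base R. This is only an illustration of the idea, not the sjc.elbow implementation; standardizing the data with scale() and using nstart = 10 are assumptions made for this example.

```r
# Base-R sketch of an elbow inspection (not the sjc.elbow implementation).
set.seed(123)                 # kmeans uses random starting centres
x <- scale(mtcars)            # assumption: standardize variables first
ks <- 1:8                     # candidate numbers of clusters

# Total within-cluster sum of squares for each candidate k
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# Look for the k after which the decrease flattens out (the "elbow")
print(data.frame(k = ks, wss = round(wss, 1)))
```

The value of k at which additional clusters stop reducing the within-cluster sum of squares noticeably is a candidate for groupcount.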

method

method for computing the cluster analysis. By default ("hclust"), a hierarchical cluster analysis will be computed. Use "kmeans" to compute a kmeans cluster analysis. You can specify the initial letters only.
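Abbreviated values such as "k" are commonly resolved in R with match.arg, which also picks the first element of the choices vector when the argument is omitted. The helper pick_method below is hypothetical and only illustrates that mechanism; whether sjc.cluster uses match.arg internally is an assumption.

```r
# Hypothetical helper illustrating partial matching of the method argument.
pick_method <- function(method = c("hclust", "kmeans")) {
  # match.arg returns the first choice if the argument is missing and
  # expands unambiguous abbreviations to the full value
  match.arg(method)
}

pick_method()      # "hclust"
pick_method("k")   # "kmeans"
pick_method("hc")  # "hclust"
```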

distance

distance measure to be used when method = "hclust" (for hierarchical clustering). Must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". See dist. If method = "kmeans", this argument will be ignored.

agglomeration

agglomeration method to be used when method = "hclust" (for hierarchical clustering). This should be one of "ward", "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid". Default is "ward" (see hclust). If method = "kmeans", this argument will be ignored. See 'Note'.
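How the distance and agglomeration arguments fit together can be sketched with the base functions they refer to (dist, hclust and cutree). This is a minimal sketch, not the function's own code; it uses the raw mtcars values and the "ward.D" spelling, which is the name recent R versions give to the former "ward" method.

```r
# Base-R sketch of the hierarchical path (not the sjc.cluster implementation).
d  <- dist(mtcars, method = "euclidean")   # corresponds to 'distance'
hc <- hclust(d, method = "ward.D")         # corresponds to 'agglomeration'

# Cut the tree into a fixed number of groups (corresponds to 'groupcount')
groups <- cutree(hc, k = 5)

table(groups)   # cluster sizes
```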

iter.max

maximum number of iterations allowed. Only applies if method = "kmeans". See kmeans for details on this argument.

algorithm

algorithm used for calculating the kmeans cluster solution. Only applies if method = "kmeans". May be one of "Hartigan-Wong" (default), "Lloyd" (used by SPSS), or "MacQueen". See kmeans for details on this argument.
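The iter.max and algorithm arguments have the same names as arguments of the base kmeans function. A minimal sketch of an equivalent direct call follows; the seed is an assumption added so that the random starting centres are reproducible.

```r
# Base-R sketch of the kmeans path (not the sjc.cluster implementation).
set.seed(42)   # assumption: fixed seed for reproducible starting centres

km <- kmeans(mtcars,
             centers   = 5,                 # corresponds to 'groupcount'
             iter.max  = 20,                # corresponds to 'iter.max'
             algorithm = "Hartigan-Wong")   # corresponds to 'algorithm'

groups <- km$cluster   # integer cluster id per observation
table(groups)
```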

Value

The group classification for each observation as a vector. This group classification can be passed to sjc.grpdisc to check the goodness of classification. The returned vector keeps missing values, so its length matches the number of rows and it can be appended to the original data frame data.
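One way a group vector can stay aligned with a data frame that contains missing values is sketched below. This is an illustration in base R under the assumption that incomplete rows receive NA; it is not taken from the function's source, and the injected NA values are made up for the example.

```r
# Sketch: keep the group vector the same length as the data frame.
df <- mtcars
df$mpg[c(3, 10)] <- NA            # made-up missing values for illustration

ok <- complete.cases(df)          # rows usable for clustering
set.seed(1)                       # reproducible starting centres
km <- kmeans(df[ok, ], centers = 3)

groups <- rep(NA_integer_, nrow(df))   # NA placeholder for incomplete rows
groups[ok] <- km$cluster               # fill in ids for complete rows

df$cluster <- groups              # same length as nrow(df), so it appends cleanly
```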

References

Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2014) cluster: Cluster Analysis Basics and Extensions. R package.

Examples


# Hierarchical clustering of mtcars-dataset
groups <- sjc.cluster(mtcars, 5)

# K-means clustering of mtcars-dataset
groups <- sjc.cluster(mtcars, 5, method="k")
