Compute a hierarchical or k-means cluster analysis and return the group association for each observation as a vector.
sjc.cluster(data, groupcount = NULL, method = c("hclust", "kmeans"),
distance = c("euclidean", "maximum", "manhattan", "canberra", "binary",
"minkowski"), agglomeration = c("ward", "ward.D", "ward.D2", "single",
"complete", "average", "mcquitty", "median", "centroid"), iter.max = 20,
algorithm = c("Hartigan-Wong", "Lloyd", "MacQueen"))
A data frame with variables that should be used for the cluster analysis.
Number of groups (clusters) used for the cluster solution. May also be a set of initial (distinct) cluster centres if method = "kmeans" (see kmeans for details on the centers argument).
If groupcount = NULL and method = "kmeans", the optimal number of clusters is calculated using the gap statistic (see sjc.kgap). For method = "hclust", groupcount needs to be specified. The following functions may be helpful for estimating the number of clusters:
Use sjc.elbow to determine the group count based on the elbow criterion.
If method = "kmeans", use sjc.kgap to determine the group count according to the gap statistic.
If method = "hclust" (hierarchical clustering, default), use sjc.dend to inspect different cluster group solutions.
Use sjc.grpdisc to inspect the goodness of grouping (accuracy of classification).
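The elbow criterion mentioned above can be illustrated with base R alone. This is a minimal sketch, not the implementation of sjc.elbow: it assumes the data are standardised with scale() first and that the total within-cluster sum of squares from stats::kmeans is the quantity being inspected. The names x, ks and wss are illustrative only.

```r
# Standardise the variables so that none dominates the distances
# (an assumption; sjc.cluster may treat the data differently).
x <- scale(mtcars)

set.seed(123)  # k-means uses random starting centres
ks <- 1:8      # candidate numbers of clusters

# Total within-cluster sum of squares for each candidate k
wss <- sapply(ks, function(k) {
  kmeans(x, centers = k, nstart = 10, iter.max = 20)$tot.withinss
})

# Look for the k after which the decrease in wss flattens out
elbow <- data.frame(k = ks, wss = round(wss, 1))
print(elbow)
```

The value of k at which the curve bends is then a candidate for groupcount.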
Method for computing the cluster analysis. By default ("hclust"), a hierarchical cluster analysis will be computed. Use "kmeans" to compute a k-means cluster analysis. You can specify the initial letters only.
Distance measure to be used when method = "hclust" (for hierarchical clustering). Must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". See dist.
If method = "kmeans", this argument will be ignored.
Agglomeration method to be used when method = "hclust" (for hierarchical clustering). This should be one of "ward", "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid". Default is "ward" (see hclust).
If method = "kmeans", this argument will be ignored. See 'Note'.
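To make the roles of the distance and agglomeration arguments concrete, here is a rough base R equivalent of a hierarchical solution. It is a sketch under assumptions: that the distance measure is passed to stats::dist, the agglomeration method to stats::hclust, and that the tree is cut into groupcount groups with cutree. Standardising with scale() is also an assumption, and "ward.D2" is used here because it is one of the values listed in the usage.

```r
# Standardise the variables (an assumption, see lead-in)
x <- scale(mtcars)

# Distance matrix: corresponds to the 'distance' argument
d <- dist(x, method = "euclidean")

# Hierarchical clustering: corresponds to the 'agglomeration' argument
hc <- hclust(d, method = "ward.D2")

# Cut the tree into 5 groups: corresponds to 'groupcount'
groups <- cutree(hc, k = 5)

# Number of observations per cluster
table(groups)
```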
Maximum number of iterations allowed. Only applies if method = "kmeans". See kmeans for details on this argument.
Algorithm used for the k-means cluster analysis. Only applies if method = "kmeans". May be one of "Hartigan-Wong" (default), "Lloyd" (used by SPSS), or "MacQueen". See kmeans for details on this argument.
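The k-means arguments map directly onto stats::kmeans. The following sketch assumes that iter.max and algorithm are passed through unchanged and that the data are standardised first. The nstart value and the seed are illustrative choices, not part of the sjc.cluster signature.

```r
# Standardise the variables (an assumption, see lead-in)
x <- scale(mtcars)

set.seed(42)  # k-means starts from randomly chosen centres

km <- kmeans(
  x,
  centers   = 5,               # corresponds to 'groupcount'
  iter.max  = 20,              # corresponds to 'iter.max'
  nstart    = 10,              # illustrative: keep the best of 10 random starts
  algorithm = "Hartigan-Wong"  # or "Lloyd", "MacQueen"
)

# Cluster membership for each observation
groups <- km$cluster
table(groups)
```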
The group classification for each observation as a vector. This group classification can be passed to sjc.grpdisc to check the goodness of classification.
The returned vector includes missing values, so it can be appended to the original data frame (data).
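Why keeping missing values in the result matters can be shown with a small base R sketch. It assumes that rows with missing values are excluded from the clustering and receive NA as their group, which is one plausible reading of the description above, not a confirmed detail of sjc.cluster. The names df, ok and cluster are illustrative.

```r
# Copy of mtcars with two artificial missing values
df <- mtcars
df$mpg[c(3, 10)] <- NA

# Only complete rows can be clustered
ok <- complete.cases(df)
x <- scale(df[ok, ])

# Start with NA for every row, then fill in groups for complete rows
groups <- rep(NA_integer_, nrow(df))
groups[ok] <- cutree(hclust(dist(x), method = "ward.D2"), k = 5)

# Same length as the data frame, so it can be appended as a column
df$cluster <- groups
table(df$cluster, useNA = "ifany")
```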
Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2014) cluster: Cluster Analysis Basics and Extensions. R package.
# Hierarchical clustering of the mtcars dataset
groups <- sjc.cluster(mtcars, 5)
# K-means clustering of the mtcars dataset
groups <- sjc.cluster(mtcars, 5, method = "k")