Compute a hierarchical or kmeans cluster analysis and return the group assignment for each observation as a vector.
cluster_analysis(
  x,
  n_clusters = NULL,
  method = c("hclust", "kmeans"),
  distance = c("euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski"),
  agglomeration = c("ward", "ward.D", "ward.D2", "single", "complete", "average",
    "mcquitty", "median", "centroid"),
  iterations = 20,
  algorithm = c("Hartigan-Wong", "Lloyd", "MacQueen"),
  force = TRUE,
  package = c("NbClust", "mclust"),
  verbose = TRUE
)
x
A data frame.
n_clusters
Number of clusters used for the cluster solution. By default (NULL), the number of clusters to extract is determined by calling n_clusters().
method
Method for computing the cluster analysis. By default ("hclust"), a hierarchical cluster analysis is computed. Use "kmeans" to compute a kmeans cluster analysis. You can specify the initial letters only.
distance
Distance measure to be used when method = "hclust" (for hierarchical clustering). Must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". See dist. If method = "kmeans", this argument is ignored.
agglomeration
Agglomeration method to be used when method = "hclust" (for hierarchical clustering). This should be one of "ward", "ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median" or "centroid". Default is "ward" (see hclust). If method = "kmeans", this argument is ignored.
iterations
Maximum number of iterations allowed. Only applies if method = "kmeans". See kmeans for details on this argument.
algorithm
Algorithm used for calculating kmeans clusters. Only applies if method = "kmeans". May be one of "Hartigan-Wong" (default), "Lloyd" (used by SPSS), or "MacQueen". See kmeans for details on this argument.
force
Logical. If TRUE, ordered factors (ordinal variables) are converted to numeric values, while character vectors and factors are converted to dummy variables (numeric 0/1) and are included in the cluster analysis. If FALSE, factors and character vectors are removed before computing the cluster analysis.
package
Packages from which methods are used to determine the number of clusters. Can be "all" or a vector containing "NbClust", "mclust", "cluster" and "M3C".
verbose
Logical. Set to FALSE to toggle off warnings.
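The clustering arguments refer to base R's dist, hclust and kmeans. As a rough sketch of how such arguments could map onto those functions (this is not the implementation of cluster_analysis(), and standardizing the data with scale() is an assumption made only for this illustration):

```r
# Illustrative only: plain base R (stats) calls, not cluster_analysis() itself.
x <- scale(iris[, 1:4])  # standardizing is an assumption of this sketch

# Hierarchical path: distance matrix, agglomerative clustering, cut the tree
d <- dist(x, method = "euclidean")    # compare the `distance` argument
hc <- hclust(d, method = "ward.D2")   # compare the `agglomeration` argument
groups_hc <- cutree(hc, k = 3)        # compare the `n_clusters` argument

# Kmeans path: iterative partitioning from random starting centers
set.seed(123)                         # makes the random start reproducible
km <- kmeans(x, centers = 3, iter.max = 20, algorithm = "Hartigan-Wong")
groups_km <- km$cluster               # compare `iterations` and `algorithm`

# Cluster sizes for both solutions
table(groups_hc)
table(groups_km)
```

Both paths yield one integer cluster label per row of the input, which is the kind of vector described in the return value.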
The group classification for each observation as a vector. The returned vector includes missing values, so it has the same length as nrow(x).
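One way a result can keep the same length as nrow(x) is to cluster only the complete rows and write the assignments back into an all-NA vector. The following base R sketch illustrates that idea under the assumption that incomplete rows receive NA; it is not taken from the function's source.

```r
# Illustrative only: pad cluster assignments with NA for incomplete rows.
x <- iris[, 1:4]
x[c(5, 42), 1] <- NA                  # make two rows incomplete for the demo

complete <- complete.cases(x)         # TRUE for rows without missing values
hc <- hclust(dist(x[complete, ]), method = "ward.D2")

# Start with one NA per row of x, then fill in the rows that were clustered
groups <- rep(NA_integer_, nrow(x))
groups[complete] <- cutree(hc, k = 3)

length(groups) == nrow(x)             # same length as the input
which(is.na(groups))                  # positions of the incomplete rows
```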
The print() and plot() methods show the (standardized) mean value for each variable within each cluster. Thus, a higher absolute value indicates that a certain variable characteristic is more pronounced within that specific cluster (as compared to other cluster groups with lower absolute mean values).
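To make the idea of standardized cluster means concrete, the following base R sketch computes them by hand. It assumes z-standardization with scale() and uses aggregate() for the per-cluster means; the actual print() and plot() methods may compute or present them differently.

```r
# Illustrative only: standardized mean of each variable within each cluster.
z <- as.data.frame(scale(iris[, 1:4]))          # z-standardize every variable
groups <- cutree(hclust(dist(z), method = "ward.D2"), k = 3)

# One row per cluster, one column per variable
cluster_means <- aggregate(z, by = list(cluster = groups), FUN = mean)
cluster_means
```

Because every variable has an overall mean of zero after standardization, a cluster mean far from zero (in either direction) marks a variable that sets that cluster apart from the rest.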
Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2014) cluster: Cluster Analysis Basics and Extensions. R package.
n_clusters to determine the number of clusters to extract, cluster_discrimination to determine the accuracy of cluster group classification, and check_clusterstructure to check suitability of data for clustering.
# Hierarchical clustering of the iris dataset
groups <- cluster_analysis(iris[, 1:4], 3)
groups

# K-means clustering of the iris dataset, auto-detection of cluster groups
# NOT RUN {
groups <- cluster_analysis(iris[, 1:4], method = "k")
groups
# }