sjc.qclus(data, groupcount = NULL, groups = NULL,
  method = c("kmeans", "hclust"),
  distance = c("euclidean", "maximum", "manhattan", "canberra", "binary",
    "minkowski"),
  agglomeration = c("ward", "ward.D", "ward.D2", "single", "complete",
    "average", "mcquitty", "median", "centroid"),
  iter.max = 20, algorithm = c("Hartigan-Wong", "Lloyd", "MacQueen"),
  show.accuracy = FALSE, title = NULL, axis.labels = NULL,
  wrap.title = 40, wrap.labels = 20, wrap.legend.title = 20,
  wrap.legend.labels = 20, facet.grid = FALSE, geom.colors = "Paired",
  geom.size = 0.5, geom.spacing = 0.1, show.legend = TRUE,
  show.grpcnt = TRUE, legend.title = NULL, legend.labels = NULL,
  coord.flip = FALSE, reverse.axis = FALSE, prnt.plot = TRUE)
data: A data.frame with the variables that should be used for the cluster analysis.
groupcount: Number of cluster groups. For method = "kmeans", see kmeans for details on the centers argument. If groupcount = NULL and method = "kmeans", the optimal number of clusters is calculated using the gap statistic (see sjc.kgap). For method = "hclust", groupcount needs to be specified. The following functions may be helpful for estimating the number of clusters:
Use sjc.elbow to determine the group count based on the elbow criterion.
For method = "kmeans", use sjc.kgap to determine the group count according to the gap statistic.
For method = "hclust" (hierarchical clustering), use sjc.dend to inspect different cluster group solutions.
Use sjc.grpdisc to inspect the goodness of grouping (accuracy of classification).
groups: NULL by default, in which case it is ignored. To plot existing cluster groups, specify both groupcount and groups. groups is a vector of the same length as nrow(data) that indicates the group classification of the cluster analysis. The group classification can be computed with the sjc.cluster function. See 'Examples'.
method: By default ("kmeans"), a k-means cluster analysis is computed. Use "hclust" to compute a hierarchical cluster analysis. You can specify the initial letters only.
distance: Used with method = "hclust" (hierarchical clustering). Must be one of "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski". See dist. If method = "kmeans", this argument is ignored.
agglomeration: Used with method = "hclust" (hierarchical clustering). Should be one of "ward", "single", "complete", "average", "mcquitty", "median" or "centroid". Default is "ward" (see hclust). If method = "kmeans", this argument is ignored. See 'Note'.
iter.max: Used with method = "kmeans". See kmeans for details on this argument.
algorithm: Used with method = "kmeans". May be one of "Hartigan-Wong" (default), "Lloyd" (used by SPSS), or "MacQueen". See kmeans for details on this argument.
show.accuracy: If TRUE, the sjc.grpdisc function is called, which computes a linear discriminant analysis on the classified cluster groups and plots a bar graph indicating the goodness of classification for each group.
title: If title = "", no title is printed.
facet.grid: TRUE to arrange the layout of multiple plots in a grid of an integrated single plot. This argument calls facet_wrap or facet_grid to arrange plots. Use plot_grid to plot multiple plot objects as an arranged grid with grid.arrange.
geom.colors: See sjp.grpfrq.
show.legend: If TRUE, and depending on plot type and function, a legend is added to the plot.
show.grpcnt: If TRUE (default), the count within each cluster group is added to the legend labels (e.g. "Group 1 (n=87)").
coord.flip: If TRUE, the x and y axes are swapped.
reverse.axis: If TRUE, the values on the x-axis are reversed.
prnt.plot: If TRUE (default), plots the results as a graph. Use FALSE if you don't want to plot any graphs. In either case, the ggplot object is returned as value.

The returned object contains:
data: the data frame used for plotting,
plot: the ggplot object,
groupcount: the number of clusters found (as calculated by sjc.kgap),
classification: the group classification (as calculated by sjc.cluster), including missing values, so this vector can be appended to the original data frame,
accuracy: the accuracy of group classification (as calculated by sjc.grpdisc).
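The classification vector is described as including missing values so that it lines up with the original data frame. The following base R sketch shows one way such a vector could be built and appended. It is an illustration only, not the sjPlot implementation; the names ok, classification and airquality_grp, and the choice of three centers, are assumptions made for this example.

```r
# Illustrative sketch in base R; not taken from the sjPlot source.
set.seed(123)                                   # k-means uses random starting centers
ok <- complete.cases(airquality)                # TRUE for rows without missing values
km <- kmeans(scale(airquality[ok, ]), centers = 3)

# Start with NA for every row, then fill in the groups of the complete rows,
# so the vector has exactly nrow(airquality) elements.
classification <- rep(NA_integer_, nrow(airquality))
classification[ok] <- km$cluster

# Because the lengths match, the vector can be appended as a new column.
airquality_grp <- cbind(airquality, cluster = classification)
print(table(airquality_grp$cluster, useNA = "ifany"))
```

Rows that were dropped before clustering keep NA in the new column, which is what makes the vector safe to attach to the unfiltered data.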
If method = "kmeans", this function first determines the optimal group count via the gap statistic (unless argument groupcount is specified), using the sjc.kgap function.
The sjc.cluster function is then used to determine the cluster groups.
The variables in data are scaled and centered. The mean value of these z-scores within each cluster group is calculated to see how certain characteristics (variables) in a cluster group differ in relation to other cluster groups.
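The z-score step described above can be approximated in base R as follows. This is a minimal sketch, not the function's own code: it fixes the group count at 3 instead of estimating it, drops incomplete rows first, and clusters the scaled values, all of which are assumptions made for illustration.

```r
# Base R approximation of the described steps; not the sjPlot source.
set.seed(123)                        # k-means starts from random centers
dat <- na.omit(airquality)           # assumption: remove incomplete rows first
z <- scale(dat)                      # center and scale each variable (z-scores)

# Assumption: cluster the z-scores with a fixed number of groups.
km <- kmeans(z, centers = 3, iter.max = 20, algorithm = "Hartigan-Wong")

# Mean z-score of every variable within each cluster group.
zmeans <- aggregate(as.data.frame(z), by = list(group = km$cluster), FUN = mean)
print(round(zmeans, 2))
```

A positive mean indicates that a cluster group lies above the overall average on that variable, a negative mean that it lies below.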
This function can also be used to plot an existing cluster solution as a graph without computing
a new cluster analysis. See argument groups
for more details.
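An existing classification of the kind the groups argument expects could come from any clustering routine. As a hedged illustration, the base R sketch below derives one from a hierarchical clustering with dist, hclust and cutree. The distance and agglomeration settings, the use of "ward.D2", and the group count of 3 are assumptions chosen for the example, and the final call is left commented out because it requires an sjPlot version that provides sjc.qclus.

```r
# Illustrative sketch in base R; settings are assumptions, not sjPlot defaults.
dat <- na.omit(airquality)                    # assumption: complete rows only
d <- dist(scale(dat), method = "euclidean")   # pairwise distances on z-scores
hc <- hclust(d, method = "ward.D2")           # agglomerative clustering
groups <- cutree(hc, k = 3)                   # one group label per row of dat

# groups has the same length as nrow(dat), as the groups argument requires.
stopifnot(length(groups) == nrow(dat))

# Hypothetical follow-up, only if sjc.qclus is available:
# sjc.qclus(dat, groupcount = 3, groups = groups)
```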
## Not run:
# # k-means clustering of mtcars-dataset
# sjc.qclus(mtcars)
#
# # k-means clustering of airquality-dataset with 4 pre-defined
# # groups in a faceted panel
# sjc.qclus(airquality, groupcount = 4, facet.grid = TRUE)
## End(Not run)
# k-means clustering of airquality data
# and saving the results. most likely, 3 cluster
# groups have been found (see below).
airgrp <- sjc.qclus(airquality)
# "re-plot" cluster groups, without computing
# new k-means cluster analysis.
sjc.qclus(airquality, groupcount = 3, groups = airgrp$classification)