factoextra (version 1.0.3)

fviz_cluster: Visualize Clustering Results

Description

Provides elegant ggplot2-based visualization of partitioning methods, including kmeans [stats package]; pam, clara and fanny [cluster package]; dbscan [fpc package]; Mclust [mclust package]; HCPC [FactoMineR]; hkmeans [factoextra]. Observations are represented by points in the plot, using the first two principal components if ncol(data) > 2. An outline is drawn around each cluster: a convex hull by default, or an ellipse, depending on frame.type.
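
The projection described above can be approximated in base R. This is a minimal sketch, not the package's internal code: it assumes standardization followed by prcomp(), and the names x, km, pca and coords are illustrative only.

```r
# Sketch only: approximate the 2-D coordinates used for a cluster plot.
set.seed(123)
x  <- iris[, -5]                          # four numeric columns, so ncol(x) > 2
km <- kmeans(scale(x), centers = 3, nstart = 25)

# Standardize the variables and run a principal component analysis
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Keep the first two components and attach the cluster labels
coords <- data.frame(pca$x[, 1:2], cluster = factor(km$cluster))
head(coords)
```

Each row of coords is one observation; plotting PC1 against PC2 and coloring by cluster gives a rough base-R counterpart of the plot.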

Usage

fviz_cluster(object, data = NULL, stand = TRUE,
  geom = c("point", "text"), repel = FALSE, show.clust.cent = TRUE,
  frame = TRUE, frame.type = "convex", frame.level = 0.95,
  frame.alpha = 0.2, pointsize = 2, labelsize = 4,
  title = "Cluster plot",
  jitter = list(what = "label", width = NULL, height = NULL),
  outlier.color = "black", outlier.shape = 19)

Arguments

object
an object of class "partition" created by the functions pam(), clara() or fanny() in the cluster package; "kmeans" [in stats package]; "dbscan" [in fpc package]; "Mclust" [in mclust]; "hkmeans", "eclust" [in factoextra]. Any list object with data and cluster components is also accepted (e.g.: object = list(data = mydata, cluster = myclust)).
data
the data that was used for clustering. Required only when object is of class kmeans or dbscan.
stand
logical value; if TRUE, data is standardized before principal component analysis
geom
a character vector specifying the geometry to be used for the graph. Allowed values are combinations of c("point", "text"). Use "point" to show only points; "text" to show only labels; c("point", "text") to show both.
repel
a boolean, whether to use ggrepel to avoid overplotting text labels or not.
show.clust.cent
logical; if TRUE, shows cluster centers
frame
logical value; if TRUE, draws outline around points of each cluster
frame.type
Character specifying the frame type. Possible values are 'convex' or any type supported by ggplot2::stat_ellipse, i.e. one of c("t", "norm", "euclid").
frame.level
Passed to the level argument of ggplot2::stat_ellipse. Ignored when frame.type is 'convex'. Default value is 0.95.
frame.alpha
Alpha for frame specifying the transparency level of fill color.
pointsize
the size of points
labelsize
font size for the labels
title
the title of the graph
jitter
a parameter used to jitter the points in order to reduce overplotting. It is a list containing the elements what, width and height (i.e., jitter = list(what, width, height)).
  • what: the element to be jittered. Possible values are "point" or "p"; "label" or "l"; "both" or "b".
  • width: degree of jitter in x direction
  • height: degree of jitter in y direction
outlier.color, outlier.shape
the color and the shape of outliers. Outliers can be detected only in DBSCAN clustering.
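
The list form of the object argument can be sketched as follows. This is an illustrative example under stated assumptions: mydata, myclust, hc and obj are hypothetical names, the partition comes from base R's hclust() and cutree(), and the plotting call is guarded so that the snippet also runs when factoextra is not installed.

```r
# Sketch only: plot an arbitrary partition via a list with data and cluster.
mydata  <- scale(iris[, -5])
hc      <- hclust(dist(mydata), method = "complete")
myclust <- cutree(hc, k = 3)              # one integer cluster label per row

obj <- list(data = mydata, cluster = myclust)

# Guarded call: only runs if factoextra is available
if (requireNamespace("factoextra", quietly = TRUE)) {
  p <- factoextra::fviz_cluster(obj, geom = "point")
  # print(p) draws the plot
}
```

The cluster component needs one label per row of data, in the same order as the rows.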

Value

Returns a ggplot object.

See Also

fviz_silhouette, hcut, hkmeans, eclust, fviz_dend

Examples

library(factoextra)
library(ggplot2)  # for scale_*_brewer() and theme_minimal()
set.seed(123)

# Data preparation
# +++++++++++++++
data("iris")
head(iris)
# Remove species column (5) and scale the data
iris.scaled <- scale(iris[, -5])

# K-means clustering
# +++++++++++++++++++++
km.res <- kmeans(iris.scaled, 3, nstart = 25)

# Visualize kmeans clustering
# use repel = TRUE to avoid overplotting
fviz_cluster(km.res, iris[, -5], frame.type = "norm")


# Change the color and theme
fviz_cluster(km.res, iris[, -5]) +
  scale_color_brewer(palette = "Set2") +
  scale_fill_brewer(palette = "Set2") +
  theme_minimal()

 ## Not run: 
# # Show points only
# fviz_cluster(km.res, iris[, -5], geom = "point")
# # Show text only
# fviz_cluster(km.res, iris[, -5], geom = "text")
#  
# # PAM clustering
# # ++++++++++++++++++++
# require(cluster)
# pam.res <- pam(iris.scaled, 3)
#  # Visualize pam clustering
# fviz_cluster(pam.res, geom = "point", frame.type = "norm")
# ## End(Not run)

# Hierarchical clustering
# ++++++++++++++++++++++++
# Use hcut(), which computes hclust and cuts the tree
hc.cut <- hcut(iris.scaled, k = 3, hc_method = "complete")
# Visualize dendrogram
fviz_dend(hc.cut, show_labels = FALSE, rect = TRUE)
# Visualize cluster
fviz_cluster(hc.cut, frame.type = "convex")


