factoextra (version 1.0.3)

fviz_pca: Visualize Principal Component Analysis

Description

Principal component analysis (PCA) reduces the dimensionality of multivariate data to two or three principal components that can be visualized graphically with minimal loss of information. fviz_pca() provides elegant ggplot2-based visualization of PCA outputs from: i) prcomp and princomp [in the built-in R stats package], ii) PCA [in FactoMineR] and iii) dudi.pca [in ade4]. Read more: Principal Component Analysis
  • fviz_pca_ind(): Graph of individuals
  • fviz_pca_var(): Graph of variables
  • fviz_pca_biplot(): Biplot of individuals and variables
  • fviz_pca(): An alias of fviz_pca_biplot()

Usage

fviz_pca(X, ...)
fviz_pca_ind(X, axes = c(1, 2), geom = c("point", "text"), repel = FALSE, label = "all", invisible = "none", labelsize = 4, pointsize = 2, habillage = "none", addEllipses = FALSE, ellipse.level = 0.95, ellipse.type = "norm", ellipse.alpha = 0.1, col.ind = "black", col.ind.sup = "blue", alpha.ind = 1, select.ind = list(name = NULL, cos2  = NULL, contrib = NULL), jitter = list(what = "label", width = NULL, height  = NULL), title = "Individuals factor map - PCA", axes.linetype = "dashed", ...)
fviz_pca_var(X, axes = c(1, 2), geom = c("arrow", "text"), label = "all", invisible = "none", repel = FALSE, labelsize = 4, col.var = "black", alpha.var = 1, col.quanti.sup = "blue", col.circle = "grey70", select.var = list(name = NULL, cos2 = NULL, contrib = NULL), jitter = list(what = "label", width = NULL, height = NULL), title = "Variables factor map - PCA", axes.linetype = "dashed")
fviz_pca_biplot(X, axes = c(1, 2), geom = c("point", "text"), label = "all", invisible = "none", labelsize = 4, pointsize = 2, habillage = "none", addEllipses = FALSE, ellipse.level = 0.95, col.ind = "black", col.ind.sup = "blue", alpha.ind = 1, col.var = "steelblue", alpha.var = 1, col.quanti.sup = "blue", col.circle = "grey70", repel = FALSE, axes.linetype = "dashed", select.var = list(name = NULL, cos2 = NULL, contrib = NULL), select.ind = list(name = NULL, cos2 = NULL, contrib = NULL), title = "Biplot of variables and individuals", jitter = list(what = "label", width = NULL, height = NULL), ...)

Arguments

X
an object of class PCA [FactoMineR]; prcomp and princomp [stats]; dudi and pca [ade4].
...
Arguments to be passed to the function fviz_pca_biplot().
axes
a numeric vector of length 2 specifying the dimensions to be plotted.
geom
a character vector specifying the geometry to be used for the graph. Allowed values are any combination of c("point", "arrow", "text"). Use "point" to show only points, "text" to show only labels, or c("point", "text") or c("arrow", "text") to show both types.
repel
a boolean indicating whether to use ggrepel to avoid overplotting text labels.
label
a character vector specifying the elements to be labelled. Default value is "all". Allowed values are "none" or any combination of c("ind", "ind.sup", "quali", "var", "quanti.sup"). "ind" labels only active individuals. "ind.sup" is for supplementary individuals. "quali" is for supplementary qualitative variables. "var" is for active variables. "quanti.sup" is for quantitative supplementary variables.
invisible
a character vector specifying the elements to be hidden on the plot. Default value is "none". Allowed values are any combination of c("ind", "ind.sup", "quali", "var", "quanti.sup").
labelsize
font size for the labels
pointsize
the size of points
habillage
an optional factor variable for coloring the observations by groups. Default value is "none". If X is a PCA object from FactoMineR package, habillage can also specify the supplementary qualitative variable (by its index or name) to be used for coloring individuals by groups (see ?PCA in FactoMineR).
addEllipses
logical value. If TRUE, draws ellipses around the individuals when habillage != "none".
ellipse.level
the size of the concentration ellipse in normal probability.
ellipse.type
Character specifying the frame type. Possible values are "convex" or a type supported by stat_ellipse, i.e. one of c("t", "norm", "euclid").
ellipse.alpha
Alpha for ellipse specifying the transparency level of fill color. Use alpha = 0 for no fill color.
col.ind, col.var
color for individuals and variables, respectively. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the colors for individuals/variables are automatically controlled by their qualities of representation ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use automatic coloring (by cos2, contrib, ...), make sure that habillage = "none".
col.ind.sup
color for supplementary individuals
alpha.ind, alpha.var
controls the transparency of individual and variable colors, respectively. The value can vary from 0 (total transparency) to 1 (no transparency). Default value is 1. Possible values also include "cos2", "contrib", "coord", "x" or "y". In this case, the transparency of the individual/variable colors is automatically controlled by their qualities of representation ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values ("x") or y values ("y"). To use this, make sure that habillage = "none".
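The "cos2" and "contrib" values used for automatic coloring and transparency can be sketched in base R for a prcomp fit. The block below follows the standard PCA definitions; factoextra's internal computation may differ in detail, and all object names (ind.cos2, var.contrib, ...) are illustrative assumptions rather than part of the package API.

```r
# Base-R sketch of cos2 and contrib for a prcomp fit (standard PCA definitions;
# not factoextra's own code).
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# Individuals: coordinates are the principal component scores.
ind.coord <- res.pca$x
d2 <- rowSums(ind.coord^2)        # squared distance to the origin over all PCs
ind.cos2 <- ind.coord^2 / d2      # share of each individual's d2 carried by each PC
ind.contrib <- 100 * sweep(ind.coord^2, 2, colSums(ind.coord^2), "/")

# Variables: coordinates are the loadings scaled by the PC standard deviations.
var.coord <- sweep(res.pca$rotation, 2, res.pca$sdev, "*")
var.cos2 <- var.coord^2
var.contrib <- 100 * sweep(var.cos2, 2, colSums(var.cos2), "/")

# Quality of representation of individuals on the plane of axes 1 and 2.
ind.cos2.plane <- rowSums(ind.cos2[, 1:2])
head(round(ind.cos2.plane, 3))
```

In this sketch, an individual with a plane cos2 close to 1 lies almost entirely in the plotted plane, while contributions sum to 100 within each component.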
select.ind, select.var
a selection of individuals/variables to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib:
  • name: a character vector containing the names of the individuals/variables to be drawn
  • cos2: if cos2 is in [0, 1], e.g. 0.6, then individuals/variables with a cos2 > 0.6 are drawn. If cos2 > 1, e.g. 5, then the top 5 individuals/variables with the highest cos2 are drawn.
  • contrib: if contrib > 1, e.g. 5, then the top 5 individuals/variables with the highest contrib are drawn
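The two modes of the cos2 rule (threshold versus top-n) can be illustrated with a small base-R function. select_by_cos2() is a hypothetical helper written for this illustration; it is not part of factoextra, and it assumes the strict inequality stated in the argument description above.

```r
# Hypothetical helper (not part of factoextra) mirroring the cos2 rule above.
select_by_cos2 <- function(cos2, value) {
  stopifnot(is.numeric(cos2), !is.null(names(cos2)),
            length(value) == 1, value >= 0)
  if (value <= 1) {
    # Threshold mode: keep elements whose cos2 exceeds the value.
    names(cos2)[cos2 > value]
  } else {
    # Top-n mode: keep the n elements with the highest cos2.
    n <- min(as.integer(value), length(cos2))
    names(sort(cos2, decreasing = TRUE))[seq_len(n)]
  }
}

cos2 <- c(a = 0.95, b = 0.40, c = 0.72, d = 0.10)
select_by_cos2(cos2, 0.6)  # "a" "c"
select_by_cos2(cos2, 3)    # "a" "c" "b"
```

The same value therefore means different things on either side of 1: 0.6 is read as a quality threshold, while 3 is read as a count.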
jitter
a parameter used to jitter the points in order to reduce overplotting. It is a list containing the elements what, width and height (i.e., jitter = list(what, width, height)).
  • what: the element to be jittered. Possible values are "point" or "p"; "label" or "l"; "both" or "b".
  • width: degree of jitter in x direction
  • height: degree of jitter in y direction
title
the title of the graph
axes.linetype
linetype of x and y axes.
col.quanti.sup
a color for the quantitative supplementary variables.
col.circle
a color for the correlation circle.

Value

A ggplot object.
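Because the return value is a ggplot, it can be stored, extended with further ggplot2 layers and saved like any other plot. The following is a minimal sketch that only runs the plotting part when factoextra is installed; the output file name is an arbitrary choice for illustration.

```r
# PCA on the numeric columns of iris (base R only).
res.pca <- prcomp(iris[, -5], scale. = TRUE)

# The plotting part is guarded so the script still runs without factoextra.
if (requireNamespace("factoextra", quietly = TRUE)) {
  p <- factoextra::fviz_pca_ind(res.pca, geom = "point")
  print(class(p))  # inspect the class of the returned object

  # Add ggplot2 layers to the returned plot, then save it to a temporary file.
  p <- p + ggplot2::theme_minimal() + ggplot2::ggtitle("Individuals - PCA")
  out.file <- file.path(tempdir(), "pca_individuals.png")  # illustrative file name
  ggplot2::ggsave(out.file, plot = p, width = 6, height = 5)
}
```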

See Also

fviz_ca, fviz_mca

Examples

# Principal component analysis
# ++++++++++++++++++++++++++++++
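library(ggplot2)     # theme_minimal() and the scale_*() functions used below
library(factoextra)  # fviz_pca_*() functions
data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)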

# Graph of individuals
# +++++++++++++++++++++

# Default plot
fviz_pca_ind(res.pca, col.ind = "#00AFBB")

 
# 1. Automatically control the color of individuals
#    using the "cos2" or the contributions "contrib"
#    (cos2 = the quality of representation of the individuals on the factor map)
# 2. To keep only points or text, use geom = "point" or geom = "text".
# 3. Change themes: http://www.sthda.com/english/wiki/ggplot2-themes

fviz_pca_ind(res.pca, col.ind="cos2", geom = "point")+
 theme_minimal() 

# Change gradient color
# Use repel = TRUE to avoid overplotting (slow if many points)
fviz_pca_ind(res.pca, col.ind="cos2", repel = TRUE) + 
      scale_color_gradient2(low = "white", mid = "#2E9FDF", 
      high= "#FC4E07", midpoint=0.6, space = "Lab")+
      theme_minimal()
   
# You can also control the transparency 
# of the color by the cos2
fviz_pca_ind(res.pca, alpha.ind="cos2") +
     theme_minimal()        
             
# Color individuals by groups, add concentration ellipses
# Remove labels: label = "none".
p <- fviz_pca_ind(res.pca, label="none", habillage=iris$Species,
       addEllipses=TRUE, ellipse.level=0.95)
print(p)
             
# Change group colors using RColorBrewer color palettes
# Read more: http://www.sthda.com/english/wiki/ggplot2-colors
p + scale_color_brewer(palette="Dark2") +
    scale_fill_brewer(palette="Dark2") +
     theme_minimal()
     
# Change group colors manually
# Read more: http://www.sthda.com/english/wiki/ggplot2-colors
p + scale_color_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
 scale_fill_manual(values=c("#999999", "#E69F00", "#56B4E9"))+
 theme_minimal()    
      
# Select and visualize some individuals (ind) with select.ind argument.
 # - ind with cos2 >= 0.96: select.ind = list(cos2 = 0.96)
 # - Top 20 ind according to the cos2: select.ind = list(cos2 = 20)
 # - Top 20 contributing individuals: select.ind = list(contrib = 20)
 # - Select ind by names: select.ind = list(name = c("23", "42", "119") )
 
 # Example: Select the top 40 according to the cos2
fviz_pca_ind(res.pca, select.ind = list(cos2 = 40))

 
# Graph of variables
# ++++++++++++++++++++++++++++
  
# Default plot
fviz_pca_var(res.pca, col.var = "steelblue")+
theme_minimal()
 
# Control variable colors using their contributions
fviz_pca_var(res.pca, col.var = "contrib")+
 scale_color_gradient2(low="white", mid="blue", 
           high="red", midpoint=96, space = "Lab") +
 theme_minimal()         
 
# Select variables with select.var argument
   # You can select by contrib, cos2 and name 
   # as previously described for ind
# Select the top 3 contributing variables
fviz_pca_var(res.pca, select.var = list(contrib = 3))

    
# Biplot of individuals and variables
# ++++++++++++++++++++++++++
fviz_pca_biplot(res.pca)

# Keep only the labels for variables
# Change the color by groups, add ellipses
fviz_pca_biplot(res.pca, label = "var", habillage=iris$Species,
               addEllipses=TRUE, ellipse.level=0.95)+
theme_minimal()

 
 
