Learn R Programming

factoextra (version 1.0.3)

fviz_ca: Visualize Correspondence Analysis

Description

Correspondence analysis (CA) is an extension of Principal Component Analysis (PCA) suited to analyze frequencies formed by two categorical variables. fviz_ca() provides ggplot2-based elegant visualization of CA outputs from the R functions: CA [in FactoMineR], ii) ca [in ca], coa [in ade4] and correspondence [in MASS]. Read more: Correspondence Analysis
  • fviz_ca_row(): Graph of row variables
  • fviz_ca_col(): Graph of column variables
  • fviz_ca_biplot(): Biplot of row and column variables
  • fviz_ca(): An alias of fviz_ca_biplot()

Usage

fviz_ca_row(X, axes = c(1, 2), shape.row = 19, geom = c("point", "text"), label = "all", invisible = "none", labelsize = 4, pointsize = 2, col.row = "blue", col.row.sup = "darkblue", alpha.row = 1, select.row = list(name = NULL, cos2 = NULL, contrib = NULL), map = "symmetric", repel = FALSE, jitter = list(what = "label", width = NULL, height = NULL), ...)
fviz_ca_col(X, axes = c(1, 2), shape.col = 17, geom = c("point", "text"), label = "all", invisible = "none", labelsize = 4, pointsize = 2, col.col = "red", col.col.sup = "darkred", alpha.col = 1, select.col = list(name = NULL, cos2 = NULL, contrib = NULL), map = "symmetric", repel = FALSE, jitter = list(what = "label", width = NULL, height = NULL), ...)
fviz_ca_biplot(X, axes = c(1, 2), shape.row = 19, shape.col = 17, geom = c("point", "text"), label = "all", invisible = "none", labelsize = 4, pointsize = 2, col.col = "red", col.col.sup = "darkred", alpha.col = 1, col.row = "blue", col.row.sup = "darkblue", alpha.row = 1, select.col = list(name = NULL, cos2 = NULL, contrib = NULL), select.row = list(name = NULL, cos2 = NULL, contrib = NULL), map = "symmetric", arrows = c(FALSE, FALSE), repel = FALSE, title = "CA factor map - Biplot", jitter = list(what = "label", width = NULL, height = NULL), ...)
fviz_ca(X, ...)

Arguments

X
an object of class CA [FactoMineR], ca [ca], coa [ade4]; correspondence [MASS].
axes
a numeric vector of length 2 specifying the dimensions to be plotted.
shape.row, shape.col
the point shapes to be used for row/column variables. Default values are 19 for rows and 17 for columns.
geom
a character specifying the geometry to be used for the graph. Allowed values are the combination of c("point", "arrow", "text"). Use "point" (to show only points); "text" to show only labels; c("point", "text") or c("arrow", "text") to show both types.
label
a character vector specifying the elements to be labelled. Default value is "all". Allowed values are "none" or the combination of c("row", "row.sup", "col", "col.sup"). Use "col" to label only active column variables; "col.sup" to label only supplementary columns; etc
invisible
a character value specifying the elements to be hidden on the plot. Default value is "none". Allowed values are the combination of c("row", "row.sup","col", "col.sup").
labelsize
font size for the labels
pointsize
the size of points
map
character string specifying the map type. Allowed options include: "symmetric", "rowprincipal", "colprincipal", "symbiplot", "rowgab", "colgab", "rowgreen" and "colgreen". See details
repel
a boolean, whether to use ggrepel to avoid overplotting text labels or not.
jitter
a parameter used to jitter the points in order to reduce overplotting. It's a list containing the objects what, width and height (i.e jitter = list(what, width, height)).
  • what: the element to be jittered. Possible values are "point" or "p"; "label" or "l"; "both" or "b".
  • width: degree of jitter in x direction
  • height: degree of jitter in y direction
...
optional arguments.
col.col, col.row
color for column/row points. The default values are "red" and "blue", respectively. Allowed values include also : "cos2", "contrib", "coord", "x" or "y". In this case, the colors for row/column variables are automatically controlled by their qualities ("cos2"), contributions ("contrib"), coordinates (x^2 + y^2, "coord"), x values("x") or y values("y")
col.col.sup, col.row.sup
colors for the supplementary column and row points, respectively.
alpha.col, alpha.row
controls the transparency of colors. The value can variate from 0 (total transparency) to 1 (no transparency). Default value is 1. Allowed values include also : "cos2", "contrib", "coord", "x" or "y" as for the arguments col.col and col.row.
select.col, select.row
a selection of columns/rows to be drawn. Allowed values are NULL or a list containing the arguments name, cos2 or contrib:
  • name is a character vector containing column/row names to be drawn
  • cos2 if cos2 is in [0, 1], ex: 0.6, then columns/rows with a cos2 > 0.6 are drawn. if cos2 > 1, ex: 5, then the top 5 columns/rows with the highest cos2 are drawn.
  • contrib if contrib > 1, ex: 5, then the top 5 columns/rows with the highest contrib are drawn
arrows
Vector of two logicals specifying if the plot should contain points (FALSE, default) or arrows (TRUE). First value sets the rows and the second value sets the columns.
title
the title of the graph

Value

a ggplot2 plot

Details

The default plot of CA is a "symmetric" plot in which both rows and columns are in principal coordinates. In this situation, it's not possible to interpret the distance between row points and column points. To overcome this problem, the simplest way is to make an asymmetric plot. This means that, the column profiles must be presented in row space or vice-versa. The allowed options for the argument map are:
  • "rowprincipal" or "colprincipal": asymmetric plots with either rows in principal coordinates and columns in standard coordinates, or vice versa. These plots preserve row metric or column metric respectively.
  • "symbiplot": Both rows and columns are scaled to have variances equal to the singular values (square roots of eigenvalues), which gives a symmetric biplot but does not preserve row or column metrics.
  • "rowgab" or "colgab": Asymmetric maps, proposed by Gabriel & Odoroff (1990), with rows (respectively, columns) in principal coordinates and columns (respectively, rows) in standard coordinates multiplied by the mass of the corresponding point.
  • "rowgreen" or "colgreen": The so-called contribution biplots showing visually the most contributing points (Greenacre 2006b). These are similar to "rowgab" and "colgab" except that the points in standard coordinates are multiplied by the square root of the corresponding masses, giving reconstructions of the standardized residuals.

References

http://www.sthda.com

See Also

get_ca, fviz_pca, fviz_mca

Examples

Run this code
# Correspondence Analysis
# ++++++++++++++++++++++++++++++
# Install and load FactoMineR to compute CA
# install.packages("FactoMineR")

library("FactoMineR")
data(housetasks)
head(housetasks)
res.ca <- CA(housetasks, graph=FALSE)

# Graph of row variables
# +++++++++++++++++++++

# Default plot
fviz_ca_row(res.ca)

# Customize the plot
 # - Show text only: geom = "text" (to show point only: geom = "point")
 # - Change color: col.row ="#00AFBB"
 # - Change title and axis labels
 # - Change axis limits by specifying the min and max
 # - Change themes: http://www.sthda.com/english/wiki/ggplot2-themes

fviz_ca_row(res.ca, geom = "text", col.row = "#00AFBB") +
   labs(title = "CA", x = "Dim.1", y ="Dim.2" )+ # titles
   xlim(-1.3, 1.7) + ylim (-1.5, 1)+ # axis limits
   theme_minimal() # change theme
   
   
# Control automatically the color of row points
   # using the "cos2" or the contributions "contrib"
   # cos2 = the quality of the rows on the factor map

fviz_ca_row(res.ca, col.row = "cos2")+
theme_minimal()

# Change gradient color
# Use repel = TRUE to avoid overplotting (slow if many points)
fviz_ca_row(res.ca, col.row = "cos2", repel = TRUE) + 
      scale_color_gradient2(low = "white", mid = "#2E9FDF", 
      high = "#FC4E07", midpoint = 0.5, space = "Lab")+
theme_minimal()
  
# Color by the contributions   
fviz_ca_row(res.ca, col.row = "contrib") + 
      scale_color_gradient2(low = "white", mid = "#00AFBB", 
      high="#E7B800", midpoint = 10, space = "Lab")+
theme_minimal()
      
# You can also control the transparency 
# of the color by the "cos2" or "contrib"
fviz_ca_row(res.ca, alpha.row="contrib") +
     theme_minimal()        
      
# Select and visualize some rows with select.row argument.
 # - Rows with cos2 >= 0.5: select.row = list(cos2 = 0.5)
 # - Top 7 rows according to the cos2: select.row = list(cos2 = 7)
 # - Top 7 contributing rows: select.row = list(contrib = 7)
 # - Select rows by names: select.row = list(name = c("Breakfeast", "Repairs", "Holidays"))
 
 # Example: Select the top 7 contributing rows
fviz_ca_row(res.ca, select.row = list(contrib = 7))

 
# Graph of column points
# ++++++++++++++++++++++++++++

# Default plot
fviz_ca_col(res.ca, col.col = "red")+
theme_minimal()

 
# Control colors using their contributions
fviz_ca_col(res.ca, col.col = "contrib")+
 scale_color_gradient2(low = "white", mid = "blue", 
           high = "red", midpoint = 25, space = "Lab") +
 theme_minimal()    
       
# Select columns with select.col argument
   # You can select by contrib, cos2 and name 
   # as previously described for ind
# Select the top 3 contributing columns
fviz_ca_col(res.ca, select.col = list(contrib = 3))
    
    
# Biplot of rows and columns
# ++++++++++++++++++++++++++
# Symetric Biplot of rows and columns
fviz_ca_biplot(res.ca)

# Asymetric biplot, use arrows for columns
fviz_ca_biplot(res.ca, map ="rowprincipal",
 arrow = c(FALSE, TRUE))
 
# Keep only the labels for row points
fviz_ca_biplot(res.ca, label ="row")

# Keep only labels for column points
fviz_ca_biplot(res.ca, label ="col")


# You can hide row or column points using 
# invisible = "row" or invisible = "col", respectively
fviz_ca_biplot(res.ca, invisible ="row")

# Control automatically the color of rows using the cos2
fviz_ca_biplot(res.ca, col.row="cos2") +
       theme_minimal()
       
# Select the top 7 contributing rows
# And the top 3 columns
fviz_ca_biplot(res.ca,  
               select.row = list(contrib = 7),
               select.col = list(contrib = 3))+
 theme_minimal() 
 

Run the code above in your browser using DataLab