plotVar: Plot of Variables

Description

This function represents variables on correlation circle plots for (sparse)(I)PCA, (regularized)CCA, (sparse)PLS(DA) and (sparse)(R)GCCA(DA).

Usage

plotVar(object,
        comp = c(1, 2),
        comp.select = NULL,
        var.names = NULL,
        blocks = NULL,
        X.label = NULL,
        Y.label = NULL,
        abline.line = TRUE,
        col,
        cex,
        pch,
        font,
        cutoff = 0,
        rad.in = 0.5,
        main = "Correlation Circle Plots",
        style = "ggplot2",
        overlap = TRUE,
        ...)

Arguments

object

object of class inheriting from "rcc", "pls", "plsda", "spls", "splsda", "pca", "spca", "rgcca" or "sgcca".

comp

integer vector of length two. The components that will be used on the horizontal and the vertical axis respectively to project the variables.

comp.select

for the sparse versions, a vector indicating the components on which the variables were selected. Only those selected variables are displayed. See the sgcca examples below.

var.names

a list of character vectors giving alternative names to display for the variables in each data set. Each vector should have length equal to the total number of variables in the corresponding data set. By default set to NULL to display the original variable names.

blocks

for an object of class "rgcca" or "sgcca", a numerical vector indicating the blocks of variables to display.

X.label

x axis titles, by default set to 'Component'

Y.label

y axis titles, by default set to 'Component'

abline.line

logical; should the vertical and horizontal lines through the center be plotted? Defaults to TRUE.

col

character or integer vector of colors for the plotted characters and symbols. Can be of length the number of data sets that are integrated, or a list of vectors associated with each data set, where each vector has length the number of variables in that data set.

cex

numeric vector of character expansion sizes for the plotted characters and symbols. Can be of length the number of data sets that are integrated, or a list of vectors associated with each data set, where each vector has length the number of variables in that data set.

pch

numeric vector for symbols, can be of length the number of data sets that are integrated, or a list of vectors associated to each data set, where each vector has for length the number of variables in that data set. See

font

numeric vector for font, can be of length the number of data sets that are integrated, or a list of vectors associated to each data set, where each vector has for length the number of variables in that data set.

cutoff

numeric between 0 and 1. Variables with correlations below this cutoff in absolute value are not plotted on any of the components specified in comp.

rad.in

numeric between 0 and 1, the radius of the inner circle. Defaults to 0.5.

main

character string giving the plot title.

style

plotting style, one of 'graphics', 'lattice' or 'ggplot2' (the default).

overlap

logical indicating whether the correlation circle plots of the different data sets should be overlaid in a single plot (TRUE) or displayed as one plot per data set (FALSE). By default set to TRUE.

...

not used currently.
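The `cutoff` rule described in the arguments list amounts to a filter on each variable's pair of correlation coordinates. The base R sketch below assumes one reading of that rule: a variable is kept when its absolute correlation reaches `cutoff` on at least one of the two components in `comp`. The helper `filter_by_cutoff()` and the toy correlations are hypothetical and are not part of mixOmics.

```r
# Hypothetical helper (not part of mixOmics). Assumed rule: keep a variable
# (row) if its absolute correlation is >= cutoff on at least one of the two
# plotted components; variables below the cutoff on both are dropped.
filter_by_cutoff <- function(cors, cutoff = 0) {
  stopifnot(ncol(cors) == 2, cutoff >= 0, cutoff <= 1)
  keep <- abs(cors[, 1]) >= cutoff | abs(cors[, 2]) >= cutoff
  cors[keep, , drop = FALSE]
}

# Toy correlations of four variables with components 1 and 2
cors <- rbind(a = c(0.90, 0.10),
              b = c(0.20, -0.70),
              c = c(0.30, 0.40),
              d = c(-0.55, 0.05))
colnames(cors) <- c("comp1", "comp2")

kept <- filter_by_cutoff(cors, cutoff = 0.5)
print(rownames(kept))  # variable c is below 0.5 on both components
```

With the default `cutoff = 0` every variable passes the filter, which matches the default of plotting all variables.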

Value

The output can be stored in an object: it is a matrix containing the $x$- and $y$-coordinates of the plotted variables, along with the graphical parameters used to generate the plot.

Details

plotVar produces a "correlation circle": the correlations between each variable and the selected components are plotted as a scatter plot, together with concentric circles of radius one and of radius given by rad.in. Each point corresponds to a variable. For (regularized) CCA the components correspond to the equiangular vector between the $X$- and $Y$-variates. For (sparse) PLS in regression mode the components correspond to the $X$-variates. If the mode is canonical, the components for the $X$ and $Y$ variables correspond to the $X$- and $Y$-variates respectively.
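The base R sketch below reproduces the idea for a plain CCA so it can run without mixOmics; it is not the mixOmics implementation. It assumes the equiangular vector of a component is the sum of the standardized $X$- and $Y$-variates of that component, and it uses two blocks of the built-in `LifeCycleSavings` data (the pairing used in `?cancor`), with `X` and `Y` as local names rather than the nutrimouse blocks. The helper `draw_circle()` is hypothetical.

```r
# Base R sketch of a CCA correlation circle (not the mixOmics code).
# Assumption: equiangular vector = standardized X-variate + standardized Y-variate.
X <- as.matrix(LifeCycleSavings[, 2:3])     # pop15, pop75
Y <- as.matrix(LifeCycleSavings[, -(2:3)])  # sr, dpi, ddpi

comp <- c(1, 2)  # components on the horizontal and vertical axes
rad.in <- 0.5    # radius of the inner circle

cc <- cancor(X, Y)
u <- scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, comp]  # X-variates
v <- scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef[, comp]  # Y-variates
bisect <- scale(u) + scale(v)  # equiangular vector for each component

# Each variable is placed at its correlations with the two equiangular vectors
cors <- cor(cbind(X, Y), bisect)
print(round(cors, 2))

# Hypothetical helper: draw a circle of radius r centred at the origin
draw_circle <- function(r, n = 200) {
  theta <- seq(0, 2 * pi, length.out = n)
  lines(r * cos(theta), r * sin(theta))
}

plot(cors, type = "n", xlim = c(-1, 1), ylim = c(-1, 1), asp = 1,
     xlab = paste("Component", comp[1]), ylab = paste("Component", comp[2]),
     main = "Correlation Circle Plots")
draw_circle(1)       # outer circle: correlations cannot exceed 1
draw_circle(rad.in)  # inner circle of radius rad.in
abline(h = 0, v = 0, lty = 2)
# X variables in blue, Y variables in red
text(cors, labels = rownames(cors),
     col = rep(c("blue", "red"), c(ncol(X), ncol(Y))))
```

Because the two variates are standardized before summing, the sum points halfway between them, which is what makes the vector equiangular.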

For plsda and splsda objects, only the $X$ variables are represented.

For spls and splsda objects, only the $X$ and $Y$ variables selected on the components given in comp are represented.
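In the sparse methods a variable counts as selected on a component when its loading on that component is non-zero. The sketch below assumes that a variable is displayed when it is selected on at least one of the components considered (those in `comp`, or in `comp.select` when supplied); the loading matrix and the helper `selected_vars()` are hypothetical illustrations, not mixOmics output.

```r
# Hypothetical sparse loading matrix: 6 X-variables, 3 components.
# A zero loading means the variable was not selected on that component.
loadings.X <- matrix(c(0.7, 0.0, 0.0,
                       0.0, 0.5, 0.0,
                       0.4, 0.0, 0.0,
                       0.0, 0.0, 0.8,
                       0.0, 0.0, 0.0,
                       0.0, 0.6, 0.3),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(paste0("x", 1:6), paste0("comp", 1:3)))

# Assumed rule: keep a variable if its loading is non-zero on at least one
# of the listed components.
selected_vars <- function(loadings, comp) {
  nonzero <- loadings[, comp, drop = FALSE] != 0
  rownames(loadings)[rowSums(nonzero) > 0]
}

selected_vars(loadings.X, comp = c(1, 2))  # returns c("x1", "x2", "x3", "x6")
selected_vars(loadings.X, comp = 1:3)      # returns c("x1", "x2", "x3", "x4", "x6")
```

Variable x5 is never selected, so it would not appear for any choice of components.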

The arguments col, pch, cex and font can be either vectors of length two or a list of two vectors of lengths $p$ and $q$ respectively, where $p$ is the number of $X$-variables and $q$ is the number of $Y$-variables. In the first case, the first and second elements of the vector set the graphical attributes of the $X$- and $Y$-variables respectively. In the second case, each point (variable) can be given its own graphical attributes: the first component of the list holds the $X$ attributes and the second component holds the $Y$ attributes. Default values exist for these arguments.
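The two accepted forms can be illustrated by expanding either one into per-variable attributes. The helper `expand_attr()` below is hypothetical, a sketch of the rule for a two-block object under the description above, not a function from mixOmics.

```r
# Hypothetical helper (not part of mixOmics): turn a col/pch/cex/font
# specification into one value per variable for p X- and q Y-variables.
expand_attr <- function(attr, p, q) {
  if (is.list(attr)) {
    # Second form: one value per variable, X attributes first, then Y
    stopifnot(length(attr) == 2,
              length(attr[[1]]) == p, length(attr[[2]]) == q)
    return(list(X = attr[[1]], Y = attr[[2]]))
  }
  # First form: a vector of length two, one value per block
  stopifnot(length(attr) == 2)
  list(X = rep(attr[1], p), Y = rep(attr[2], q))
}

p <- 4
q <- 2
str(expand_attr(c(16, 2), p, q))                      # one symbol per block
str(expand_attr(list(c(1, 1, 3, 3), c(2, 4)), p, q))  # one value per variable
```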

References

Gonzalez I., Le Cao K-A., Davis, M.J. and Dejean, S. (2012). Visualising associations between paired 'omics data sets. BioData Mining 5:19. http://www.biodatamining.org/content/5/1/19/abstract

Examples

library(mixOmics)  # provides plotVar(), the model-fitting functions and the example data sets

## variable representation for objects of class 'rcc'
# ----------------------------------------------------
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene
nutri.res <- rcc(X, Y, ncomp = 3, lambda1 = 0.064, lambda2 = 0.008)

plotVar(nutri.res) #(default)

# playing with the style
plotVar(nutri.res, style = 'lattice')  # lattice instead of the default ggplot2

# changing x and y labels
plotVar(nutri.res, comp = c(1,3), cutoff = 0.5, 
        X.label = 'PC1', Y.label = 'PC3')

# one correlation circle plot per data set
plotVar(nutri.res, comp = c(1,2), cutoff = 0.5, 
        overlap = FALSE)


# with pch symbols
plotVar(nutri.res, comp = c(1,2), pch = c(16,2))


## variable representation for objects of class 'pls' or 'spls'
# ----------------------------------------------------
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
toxicity.spls <- spls(X, Y, ncomp = 3, keepX = c(50, 50, 50), 
                      keepY = c(10, 10, 10))

# default shows only the variables selected on the plotted components
plotVar(toxicity.spls)

# shows only the variables selected on the plotted components
plotVar(toxicity.spls, comp = c(1,3))

# shows only the variables selected on any of components 1 to 3
plotVar(toxicity.spls, comp.select = c(1:3))


# change variable names
new.names = list(paste('gene', 1:ncol(X)), paste('clinic', 1:ncol(Y)))
plotVar(toxicity.spls, overlap = FALSE, var.names = new.names)

# prefilter even further and use of pch
plotVar(toxicity.spls, comp.select = c(1:3), cutoff = 0.8, pch = c(15,16))

# change colors
plotVar(toxicity.spls, col = color.mixo(3:4))

my.col = list(c(rep(1, ncol(X))), c(rep(3,ncol(Y))))
plotVar(toxicity.spls, col = my.col)


## variable representation for objects of class 'splsda'
# ----------------------------------------------------
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- as.factor(liver.toxicity$treatment[, 4])

ncomp <- 2
keepX <- rep(20, ncomp)

splsda.liver <- splsda(X, Y, ncomp = ncomp, keepX = keepX)
# use of pch symbols and a single color
plotVar(splsda.liver, pch = 16, col = 3)

## variable representation for objects of class 'sgcca' 
# ----------------------------------------------------

## see example in ??wrapper.sgcca
data(nutrimouse)
# need to unmap the Y factor diet
Y = unmap(nutrimouse$diet)
# set up the data as list
data = list(nutrimouse$gene, nutrimouse$lipid,Y)

# set up the design matrix:
# with this design, gene expression and lipids are connected to the diet factor
# design = matrix(c(0,0,1,
#                   0,0,1,
#                   1,1,0), ncol = 3, nrow = 3, byrow = TRUE)

# with this design, gene expression and lipids are connected to the diet factor
# and gene expression and lipids are also connected
design = matrix(c(0,1,1,
                  1,0,1,
                  1,1,0), ncol = 3, nrow = 3, byrow = TRUE)


#note: the penalty parameters will need to be tuned
wrap.result.sgcca = wrapper.sgcca(blocks = data, design = design, penalty = c(.3,.3, 1),
                                  ncomp = c(2, 2, 1),
                                  scheme = "centroid", verbose = FALSE)
#wrap.result.sgcca


# showing 2 blocks, with variables selected on comp 1 for block 1 and comp 1 for block 2
plotVar(wrap.result.sgcca, comp = c(1,2), 
        blocks = c(1,2), comp.select = c(1,1), 
        overlap = FALSE,
        main = 'Variables selected on component 1 only')


# displaying variables selected on comp 2 for block 1 and comp 2 for block 2
plotVar(wrap.result.sgcca, comp = c(1,2), blocks = c(1,2), comp.select = c(2,2), 
        main = 'Variables selected on component 2 only')


## variable representation for objects of class 'rgcca'
# ----------------------------------------------------
data(nutrimouse)
# need to unmap Y for an unsupervised analysis, where Y is included as a data block in data
Y = unmap(nutrimouse$diet)

data = list(gene = nutrimouse$gene, lipid = nutrimouse$lipid, Y = Y)
# with this design, all blocks are connected
design = matrix(c(0,1,1,1,0,1,1,1,0), ncol = 3, nrow = 3, 
                byrow = TRUE, dimnames = list(names(data), names(data)))

nutrimouse.rgcca <- wrapper.rgcca(blocks = data,
                                  design = design,
                                  tau = "optimal",
                                  ncomp = c(2, 2, 1),
                                  scheme = "centroid",
                                  verbose = FALSE)

# changing cex
plotVar(nutrimouse.rgcca, comp = c(1,2), blocks = c(1,2), cex = c(1.5, 1.5))
# changing font
plotVar(nutrimouse.rgcca, comp = c(1,2), blocks = c(1,2), font = c(1,3))


# set up the data as list
data = list(nutrimouse$gene, nutrimouse$lipid,Y)
# with this design, gene expression and lipids are connected to the diet factor
# design = matrix(c(0,0,1,
#                   0,0,1,
#                   1,1,0), ncol = 3, nrow = 3, byrow = TRUE)

# with this design, gene expression and lipids are connected to the diet factor
# and gene expression and lipids are also connected
design = matrix(c(0,1,1,
                  1,0,1,
                  1,1,0), ncol = 3, nrow = 3, byrow = TRUE)
#note: the tau parameter is the regularization parameter
wrap.result.rgcca = wrapper.rgcca(blocks = data, design = design, tau = c(1, 1, 0),
                                  ncomp = c(2, 2, 1),
                                  scheme = "centroid", verbose = FALSE)
#wrap.result.rgcca
plotVar(wrap.result.rgcca, comp = c(1,2), blocks = c(1,2))
