plotContrib: Contribution plot of variables

Description

This function provides a horizontal bar plot to visualise the highest or lowest mean/median value of the variables with color code corresponding to the outcome of interest. Applies DA objects (sPLSDA, PLSDA, and multilevel, SGCCDA) and for count data types.

Usage

plotContrib(object,
            contrib="max",  # choose between 'max" or "min"
            method = "mean", # choose between 'mean" or "median"
            block=1, #single value
            comp = 1,
            show.ties = TRUE,
            col.ties="white",
            ndisplay = NULL,
            cex.name = 0.7,
            cex.legend = 0.8,
            name.var=NULL,
            complete.name.var=FALSE,
            legend = TRUE,
            legend.color = NULL,
            main = NULL,
            legend.title = 'Outcome',
            plot = TRUE)

Arguments

object

object of class inheriting from a plsda, splsda or sgccda class

contrib

a character set to 'max' or 'min' indicating if the color of the bar should correspond to the group with the maximal or minimal expression levels / abundance. By default set to 'max'.

method

a character set to 'mean' or 'median' indicating the criterion to assess the contribution. We recommend using median in the case of count or skewed data.

block

A single value indicating which block to consider in a sgccda object. Defaut is set to the first block.

comp

integer value indicating the component of interest from the object.

show.ties

Boolean. If TRUE then tie groups appear in the color set by col.ties, which will appear in the legend. Ties can happen when dealing with count data type. By default set to TRUE.

col.ties

Color corresponding to ties, only used if show.ties=TRUE and ties are present.

ndisplay

integer indicating how many of the most important variables are to be plotted (ranked by decreasing weights in each PLS-component). Useful to lighten a graph.

cex.name

A numerical value giving the amount by which plotting the variable name text should be magnified or reduced relative to the default.

cex.legend

A numerical value giving the amount by which plotting the legend text should be magnified or reduced relative to the default.

name.var

A character vector indicating the names of the variables. The names of the vector should match the names of the input data, see example.

complete.name.var

Boolean. If name.var is supplied with some empty names, complete.name.var allows you to use the initial variable names to complete the graph (from colnames(X)). Defaut to FALSE.

legend

Boolean indicating if the legend indicating the group outcomes should be added to the plot. Default value is TRUE.

legend.color

A color vector of length the number of group outcomes. See examples.

main

A set of characters to indicate the title of the plot. Default value is NULL.

legend.title

A set of characters to indicate the title of the legend. Default value is NULL.

plot

Boolean indicating of the plot should be output. If set to FALSE the user can extract the contribution matrix, see example. Default value is TRUE.

encoding

latin1

Details

The contribution of each variable is represented in a barplot where eachbar length corresponds to the loading weight (importance) of the feature in (sparse)PLSDA for each component. The loading weight can be positive or negative. The color corresponds to the group in which the feature is most 'abundant'. Note that this type of graphical output is particularly insightful for count microbial data - in that latter case using the method = 'median' is advised.

Examples

Run this code

## object of class 'splsda' 
# --------------------------
data(liver.toxicity)
X <- as.matrix(liver.toxicity$gene)
Y <- as.factor(liver.toxicity$treatment[, 4])

splsda.liver <- splsda(X, Y, ncomp = 2, keepX = c(20, 20))

# contribution on comp 1, based on the median. 
# Colors indicate the group in which the median expression is maximal
plotContrib(splsda.liver, comp = 1, method = 'median')

# contribution on comp 2, based on median. 
#Colors indicate the group in which the median expression is maximal
plotContrib(splsda.liver, comp = 2, method = 'median')

# contribution on comp 2, based on median. 
# Colors indicate the group in which the median expression is minimal
plotContrib(splsda.liver, comp = 2, method = 'median', contrib = 'min')

# changing the name to gene names
# if the user input a name.var but names(name.var) is NULL, 
# then a warning will be output and assign names of name.var to colnames(X)
# this is to make sure we can match the name of the selected variables to the contribution plot.
name.var = liver.toxicity$gene.ID[, 'geneBank']
length(name.var)
plotContrib(splsda.liver, comp = 2, method = 'median', name.var = name.var)

# if names are provided: ok, even when NAs
name.var = liver.toxicity$gene.ID[, 'geneBank']
names(name.var) = rownames(liver.toxicity$gene.ID)
plotContrib(splsda.liver, comp = 2, method = 'median', 
name.var = name.var, cex.name = 0.5)

#missing names of some genes? complete with the original names
plotContrib(splsda.liver, comp = 2, method = 'median',
name.var = name.var, cex.name = 0.5,complete.name.var=TRUE)

# look at the contribution (median) for each variable
plot.contrib = plotContrib(splsda.liver, comp = 2, method = 'median', plot = FALSE)
head(plot.contrib$contrib)
# change the title of the legend and title name
plotContrib(splsda.liver, comp = 2, method = 'median', legend.title = 'Time', 
main = 'Contribution plot')

# no legend
plotContrib(splsda.liver, comp = 2, method = 'median', legend = FALSE)

# change the color of the legend
plotContrib(splsda.liver, comp = 2, method = 'median', legend.color = c(1:4))



# object 'splsda multilevel'
# -----------------
data(vac18)
X <- vac18$genes
Y <- vac18$stimulation
# sample indicates the repeated measurements
design <- data.frame(sample = vac18$sample, 
                     stimul = vac18$stimulation)

# multilevel sPLS-DA model
res.1level <- multilevel(X, ncomp = 3, design = design,
                         method = "splsda", keepX = c(30, 137, 123))


name.var = vac18$tab.prob.gene[, 'Gene']
names(name.var) = colnames(X)

plotContrib(res.1level, comp = 2, method = 'median', legend.title = 'Stimu', 
name.var = name.var, cex.name = 0.2)

# too many transcripts? only output the top ones
plotContrib(res.1level, comp = 2, method = 'median', legend.title = 'Stimu', 
name.var = name.var, cex.name = 0.5, ndisplay = 60)

# object 'plsda'
# ----------------
# breast tumors
# ---
data(breast.tumors)
X <- breast.tumors$gene.exp
Y <- breast.tumors$sample$treatment

plsda.breast <- plsda(X, Y, ncomp = 2)

name.var = as.character(breast.tumors$genes$name)
names(name.var) = colnames(X)

# with gene IDs, showing the top 60
plotContrib(plsda.breast, contrib = 'max', comp = 1, method = 'median', 
            ndisplay = 60, 
            name.var = name.var,
            cex.name = 0.6,
            legend.color = color.mixo(1:2))

# liver toxicity
# ---
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$treatment[, 4]

plsda.liver <- plsda(X, Y, ncomp = 2)
plotIndiv(plsda.liver, ind.names = Y, plot.ellipse = TRUE)


name.var = liver.toxicity$gene.ID[, 'geneBank']
names(name.var) = rownames(liver.toxicity$gene.ID)

plotContrib(plsda.liver, contrib = 'max', comp = 1, method = 'median', ndisplay = 100, 
            name.var = name.var, cex.name = 0.4,
            legend.color = color.mixo(1:4))

# object 'sgccda'
# ----------------
data(nutrimouse)
Y = nutrimouse$diet
data = list(gene = nutrimouse$gene, lipid = nutrimouse$lipid)
design = matrix(c(0,1,1,1,0,1,1,1,0), ncol = 3, nrow = 3, byrow = TRUE)

nutrimouse.sgccda <- wrapper.sgccda(blocks = data,
Y = Y,
design = design,
keep = list(c(10,10), c(15,15)),
ncomp = c(2, 2, 1),
scheme = "centroid",
verbose = FALSE,
bias = FALSE)

plotContrib(nutrimouse.sgccda,block=2)
plotContrib(nutrimouse.sgccda,block="gene")