plotLoadings: Plot of Loading vectors

Description

This function draws a horizontal bar plot to visualise loading vectors. For discriminant analysis, it can also color each bar according to the outcome group in which the variable has the highest or lowest mean/median value.

Usage

# S3 method for pls
plotLoadings(object, block, comp = 1, col = NULL, ndisplay = NULL,
size.name = 0.7, name.var = NULL, name.var.complete = FALSE, title = NULL, subtitle,
size.title = rel(2), size.subtitle = rel(1.5), layout = NULL, border = NA,
xlim = NULL, ... )
# S3 method for mint.pls
plotLoadings(object, study = "global", comp = 1, col = NULL, ndisplay = NULL,
size.name = 0.7, name.var = NULL, name.var.complete = FALSE, title = NULL, subtitle,
size.title = rel(1.8), size.subtitle = rel(1.4), layout = NULL, border = NA,
xlim = NULL, ... )
# S3 method for plsda
plotLoadings(object, contrib, method = "mean", block, comp = 1,
plot = TRUE, show.ties = TRUE, col.ties="white", ndisplay = NULL, size.name = 0.7,
size.legend = 0.8, name.var=NULL, name.var.complete=FALSE, title = NULL,
subtitle, size.title = rel(1.8), size.subtitle = rel(1.4),
legend = TRUE, legend.color = NULL, legend.title = 'Outcome',
layout = NULL, border = NA, xlim = NULL, ... )
# S3 method for mint.plsda
plotLoadings(object, contrib = NULL, method = "mean",
study = "global", comp = 1, plot = TRUE, show.ties = TRUE, col.ties = "white",
ndisplay = NULL, size.name = 0.7, size.legend = 0.8, name.var = NULL,
name.var.complete = FALSE, title = NULL, subtitle, size.title = rel(1.8),
size.subtitle = rel(1.4), legend = TRUE, legend.color = NULL,
legend.title = 'Outcome', layout = NULL, border = NA, xlim = NULL, ... )

Arguments

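object

A fitted model to plot, for example the output of spls, splsda, plsda, mint.splsda or wrapper.sgccda (see Examples).
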
contrib

A character string, either 'max' or 'min', indicating whether the color of each bar should correspond to the group with the maximal or minimal expression level / abundance.

method

A character string, either 'mean' or 'median', indicating the statistic used to assess the contribution. We recommend 'median' for count or skewed data.

study

Indicates which studies to plot. Expected values are a character vector of levels of object$study, "all.partial" to plot all studies, or "global".

block

A single value (block name or index, see Examples) indicating which block to consider in an sgccda object.

comp

Integer value indicating the component of interest from the object.

col

Color used in the barplot; only used for objects from non-discriminant analyses.

plot

Boolean indicating whether the plot should be output. If set to FALSE, the user can extract the contribution matrix (see Examples). Default value is TRUE.

show.ties

Boolean. If TRUE, tied groups appear in the color set by col.ties, which is added to the legend. Ties can occur with count data. Default value is TRUE.

col.ties

Color corresponding to ties, only used if show.ties=TRUE and ties are present.

ndisplay

Integer indicating how many of the most important variables to plot, ranked by decreasing weight on the component. Useful to declutter a busy plot.
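As a rough illustration of how ndisplay trims the plot, the base-R sketch below keeps the top variables of one component. It assumes that variables are ranked by absolute loading weight and that unselected (zero-weight) variables are dropped; top_loadings is a hypothetical helper, not a mixOmics function.

```r
# Hypothetical helper (not part of mixOmics): keep the `ndisplay` variables
# with the largest absolute loading weight on one component.
# Assumptions: ranking uses abs(weight); zero weights (unselected) are dropped.
top_loadings <- function(loadings, ndisplay = NULL) {
  loadings <- loadings[loadings != 0]                 # drop unselected variables
  ranked <- loadings[order(abs(loadings), decreasing = TRUE)]
  if (is.null(ndisplay)) {
    return(ranked)                                    # no trimming requested
  }
  head(ranked, ndisplay)                              # keep the top ndisplay
}

# toy loading vector for one component
w <- c(g1 = 0.10, g2 = -0.45, g3 = 0, g4 = 0.30, g5 = -0.05)
top_loadings(w, ndisplay = 2)
# g2 = -0.45, g4 = 0.30
```

Negative weights are kept and ranked by magnitude, since a large negative loading is as influential as a large positive one.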

size.name

A numeric value giving the factor by which the variable-name text is magnified or reduced relative to the default.

size.legend

A numeric value giving the factor by which the legend text is magnified or reduced relative to the default.

name.var

A character vector giving the names of the variables. The names of this vector should match the variable names of the input data (colnames(X)); see Examples.

name.var.complete

Boolean. If name.var is supplied with some empty names, name.var.complete = TRUE fills them with the original variable names (from colnames(X)). Default value is FALSE.
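A minimal sketch of the completion step, assuming name.var is a named character vector whose names match colnames(X) as described under name.var. complete_names is a hypothetical helper, not the mixOmics implementation.

```r
# Hypothetical helper (not the mixOmics implementation): replace empty ("")
# or NA entries of name.var by the original column names of X.
complete_names <- function(name.var, X) {
  labels <- name.var[colnames(X)]          # align labels to the columns of X
  empty <- is.na(labels) | labels == ""    # entries that need completing
  labels[empty] <- colnames(X)[empty]      # fall back to the original names
  names(labels) <- colnames(X)
  labels
}

X <- matrix(0, nrow = 2, ncol = 3,
            dimnames = list(NULL, c("A_1", "A_2", "A_3")))
name.var <- c(A_1 = "TP53", A_2 = "", A_3 = NA)
complete_names(name.var, X)
# A_1 = "TP53", A_2 = "A_2", A_3 = "A_3"
```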

title

A character string giving the title of the plot. Default value is NULL.

subtitle

Subtitle for each plot; only used when several blocks or studies are plotted.

size.title

Size of the title.

size.subtitle

Size of the subtitle.

legend

Boolean indicating whether a legend of the group outcomes should be added to the plot. Default value is TRUE.

legend.color

A vector of colors, one for each outcome group. See Examples.

legend.title

A character string giving the title of the legend. Default value is 'Outcome'.

layout

Vector of two values (rows, cols) that sets the layout of the plot. If layout is provided, the remaining empty subplots are still active.

border

Passed to barplot: the color of the bar borders. The default, NA, omits the borders.

xlim

Passed to barplot: limits of the x-axis. When several blocks are plotted, a matrix is expected in which each row gives the xlim used for the corresponding block.

…

not used.

Details

For each component, the contribution of each variable is represented as a bar whose length corresponds to the variable's loading weight (importance) in the fitted object. Loading weights can be positive or negative.
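The base-R sketch below draws such a horizontal bar plot for a single component. It is a simplified stand-in built on graphics::barplot, with the assumption that bars are ordered by absolute weight; it is not the code used by plotLoadings, and plot_loading_bars is a hypothetical name.

```r
# Minimal base-R sketch (not the mixOmics code): one horizontal bar per
# variable, bar length = loading weight, negative weights extend to the left.
plot_loading_bars <- function(loadings, border = NA, xlim = NULL, title = NULL) {
  # assumption: smallest absolute weight at the bottom, largest at the top
  loadings <- loadings[order(abs(loadings))]
  barplot(loadings, horiz = TRUE, las = 1, border = border,
          xlim = xlim, main = title, names.arg = names(loadings))
}

w <- c(g1 = 0.10, g2 = -0.45, g4 = 0.30)
mids <- plot_loading_bars(w, title = "Loadings on component 1")
```

barplot returns the bar midpoints, which the sketch passes through so that labels or annotations can be added afterwards.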

For discriminant analysis, each bar is colored according to the group in which the feature is most (contrib = 'max') or least (contrib = 'min') 'abundant', as measured by method. This type of graphical output is particularly insightful for microbial count data, in which case method = 'median' is advised. If contrib is not provided, the bars are white.
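To make the coloring rule concrete, here is a sketch of how the contributing group could be determined for each variable from a data matrix X and an outcome Y. contrib_group is a hypothetical helper; the "tie" label and the tie rule (several groups sharing the extreme value) are assumptions, not necessarily what mixOmics does internally.

```r
# Hypothetical helper (not the mixOmics implementation): for each variable,
# return the outcome group with the maximal (or minimal) mean/median value,
# or "tie" when several groups share that extreme value.
contrib_group <- function(X, Y, method = c("mean", "median"),
                          contrib = c("max", "min")) {
  method <- match.arg(method)
  contrib <- match.arg(contrib)
  Y <- as.factor(Y)
  stat <- if (method == "mean") mean else median
  # rows = groups, columns = variables
  by.group <- apply(X, 2, function(x) tapply(x, Y, stat))
  apply(by.group, 2, function(s) {
    target <- if (contrib == "max") max(s) else min(s)
    winners <- names(s)[s == target]
    if (length(winners) > 1) "tie" else winners
  })
}

X <- cbind(a = c(1, 2, 8, 9), b = c(5, 5, 5, 5), c = c(0, 1, 3, 3))
Y <- c("ctrl", "ctrl", "trt", "trt")
contrib_group(X, Y, method = "median", contrib = "max")
# a = "trt", b = "tie", c = "trt"
```

Variable b has identical values in both groups, so it is reported as a tie whichever statistic is used; with count data such exact ties are common, which is what show.ties and col.ties address.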

For MINT analysis, study = "global" plots the global loadings, while partial loadings are plotted when study is a level of object$study. Since variable selection in MINT is performed at the global level, only the globally selected variables are plotted for the partial loadings, even if the partial loadings themselves are not sparse. See References. Importantly, when several plots are combined, the legend occupies one subplot of the layout.
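The restriction of partial loadings to the globally selected variables can be sketched as follows, assuming both loading vectors are named by variable and that unselected variables have a global weight of exactly zero. restrict_to_global is a hypothetical helper, not part of mixOmics.

```r
# Hypothetical illustration (not the mixOmics implementation): show a study's
# partial loadings only for variables selected by the sparse global loadings.
restrict_to_global <- function(partial, global) {
  selected <- names(global)[global != 0]   # globally selected variables
  partial[selected]                        # partial weights, possibly non-sparse
}

global  <- c(g1 = 0.6, g2 = 0,   g3 = -0.8, g4 = 0)
partial <- c(g1 = 0.5, g2 = 0.1, g3 = -0.7, g4 = -0.2)
restrict_to_global(partial, global)
# g1 = 0.5, g3 = -0.7
```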

References

Rohart F. et al. (2016, submitted). MINT: A multivariate integrative approach to identify a reproducible biomarker signature across multiple experiments and platforms.

Eslami, A., Qannari, E. M., Kohler, A., and Bougeard, S. (2013). Multi-group PLS Regression: Application to Epidemiology. In New Perspectives in Partial Least Squares and Related Methods, pages 243-255. Springer.

Singh A., Gautier B., Shannon C., Vacher M., Rohart F., Tebbutt S. and Le Cao K.A. (2016). DIABLO - multi omics integration for biomarker discovery.

Le Cao, K.-A., Martin, P.G.P., Robert-Granie, C. and Besse, P. (2009). Sparse canonical methods for biological data integration: application to a cross-platform study. BMC Bioinformatics 10:34.

Tenenhaus, M. (1998). La régression PLS: théorie et pratique. Paris: Éditions Technip.

Wold H. (1966). Estimation of principal components and related models by iterative least squares. In: Krishnaiah, P. R. (editor), Multivariate Analysis. Academic Press, N.Y., 391-420.

Examples

# NOT RUN {
library(mixOmics)

## object of class 'spls'
# --------------------------
data(liver.toxicity)
X = liver.toxicity$gene
Y = liver.toxicity$clinic

toxicity.spls = spls(X, Y, ncomp = 2, keepX = c(50, 50),
                     keepY = c(10, 10))

plotLoadings(toxicity.spls)

# with xlim
xlim = matrix(c(-0.1,0.3, -0.4,0.6), nrow = 2, byrow = TRUE)
plotLoadings(toxicity.spls, xlim = xlim)


## object of class 'splsda'
# --------------------------
data(liver.toxicity)
X = as.matrix(liver.toxicity$gene)
Y = as.factor(liver.toxicity$treatment[, 4])

splsda.liver = splsda(X, Y, ncomp = 2, keepX = c(20, 20))

# contribution on comp 1, based on the median.
# Without contrib the bars are white; with contrib = "max",
# colors indicate the group in which the median expression is maximal
plotLoadings(splsda.liver, comp = 1, method = 'median')
plotLoadings(splsda.liver, comp = 1, method = 'median', contrib = "max")

# contribution on comp 2, based on median. 
# Colors indicate the group in which the median expression is maximal
plotLoadings(splsda.liver, comp = 2, method = 'median', contrib = "max")

# contribution on comp 2, based on median. 
# Colors indicate the group in which the median expression is minimal
plotLoadings(splsda.liver, comp = 2, method = 'median', contrib = 'min')

# changing the name to gene names
# If name.var is supplied but names(name.var) is NULL, a warning is issued
# and colnames(X) are assigned as the names of name.var.
# This ensures that the selected variables can be matched to the contribution plot.
name.var = liver.toxicity$gene.ID[, 'geneBank']
length(name.var)
plotLoadings(splsda.liver, comp = 2, method = 'median', name.var = name.var,
title = "Liver data", contrib = "max")

# if names are provided, this works even when some entries are NA
name.var = liver.toxicity$gene.ID[, 'geneBank']
names(name.var) = rownames(liver.toxicity$gene.ID)
plotLoadings(splsda.liver, comp = 2, method = 'median',
name.var = name.var, size.name = 0.5, contrib = "max")

# missing names for some genes? complete them with the original variable names
plotLoadings(splsda.liver, comp = 2, method = 'median',
             name.var = name.var, size.name = 0.5, name.var.complete = TRUE,
             contrib = "max")

# look at the contribution (median) for each variable
plot.contrib = plotLoadings(splsda.liver, comp = 2, method = 'median', plot = FALSE,
contrib = "max")
head(plot.contrib$contrib)
# change the title of the legend and title name
plotLoadings(splsda.liver, comp = 2, method = 'median', legend.title = 'Time',
title = 'Contribution plot', contrib = "max")

# no legend
plotLoadings(splsda.liver, comp = 2, method = 'median', legend = FALSE, contrib = "max")

# change the color of the legend
plotLoadings(splsda.liver, comp = 2, method = 'median', legend.color = c(1:4), contrib = "max")



# object 'splsda multilevel'
# -----------------
data(vac18)
X = vac18$genes
Y = vac18$stimulation
# sample indicates the repeated measurements
sample = vac18$sample
stimul = vac18$stimulation

# multilevel sPLS-DA model
res.1level = splsda(X, Y = stimul, ncomp = 3, multilevel = sample,
keepX = c(30, 137, 123))


name.var = vac18$tab.prob.gene[, 'Gene']
names(name.var) = colnames(X)

plotLoadings(res.1level, comp = 2, method = 'median', legend.title = 'Stimu',
name.var = name.var, size.name = 0.2, contrib = "max")

# too many transcripts? only output the top ones
plotLoadings(res.1level, comp = 2, method = 'median', legend.title = 'Stimu',
name.var = name.var, size.name = 0.5, ndisplay = 60, contrib = "max")

# object 'plsda'
# ----------------
# breast tumors
# ---
data(breast.tumors)
X = breast.tumors$gene.exp
Y = breast.tumors$sample$treatment

plsda.breast = plsda(X, Y, ncomp = 2)

name.var = as.character(breast.tumors$genes$name)
names(name.var) = colnames(X)

# with gene IDs, showing the top 60
plotLoadings(plsda.breast, contrib = 'max', comp = 1, method = 'median', 
            ndisplay = 60, 
            name.var = name.var,
            size.name = 0.6,
            legend.color = color.mixo(1:2))
# liver toxicity
# ---
data(liver.toxicity)
X = liver.toxicity$gene
Y = liver.toxicity$treatment[, 4]

plsda.liver = plsda(X, Y, ncomp = 2)
plotIndiv(plsda.liver, ind.names = Y, ellipse = TRUE)


name.var = liver.toxicity$gene.ID[, 'geneBank']
names(name.var) = rownames(liver.toxicity$gene.ID)

plotLoadings(plsda.liver, contrib = 'max', comp = 1, method = 'median', ndisplay = 100, 
            name.var = name.var, size.name = 0.4,
            legend.color = color.mixo(1:4))
# object 'sgccda'
# ----------------
data(nutrimouse)
Y = nutrimouse$diet
data = list(gene = nutrimouse$gene, lipid = nutrimouse$lipid)
design = matrix(c(0,1,1,1,0,1,1,1,0), ncol = 3, nrow = 3, byrow = TRUE)

nutrimouse.sgccda = wrapper.sgccda(X = data,
                                   Y = Y,
                                   design = design,
                                   keepX = list(gene = c(10,10), lipid = c(15,15)),
                                   ncomp = 2,
                                   scheme = "centroid")

plotLoadings(nutrimouse.sgccda, block = 2)
plotLoadings(nutrimouse.sgccda, block = "gene")

# object 'mint.splsda'
# ----------------
data(stemcells)
data = stemcells$gene
type.id = stemcells$celltype
exp = stemcells$study

res = mint.splsda(X = data, Y = type.id, ncomp = 3, keepX = c(10,5,15), study = exp)

plotLoadings(res)
plotLoadings(res, contrib = "max")
plotLoadings(res, contrib = "min", study = 1:4, comp = 2)

# combine several plots in a layout of 2 rows and 4 columns.
# Note that the legend takes up a subplot, hence 4 columns instead of 2.
plotLoadings(res, contrib = "min", study = c(1,2,3), comp = 2, layout = c(2,4))
plotLoadings(res, contrib = "min", study = "global", comp = 2)


# }
