This function can plot different representations of the gene expression in a list of gene sets.
plotMultipleGS(
genesets_list,
ncolumns = 1,
labels = NULL,
expr,
gmt,
Subject_ID,
TimePoint,
baseline = NULL,
group.var = NULL,
Group_ID_paired = NULL,
ref = NULL,
group_of_interest = NULL,
FUNcluster = NULL,
clustering_metric = "euclidian",
clustering_method = "ward",
B = 500,
max_trends = 4,
aggreg.fun = "median",
na.rm.aggreg = TRUE,
trend.fun = "median",
methodOptiClust = "firstSEmax",
indiv = "genes",
verbose = TRUE,
clustering = TRUE,
showTrend = TRUE,
smooth = TRUE,
time_unit = "",
y.lab = NULL,
desc = TRUE,
lab.cex = 1,
axis.cex = 1,
main.cex = 1,
y.lab.angle = 90,
x.axis.angle = 45,
margins = 1,
line.size = 1,
y.lim = NULL,
x.lim = NULL,
gg.add = list(),
show_plot = TRUE
)
A list with 2 elements:
classif: a data.frame with the following 2 variables: ProbeID, which
contains the IDs of the probes of the plotted gene set, and Cluster, which indicates
which cluster each probe belongs to. If clustering is FALSE, then Cluster is NA for all the probes.
p: a ggplot object containing the plot
genesets_list: a list of the character strings giving the names of the
gene sets to be plotted as they appear in gmt.
ncolumns: the number of columns used to display the multiple plots.
Default is 1.
labels: list of labels to be added to the plots.
You can also set labels="AUTO" to auto-generate upper-case labels (such as A, B, ...)
or labels="auto" to auto-generate lower-case labels.
Default is NULL.
expr: either a matrix or dataframe of gene expression upon which
dynamics are to be calculated, or a list of gene set estimations of gene
expression. In the case of a matrix or dataframe, its dimensions are \(n\)
x \(p\), with the \(p\) samples in columns and the \(n\) genes in rows.
In the case of a list, its length should correspond to the number of gene
sets under scrutiny and each element should be a 3-dimensional array of
estimated gene expression, such as for the list returned in the
'Estimations' element of TcGSA.LR. See Details.
gmt: a gmt object containing the gene sets definition. See
GSA.read.gmt and
the definition on www.software.broadinstitute.org.
Subject_ID: a factor of length \(p\) that is in the same order as the
columns of expr (when it is a dataframe) and that contains the patient
identifier of each sample.
TimePoint: a numeric vector or a factor of length \(p\) that is in
the same order as Subject_ID and the columns of expr (when it is
a dataframe), and that contains the time points at which gene expression was
measured.
baseline: a character string which is the value of TimePoint
that can be used as a baseline. Default is NULL, in which case no
time point is used as a baseline value for gene expression. Has to be
NULL when comparing two treatment groups.
group.var: in the case of several treatment groups, this is a factor of
length \(p\) that is in the same order as TimePoint,
Subject_ID and the columns of expr. It indicates to which
treatment group each sample belongs. Default is NULL, which means
that there is only one treatment group.
Group_ID_paired: a character vector of length \(p\) that is in the
same order as TimePoint, Subject_ID, group.var and the
columns of expr. This argument must not be NULL in the case of
a paired analysis, and must be NULL otherwise. Default is
NULL.
ref: the group which is used as reference in the case of several
treatment groups. Default is NULL, which means that the reference is the
first group in alphabetical order of the labels of group.var. See
Details.
group_of_interest: the group of interest, for which dynamics are to be
computed in the case of several treatment groups. Default is NULL,
which means that the group of interest is the second group in alphabetical order
of the labels of group.var.
FUNcluster: a function which accepts as first argument a matrix
x and as second argument the number of clusters desired k, and
which returns a list with a component named 'cluster' which is a
vector of length n = nrow(x) of integers in 1:k, determining the clustering
or grouping of the n observations. Default is NULL, in which case a
hierarchical clustering is performed via the function
agnes, using the metric clustering_metric
and the method clustering_method. See 'FUNcluster' in
clusGap and Details.
clustering_metric: character string specifying the metric to be used
for calculating dissimilarities between observations in the hierarchical
clustering when FUNcluster is NULL. The currently available
options are "euclidean" and "manhattan". Default is
"euclidean". See agnes. Also, a "sts" option
is available in TcGSA. It implements the 'Short Time Series' distance
[Moller-Levet et al., Fuzzy Clustering of short time series and unevenly distributed
sampling points, Advances in Intelligent Data Analysis V:330-340 Springer, 2003]
designed specifically for clustering time series.
clustering_method: character string defining the agglomerative method
to be used in the hierarchical clustering when FUNcluster is
NULL. The methods listed here are "average" ([unweighted
pair-]group average method, UPGMA), "single" (single linkage),
"complete" (complete linkage), "ward" (Ward's method) and
"weighted" (weighted average linkage). Default is "ward". See
agnes.
B: integer specifying the number of Monte Carlo ("bootstrap") samples
used to compute the gap statistic. Default is 500. See
clusGap.
max_trends: integer specifying the maximum number of different clusters
to be tested. Default is 4.
aggreg.fun: a character string such as "median" or "mean"
or the name of any other defined statistics function that returns a single
numeric value. It specifies the function used to aggregate the observations
before the clustering. Default is "median".
na.rm.aggreg: a logical flag indicating whether NA values should be removed to prevent
their propagation through aggreg.fun. Setting it to TRUE can be useful with
unbalanced designs, as those will generate structural NAs in
$Estimations. Default is TRUE.
trend.fun: a character string such as "mean" or
the name of any other function that returns a single numeric value. It
specifies the function used to calculate the trends of the identified
clusters. Default is "median".
methodOptiClust: character string indicating how the "optimal" number
of clusters is computed from the gap statistic and its standard
deviations. Possible values are "globalmax", "firstmax",
"Tibs2001SEmax", "firstSEmax" and "globalSEmax".
Default is "firstSEmax". See 'method' in
clusGap, Details and Tibshirani et al.,
2001 in References.
indiv: a character string indicating by which unit observations are
aggregated (through aggreg.fun) before the clustering. Possible
values are "genes" or "patients". Default is "genes".
See Details.
verbose: logical flag enabling verbose messages to track the computing
status of the function. Default is TRUE.
clustering: logical flag. If FALSE, there is no clustering
representation; if TRUE, the lines are colored according to which
cluster they belong to. Default is TRUE. See Details.
showTrend: logical flag. If TRUE, a black line is added for
each cluster, representing the corresponding trend.fun. Default is
TRUE.
smooth: logical flag. If TRUE and showTrend is also
TRUE, the representation of each cluster trend.fun is smoothed
using cubic polynomials (see geom_smooth).
Default is TRUE.
At the moment, trend.fun must accept the parameter "na.rm" (which is automatically set to TRUE).
This might change in future versions.
time_unit: the time unit to be displayed (such as "Y",
"M", "W", "D", "H", etc) next to the values of
TimePoint on the x-axis. Default is "", in which case the time
scale on the x-axis is proportional to the time values.
y.lab: character specifying the annotation of the y axis. If NULL, an
annotation is automatically generated; if "", no annotation appears. Default is
NULL.
desc: a logical flag. If TRUE, a line is added to the title of
the plot with the description of the gene set plotted (from the gmt file).
Default is TRUE.
lab.cex: a numerical value giving the amount by which axis label text
should be magnified relative to the default 1.
axis.cex: a numerical value giving the amount by which axis annotation
text should be magnified relative to the default 1.
main.cex: a numerical value giving the amount by which title text
should be magnified relative to the default 1.
y.lab.angle: a numerical value (in [0, 360]) giving the orientation by
which y-label text should be turned (anti-clockwise). Default is 90.
See element_text.
x.axis.angle: a numerical value (in [0, 360]) giving the orientation by
which x-axis annotation text should be turned (anti-clockwise). Default is
45.
margins: a numerical value giving the amount by which the margins
should be reduced or increased relative to the default 1.
line.size: a numerical value giving the amount by which the line sizes
should be reduced or increased relative to the default 1.
y.lim: a numeric vector of length 2 giving the range of the y-axis.
See plot.default.
x.lim: if numeric, will create a continuous scale, if factor or
character, will create a discrete scale. Observations not in this range will
be dropped. See xlim.
gg.add: a list of instructions to add to the ggplot2 instructions.
See +.gg. Default is list(), which adds nothing
to the plot.
show_plot: logical flag. If FALSE, no plot is drawn. Default is TRUE.
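The contract described for FUNcluster (a matrix as first argument, the number of clusters k as second, and a returned list with a 'cluster' component) can be illustrated with a minimal sketch. The function name kmeans_cluster and the toy matrix are hypothetical and not part of TcGSA; only stats::kmeans from base R is used.

```r
# Hypothetical custom clustering function following the FUNcluster contract:
# first argument a matrix x, second argument the number of clusters k,
# returns a list whose 'cluster' component holds integers in 1:k.
kmeans_cluster <- function(x, k) {
  fit <- stats::kmeans(x, centers = k, nstart = 10)
  list(cluster = fit$cluster)
}

# Toy matrix (illustrative only): 20 observations in 2 well-separated groups.
set.seed(42)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
res <- kmeans_cluster(x, k = 2)
```

A function of this shape could presumably be supplied as FUNcluster in place of the default hierarchical clustering.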
Boris P. Hejblum
If expr is a matrix or a dataframe, then the "original" data are
plotted. On the other hand, if expr is a list returned in the
'Estimations' element of TcGSA.LR, then it is those
"estimations" made by the TcGSA.LR function that are plotted.
If indiv is 'genes', then each line of the plot is the median of a
gene's expression over the patients. On the other hand, if indiv is
'patients', then each line of the plot is the median of a patient's gene
expression in this gene set.
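The two aggregation modes can be sketched in base R on toy data. This is an illustration of the idea under the assumption that the aggregation function is the median; it is not the package's internal code, and all object names (expr, by_gene, by_patient, ...) are made up for the example.

```r
# Toy data: 5 genes (rows) x 12 samples (columns),
# i.e. 4 patients each measured at 3 time points.
set.seed(1)
Subject_ID <- factor(rep(paste0("P", 1:4), times = 3))
TimePoint  <- rep(c(0, 7, 14), each = 4)
expr <- matrix(rnorm(5 * 12), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))

# indiv = "genes": one line per gene, the median over patients
# at each time point (result: genes x time points).
by_gene <- sapply(split(seq_len(ncol(expr)), TimePoint),
                  function(cols) apply(expr[, cols, drop = FALSE], 1, median))

# indiv = "patients": one line per patient, the median over the genes
# of the set at each time point (result: patients x time points).
sample_median <- apply(expr, 2, median)
by_patient <- tapply(sample_median, list(Subject_ID, TimePoint), median)
```

Each row of by_gene or by_patient then corresponds to one line that would be drawn across the time points.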
This function uses the gap statistic to determine the optimal number of
clusters in the plotted gene set. See
clusGap.
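As a rough sketch of how the gap statistic can select a number of clusters, the example below calls clusGap and maxSE from the cluster package directly on toy data. The wrapper agnes_cluster and the toy matrix are assumptions made for illustration, and B is kept small for speed; this does not reproduce what plotMultipleGS does internally.

```r
library(cluster)  # provides agnes(), clusGap() and maxSE()

# Hierarchical clustering wrapped to match the FUNcluster contract
# (an illustration, not TcGSA's internal code).
agnes_cluster <- function(x, k) {
  tree <- agnes(x, metric = "euclidean", method = "ward")
  list(cluster = cutree(as.hclust(tree), k = k))
}

# Toy profiles: 30 rows (e.g. genes) x 4 columns (e.g. time points),
# drawn from two groups with different means.
set.seed(123)
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 4),
           matrix(rnorm(60, mean = 3), ncol = 4))

# Gap statistic for k = 1..4, with 50 Monte Carlo reference samples.
gap <- clusGap(x, FUNcluster = agnes_cluster, K.max = 4, B = 50,
               verbose = FALSE)

# Choose the number of clusters from the gap values and their standard errors.
k_opt <- maxSE(f = gap$Tab[, "gap"], SE.f = gap$Tab[, "SE.sim"],
               method = "firstSEmax")
```

Here K.max plays the role of max_trends and method that of methodOptiClust; other values of method, such as "globalmax", may select a different number of clusters on the same data.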