This function can plot different representations of the gene expression in a list of gene sets.
plotMultipleGS(
genesets_list,
ncolumns = 1,
labels = NULL,
expr,
gmt,
Subject_ID,
TimePoint,
baseline = NULL,
group.var = NULL,
Group_ID_paired = NULL,
ref = NULL,
group_of_interest = NULL,
FUNcluster = NULL,
clustering_metric = "euclidian",
clustering_method = "ward",
B = 500,
max_trends = 4,
aggreg.fun = "median",
na.rm.aggreg = TRUE,
trend.fun = "median",
methodOptiClust = "firstSEmax",
indiv = "genes",
verbose = TRUE,
clustering = TRUE,
showTrend = TRUE,
smooth = TRUE,
time_unit = "",
y.lab = NULL,
desc = TRUE,
lab.cex = 1,
axis.cex = 1,
main.cex = 1,
y.lab.angle = 90,
x.axis.angle = 45,
margins = 1,
line.size = 1,
y.lim = NULL,
x.lim = NULL,
gg.add = list(),
show_plot = TRUE
)
genesets_list
a list of character strings giving the names of the gene sets to be plotted,
as they appear in gmt.
ncolumns
the number of columns used to display the multiple plots. Default is 1.
labels
list of labels to be added to the plots. You can also set labels="AUTO" to
auto-generate upper-case labels (such as A, B, ...) or labels="auto" to
auto-generate lower-case labels. Default is NULL.
expr
either a matrix or dataframe of gene expression upon which dynamics are to be
calculated, or a list of gene set estimations of gene expression. In the case
of a matrix or dataframe, its dimensions are \(n\) x \(p\), with the \(p\)
samples in columns and the \(n\) genes in rows. In the case of a list, its
length should correspond to the number of gene sets under scrutiny, and each
element should be a 3-dimensional array of estimated gene expression, such as
the list returned in the 'Estimations' element of TcGSA.LR. See Details.
gmt
a gmt object containing the gene set definitions. See GSA.read.gmt and the
definition on www.software.broadinstitute.org.
Subject_ID
a factor of length \(p\) that is in the same order as the columns of expr
(when it is a dataframe) and that contains the patient identifier of each
sample.
TimePoint
a numeric vector or a factor of length \(p\) that is in the same order as
Subject_ID and the columns of expr (when it is a dataframe), and that contains
the time points at which gene expression was measured.
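The alignment required between expr, Subject_ID and TimePoint can be illustrated with a small synthetic data set. This is only a sketch: the subject labels, time values and the helper table design are made up for illustration and are not part of the package.

```r
set.seed(1)
n_genes    <- 6                      # n: genes in rows
subjects   <- c("P1", "P2", "P3")    # hypothetical patient identifiers
timepoints <- c(0, 1, 3, 7)          # hypothetical measurement times

# One row of the design per sample (p = subjects x time points)
design <- expand.grid(Subject_ID = subjects, TimePoint = timepoints,
                      stringsAsFactors = FALSE)
p <- nrow(design)

# n x p expression matrix; column j is described by row j of `design`
expr <- matrix(rnorm(n_genes * p), nrow = n_genes, ncol = p,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste(design$Subject_ID, design$TimePoint,
                                     sep = "_")))

Subject_ID <- factor(design$Subject_ID)  # factor of length p
TimePoint  <- design$TimePoint           # numeric vector of length p

# Both vectors must follow the column order of expr
stopifnot(ncol(expr) == length(Subject_ID),
          ncol(expr) == length(TimePoint))
```

Building the design table first and deriving the column names from it is one simple way to keep the three objects in the same order.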
baseline
a character string which is the value of TimePoint that can be used as a
baseline. Default is NULL, in which case no time point is used as a baseline
value for gene expression. Has to be NULL when comparing two treatment groups.
group.var
in the case of several treatment groups, this is a factor of length \(p\) that
is in the same order as TimePoint, Subject_ID and the columns of expr. It
indicates the treatment group to which each sample belongs. Default is NULL,
which means that there is only one treatment group.
Group_ID_paired
a character vector of length \(p\) that is in the same order as TimePoint,
Subject_ID, group.var and the columns of expr. This argument must not be NULL
in the case of a paired analysis, and must be NULL otherwise. Default is NULL.
ref
the group which is used as reference in the case of several treatment groups.
Default is NULL, which means that the reference is the first group in
alphabetical order of the labels of group.var. See Details.
group_of_interest
the group of interest, for which dynamics are to be computed in the case of
several treatment groups. Default is NULL, which means that the group of
interest is the second group in alphabetical order of the labels of group.var.
FUNcluster
a function which accepts as first argument a matrix x and as second argument
the number of clusters desired k, and which returns a list with a component
named 'cluster' which is a vector of length n = nrow(x) of integers in 1:k,
determining the clustering or grouping of the n observations. Default is NULL,
in which case a hierarchical clustering is performed via the function agnes,
using the metric clustering_metric and the method clustering_method. See
'FUNcluster' in clusGap and Details.
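As an illustration of the contract described above, here is a minimal sketch of a user-supplied clustering function. The name hclust_k is hypothetical, and base R's hclust and cutree are used here in place of agnes; the only requirement taken from the text is the signature (x, k) and a returned list with a 'cluster' component.

```r
# Hypothetical helper: complete-linkage hierarchical clustering cut into k groups.
# It follows the (x, k) -> list(cluster = ...) contract described for FUNcluster.
hclust_k <- function(x, k) {
  tree <- stats::hclust(stats::dist(x), method = "complete")
  list(cluster = stats::cutree(tree, k = k))
}

# Toy data: two well-separated groups of 10 observations each
set.seed(42)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))

res <- hclust_k(x, k = 2)
str(res$cluster)  # integer vector of length nrow(x) with values in 1:2
```

A function of this shape could then be passed as FUNcluster = hclust_k.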
clustering_metric
character string specifying the metric to be used for calculating
dissimilarities between observations in the hierarchical clustering when
FUNcluster is NULL. The currently available options are "euclidean" and
"manhattan". Default is "euclidean". See agnes. Also, a "sts" option is
available in TcGSA. It implements the 'Short Time Series' distance
[Moller-Levet et al., Fuzzy Clustering of short time series and unevenly
distributed sampling points, Advances in Intelligent Data Analysis V:330-340
Springer, 2003] designed specifically for clustering time series.
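To give an idea of what the 'Short Time Series' distance measures, the sketch below follows the definition published by Moller-Levet et al.: the Euclidean distance between the piecewise slopes of two series. The function name sts_dist and the toy trajectories are illustrative assumptions, and this is not necessarily how TcGSA implements the "sts" option internally.

```r
# Sketch of the STS distance between two series observed at the same times:
# compare the slopes between consecutive time points rather than the raw values.
sts_dist <- function(x, v, times) {
  slopes_x <- diff(x) / diff(times)
  slopes_v <- diff(v) / diff(times)
  sqrt(sum((slopes_v - slopes_x)^2))
}

times  <- c(0, 1, 3, 7)     # unevenly spaced sampling points
traj_a <- c(1, 2, 4, 8)     # slope +1 on every interval
traj_b <- traj_a + 10       # same shape, shifted upwards
traj_c <- c(1, 0, -2, -6)   # slope -1 on every interval

sts_dist(traj_a, traj_b, times)  # 0: identical shapes despite the shift
sts_dist(traj_a, traj_c, times)  # sqrt(12): opposite slopes on 3 intervals
```

Because only slopes are compared, two genes with the same dynamics but different expression levels are considered close, which is the motivation for using this distance with short, unevenly sampled time courses.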
clustering_method
character string defining the agglomerative method to be used in the
hierarchical clustering when FUNcluster is NULL. The available methods include
"average" ([unweighted pair-]group average method, UPGMA), "single" (single
linkage), "complete" (complete linkage), "ward" (Ward's method) and "weighted"
(weighted average linkage). Default is "ward". See agnes.
B
integer specifying the number of Monte Carlo ("bootstrap") samples used to
compute the gap statistics. Default is 500. See clusGap.
max_trends
integer specifying the maximum number of different clusters to be tested.
Default is 4.
aggreg.fun
a character string such as "median" or "mean" or the name of any other defined
statistics function that returns a single numeric value. It specifies the
function used to aggregate the observations before the clustering. Default is
"median".
na.rm.aggreg
a logical flag indicating whether NA values should be removed to prevent
propagation through aggreg.fun. Setting it to TRUE can be useful with
unbalanced designs, as those will generate structural NAs in $Estimations.
Default is TRUE.
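The effect of removing NA values before aggregation can be seen on a toy vector. This sketch only shows base R behaviour; resolving the function name with match.fun is an assumption made for illustration and may differ from what the package does internally.

```r
# Expression of one gene at one time point across 4 patients, one value missing
vals <- c(2.1, NA, 3.4, 2.8)

# Resolve an aggregation function from its name (assumed mechanism)
agg <- match.fun("median")

with_na    <- agg(vals)                # NA propagates through the aggregation
without_na <- agg(vals, na.rm = TRUE)  # median of 2.1, 2.8, 3.4

with_na     # NA
without_na  # 2.8
```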
trend.fun
a character string such as "mean" or the name of any other function that
returns a single numeric value. It specifies the function used to calculate
the trends of the identified clusters. Default is "median".
methodOptiClust
character string indicating how the "optimal" number of clusters is computed
from the gap statistics and their standard deviations. Possible values are
"globalmax", "firstmax", "Tibs2001SEmax", "firstSEmax" and "globalSEmax".
Default is "firstSEmax". See 'method' in clusGap, Details and Tibshirani et
al., 2001 in References.
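The way the gap statistic and methodOptiClust interact can be sketched directly with the cluster package, which provides agnes, clusGap and maxSE. The toy data, the wrapper name agnes_k and the reduced number of bootstrap samples are assumptions chosen to keep the example fast; the plotting function may combine these steps differently.

```r
library(cluster)

# Toy data: two groups of 20 observations in 3 dimensions
set.seed(123)
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.3), ncol = 3),
           matrix(rnorm(60, mean = 3, sd = 0.3), ncol = 3))

# Hypothetical wrapper: Ward hierarchical clustering via agnes, cut into k groups
agnes_k <- function(x, k) {
  tree <- agnes(x, metric = "euclidean", method = "ward")
  list(cluster = cutree(as.hclust(tree), k = k))
}

# Gap statistic for 1 to 4 clusters (B kept small here for speed)
gap <- clusGap(x, FUNcluster = agnes_k, K.max = 4, B = 50, verbose = FALSE)

# Pick the "optimal" number of clusters with the firstSEmax rule
k_opt <- maxSE(f = gap$Tab[, "gap"], SE.f = gap$Tab[, "SE.sim"],
               method = "firstSEmax")
k_opt
```

Here K.max plays the role of max_trends, B corresponds to the argument of the same name, and the method argument of maxSE accepts the same values as methodOptiClust.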
indiv
a character string indicating by which unit observations are aggregated
(through aggreg.fun) before the clustering. Possible values are "genes" or
"patients". Default is "genes". See Details.
verbose
logical flag enabling verbose messages to track the computing status of the
function. Default is TRUE.
clustering
logical flag. If FALSE, there is no clustering representation; if TRUE, the
lines are colored according to which cluster they belong to. Default is TRUE.
See Details.
showTrend
logical flag. If TRUE, a black line is added for each cluster, representing
the corresponding trend.fun. Default is TRUE.
smooth
logical flag. If TRUE and showTrend is also TRUE, the representation of each
cluster trend.fun is smoothed using cubic polynomials (see geom_smooth).
Default is TRUE.
At the moment, must accept parameter "na.rm" (which is automatically set to
TRUE). This might change in future versions.
time_unit
the time unit to be displayed (such as "Y", "M", "W", "D", "H", etc.) next to
the values of TimePoint on the x-axis. Default is "", in which case the time
scale on the x-axis is proportional to the time values.
y.lab
character specifying the annotation of the y-axis. If NULL, an annotation is
automatically generated; if "", no annotation appears. Default is NULL.
desc
a logical flag. If TRUE, a line is added to the title of the plot with the
description of the gene set plotted (from the gmt file). Default is TRUE.
lab.cex
a numerical value giving the amount by which axis label text should be
magnified relative to the default 1.
axis.cex
a numerical value giving the amount by which axis annotation text should be
magnified relative to the default 1.
main.cex
a numerical value giving the amount by which title text should be magnified
relative to the default 1.
y.lab.angle
a numerical value (in [0, 360]) giving the orientation by which y-label text
should be turned (anti-clockwise). Default is 90. See element_text.
x.axis.angle
a numerical value (in [0, 360]) giving the orientation by which x-axis
annotation text should be turned (anti-clockwise). Default is 45.
margins
a numerical value giving the amount by which the margins should be reduced or
increased relative to the default 1.
line.size
a numerical value giving the amount by which the line sizes should be reduced
or increased relative to the default 1.
y.lim
a numeric vector of length 2 giving the range of the y-axis. See plot.default.
x.lim
if numeric, will create a continuous scale; if factor or character, will
create a discrete scale. Observations not in this range will be dropped. See
xlim.
gg.add
a list of instructions to add to the ggplot2 instructions. See +.gg. Default
is list(), which adds nothing to the plot.
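The kind of list that gg.add expects can be illustrated with ggplot2 alone: a list of ggplot2 components can be added to an existing plot with +. The toy data frame and object names below are assumptions made for the example, and how the plotting function applies gg.add internally is not shown here.

```r
library(ggplot2)

# Toy trajectories for two genes over four time points
df <- data.frame(time = rep(c(0, 1, 3, 7), times = 2),
                 expression = c(0, 1, 2, 1, 0, -1, -2, -1),
                 gene = rep(c("g1", "g2"), each = 4))

base_plot <- ggplot(df, aes(x = time, y = expression, group = gene)) +
  geom_line()

# A list of extra instructions, in the form that gg.add takes
extras <- list(theme_bw(), ggtitle("Custom title"))

# Adding the list applies each element in turn
p <- base_plot + extras
```

An empty list leaves the plot unchanged, which is consistent with a default that adds nothing.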
show_plot
logical flag. If FALSE, no plot is drawn. Default is TRUE.
A list with 2 elements:
classif: a data.frame with the 2 following variables: ProbeID, which contains
the IDs of the probes of the plotted gene set, and Cluster, which indicates
the cluster each probe belongs to. If clustering is FALSE, then Cluster is NA
for all the probes.
p: a ggplot object containing the plot.
If expr is a matrix or a dataframe, then the "original" data are plotted. On
the other hand, if expr is a list returned in the 'Estimations' element of
TcGSA.LR, then it is those "estimations" made by the TcGSA.LR function that
are plotted.
If indiv is 'genes', then each line of the plot is the median of a gene's
expression over the patients. On the other hand, if indiv is 'patients', then
each line of the plot is the median of a patient's gene expression in this
gene set.
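The two aggregation modes described above can be sketched in base R on a toy expression matrix. The object names and the two-step computation for the 'patients' case are illustrative assumptions and may not match the internal implementation.

```r
set.seed(7)
# 3 patients measured at 2 time points: p = 6 samples, in matching order
Subject_ID <- rep(c("P1", "P2", "P3"), times = 2)
TimePoint  <- rep(c(0, 24), each = 3)

# 4 genes of one gene set in rows, 6 samples in columns
expr <- matrix(rnorm(4 * 6), nrow = 4,
               dimnames = list(paste0("gene", 1:4), NULL))

# indiv = "genes": one line per gene, median over patients at each time point
by_gene <- t(apply(expr, 1, function(g) {
  tapply(g, TimePoint, median, na.rm = TRUE)
}))

# indiv = "patients": one line per patient, median over the genes of the set
sample_median <- apply(expr, 2, median, na.rm = TRUE)
by_patient <- tapply(sample_median, list(Subject_ID, TimePoint), median)

dim(by_gene)     # 4 genes x 2 time points
dim(by_patient)  # 3 patients x 2 time points
```

Each row of by_gene or by_patient corresponds to one line that would be drawn across the time points.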
This function uses the gap statistic to determine the optimal number of
clusters in the plotted gene set. See clusGap.