This function plots a gene sets dynamic trends heatmap.
# S3 method for TcGSA
plot(
x,
threshold = 0.05,
myproc = "BY",
nbsimu_pval = 1e+06,
expr,
Subject_ID,
TimePoint,
baseline = NULL,
only.signif = TRUE,
group.var = NULL,
Group_ID_paired = NULL,
ref = NULL,
group_of_interest = NULL,
ranking = FALSE,
FUNcluster = NULL,
clustering_metric = "euclidian",
clustering_method = "ward",
B = 500,
max_trends = 4,
aggreg.fun = "median",
na.rm.aggreg = TRUE,
methodOptiClust = "firstSEmax",
indiv = "genes",
verbose = TRUE,
clust_trends = NULL,
N_clusters = NULL,
myclusters = NULL,
label.clusters = NULL,
prev_rowCL = NULL,
descript = TRUE,
plot = TRUE,
color.vec = c("darkred", "#D73027", "#FC8D59", "snow", "#91BFDB", "#4575B4",
"darkblue"),
legend.breaks = NULL,
label.column = NULL,
time_unit = "",
cex.label.row = 1,
cex.label.column = 1,
margins = c(5, 25),
heatKey.size = 1,
dendrogram.size = 1,
heatmap.height = 1,
heatmap.width = 1,
cex.clusterKey = 1,
cex.main = 1,
horiz.clusterKey = TRUE,
main = NULL,
subtitle = NULL,
...
)
an object of class'TcGSA
'.
the threshold at which the FDR or the FWER should be controlled.
a vector of character strings containing the names of the
multiple testing procedures for which adjusted p-values are to be computed.
This vector should include any of the following: "Bonferroni
",
"Holm
", "Hochberg
", "SidakSS
", "SidakSD
",
"BH
", "BY
", "ABH
", "TSBH
" or "none
".
"none
" indicates no adjustment for multiple testing. See
mt.rawp2adjp
for details. Default is
"BY
", the Benjamini & Yekutieli (2001) step-up FDR-controlling
procedure (general dependency structures). In order to control the FWER(in
case of an analysis that is more a hypothesis confirmation than an
exploration of the expression data), we recommend to use "Holm
", the
Holm (1979) step-down adjusted p-values for strong control of the FWER.
the number of observations under the null distribution to
be generated in order to compute the p-values. Default is 1e+06
.
either a matrix or dataframe of gene expression upon which
dynamics are to be calculated, or a list of gene sets estimation of gene
expression. In the case of a matrix or dataframe, its dimension are \(n\)
x \(p\), with the \(p\) sample in column and the \(n\) genes in row.
In the case of a list, its length should correspond to the number of gene
sets under scrutiny and each element should be an 3 dimension array of
estimated gene expression, such as for the list returned in the
'Estimations'
element of TcGSA.LR
. See details.
a factor of length \(p\) that is in the same order as the
columns of expr
(when it is a dataframe) and that contains the patient
identifier of each sample. Ignored if expr
is a list of estimations.
a numeric vector or a factor of length \(p\) that is in
the same order as Subject_ID
and the columns of expr
(when it
is a dataframe), and that contains the time points at which gene expression
was measured. Ignored if expr
is a list of estimations.
the value of TimePoint
to be used as baseline. Default is NULL
in which case expression is centered and no baseline is used.
logical flag for plotting only the significant gene sets.
If FALSE
, all the gene sets from the gmt object contained in
x
are plotted. Default is TRUE
.
in the case of several treatment` groups, this is a factor of
length \(p\) that is in the same order as Timepoint
,
Subject_ID
, sample_name
and the columns of expr
. It
indicates to which treatment group each sample belongs to. Default is
NULL
, which means that there is only one treatment group. See
Details.
a character vector of length \(p\) that is in the
same order as Timepoint
, Subject_ID
, sample_name
,
group.var
and the columns of expr
. This argument must not be
NULL
in the case of a paired analysis, and must be NULL
otherwise. Default is NULL
. See Details.
the group which is used as reference in the case of several
treatment groups. Default is NULL
, which means that reference is the
first group in alphabetical order of the labels of group.var
. See
Details.
the group of interest, for which dynamics are to be
computed in the case of several treatment groups. Default is NULL
,
which means that group of interest is the second group in alphabetical order
of the labels of group.var
. See Details.
group_of_interest
here~~
a logical flag. If TRUE
, the gene set trends are not hierarchically classified, but
ordered by decreasing Likelihood ratios. Default is FALSE
.
character string specifying the metric to be used
for calculating dissimilarities between observations in the hierarchical
clustering when FUNcluster
is NULL
. The currently available
options are "euclidean"
and "manhattan"
. Default is
"euclidean"
. See agnes
. Also, a "sts"
option
is available in TcGSA. It implements the 'Short Time Series' distance
[Moller-Levet et al., Fuzzy Clustering of short time series and unevenly distributed
sampling points, Advances in Intelligent Data Analysis V:330-340 Springer, 2003]
designed specifically for clustering time series.
character string defining the agglomerative method
to be used in the hierarchical clustering when FUNcluster
is
NULL
. The six methods implemented are "average"
([unweighted
pair-]group average method, UPGMA), "single"
(single linkage),
"complete"
(complete linkage), "ward"
(Ward's method),
"weighted"
(weighted average linkage). Default is "ward"
. See
agnes
.
integer specifying the number of Monte Carlo ("bootstrap") samples
used to compute the gap statistics. Default is 500
. See
clusGap
.
integer specifying the maximum number of different clusters
to be tested. Default is 4
.
a character string such as "mean"
, "median"
or the name of any other statistics function defined that returns a single
numeric value. It specifies the function used to aggregate the observations
before the clustering. Default is to median
. Default is
"median"
.
a logical flag indicating whether NA
should be remove to prevent
propagation through aggreg.fun
. Can be useful to set to TRUE with
unbalanced design as those will generate structural NA
s in
$Estimations
. Default is TRUE
.
character string indicating how the "optimal"" number
of clusters is computed from the gap statistics and their standard
deviations. Possible values are "globalmax"
, "firstmax"
,
"Tibs2001SEmax"
, "firstSEmax"
and "globalSEmax"
.
Default is "firstSEmax"
. See 'method'
in
clusGap
, Details and Tibshirani et al.,
2001 in References.
a character string indicating by which unit observations are
aggregated (through aggreg.fun
) before the clustering. Possible
values are "genes"
or "patients"
. Default is "genes"
.
logical flag enabling verbose messages to track the computing
status of the function. Default is TRUE
.
object of class ClusteredTrends containing
already computed trends for the plotted gene sets. Default is NULL
.
an integer that is the number of clusters in which the
dynamics should be regrouped. The cutoff of the clustering tree is
automatically calculated accordingly. Default is NULL
, in which case
the dendrogram is not cut and no clusters are identified.
a character vector of colors for predefined clusters of the
represented gene sets, with as many levels as the value of N_clusters
.
Default is NULL
, in which case the clusters are automatically
identified and colored via the cutree
function and the
N_clusters
argument only.
if N_clusters
is not NULL
, a character
vector of length N_clusterss
. Default is NULL
, in which case
if N_clusters
is not NULL
, clusters are simply labeled with
numbers.
a hclust object, such as the one return by the
present plotting function (see Value) for instance. If not NULL
, no
clustering is calculated by the present plotting function and this tree is
used to represent the gene sets dynamics. Default is NULL
.
logical flag indicating that the description of the gene sets
should appear after their name on the right side of the plot if TRUE
.
Default is TRUE
. See Details.
logical flag indicating whether the heatmap should be plotted or
not. Default is TRUE
.
a character strings vector used to define the color
palette used in the plot. Default is
c("#D73027", "#FC8D59","lightyellow", "#91BFDB", "#4575B4")
.
a numeric vector indicating the splitting points for
coloring. Default is NULL
, in which case the break points will be
spaced equally and symmetrically about 0.
a vector of character strings with the labels to be
displayed for the columns (i.e. the time points). Default is NULL
.
the time unit to be displayed (such as "Y"
,
"M"
, "W"
, "D"
, "H"
, etc) next to the values of
TimePoint
in the columns labels when label.column
is
NULL
. Default is ""
.
a numerical value giving the amount by which row labels
text should be magnified relative to the default 1
.
a numerical value giving the amount by which column
labels text should be magnified relative to the default 1
.
numeric vector of length 2 containing the margins (see
par(mar= *))
for column and row names, respectively. Default
is c(15, 100)
. See Details.
the size of the color key for the heatmap fill. Default
is 1
.
the horizontal size of the dendrogram. Default is 1
the height of the heatmap. Default is 1
the width of the heatmap. Default is 1
a numerical value giving the amount by which the
clusters legend text should be magnified relative to the default 1
,
when N_clusters
is not NULL
.
a numerical value giving the amount by which title text
should be magnified relative to the default 1
.
a logical flag; if TRUE
, set the legend for
clusters horizontally rather than vertically. Only used if the
N_clusters
argument is not NULL
. Default is TRUE
.
a character string for an optional title. Default is NULL
.
a character string for an optional subtitle. Default is NULL
.
other parameters to be passed through to plotting functions.
An object of class hclust which describes the tree produced by the clustering process. The object is a list with components:
merge
an \(n-1\) by \(2\) matrix. Row \(i\) of
merge
describes the merging of clusters at step i of the clustering.
If an element \(j\) in the row is negative, then observation -\(j\) was
merged at this stage. If \(j\) is positive then the merge was with the
cluster formed at the (earlier) stage \(j\) of the algorithm. Thus
negative entries in merge indicate agglomerations of singletons, and positive
entries indicate agglomerations of non-singletons.
height
a set of \(n-1\) real values (non-decreasing for
ultrametric trees). The clustering height: that is, the value of the
criterion associated with the Ward clustering method.
order
a vector giving the permutation of the original
observations suitable for plotting, in the sense that a cluster plot using
this ordering and matrix merge will not have crossings of the branches.
labels
the gene set trends name.
call
the call which produced the result clustering:
hclust(d = dist(map2heat, method = "euclidean"), method = "ward.D2")
method
"ward.D2", as it is the clustering method that has been used
for clustering the gene set trends.
dist.method
"euclidean", as it is the distance that has been used
for clustering the gene set trends.
legend.breaks
a numeric vector giving the splitting points used
for coloring the heatmap. If plot
is FALSE
, then it is
NULL
.
myclusters
a character vector of colors for the dynamic clusters
of the represented gene set trends, with as many levels as the value of
N_clusters
. If no dynamic clusters were represented, than this is
NULL
.
ddr
a dendrogram object with the reordering used for the
heatmap. See heatmap.2
function from package
gplots
.
gene set.names character vector with the names of the gene sets used in the heatmap.
clust.trends
a ClusteredTrends object.
clustersExport
a data frame with 2 variables containing the two
following variables :
GeneSet
: the gene set trends
clustered.
Cluster
: the dynamic cluster they belong to.
The
data frame is order by the variable Cluster
.
data_plotted
: the data matrix represented by the heatmap
On the heatmap, each line corresponds to a gene set, and each column to a time point.
If expr
is a matrix or a dataframe, then the "original" data are
plotted. On the other hand, if expr
is a list returned in the
'Estimations'
element of TcGSA.LR
, then it is those
"estimations" made by the TcGSA.LR
function that are plotted.
If descript
is FALSE
, the second element of margins
can
be reduced (for instance use margins = c(5, 10)
), as there is not so
much need for space in order to display only the gene set names, without
their description.
If there is a large number of significant gene sets, the hierarchical clustering
step repeated for each of them can take a few minutes. To speed things up
(especially) when playing with the plotting parameters for having a nice plot,
one can run the clustTrend
function beforehand, and plug its results
in the plot.TcGSA
function via the clust_trends
argument.
Hejblum BP, Skinner J, Thiebaut R, (2015) Time-Course Gene Set Analysis for Longitudinal Gene Expression Data. PLOS Comput. Biol. 11(6): e1004310. doi: 10.1371/journal.pcbi.1004310
# NOT RUN {
if(interactive()){
data(data_simu_TcGSA)
tcgsa_sim_1grp <- TcGSA.LR(expr=expr_1grp, gmt=gmt_sim, design=design,
subject_name="Patient_ID", time_name="TimePoint",
time_func="linear", crossedRandom=FALSE)
summary(tcgsa_sim_1grp)
plot(x=tcgsa_sim_1grp, expr=tcgsa_sim_1grp$Estimations,
Subject_ID=design$Patient_ID, TimePoint=design$TimePoint,
baseline=1,
B=100,
time_unit="H",
dendrogram.size=0.4, heatmap.width=0.8, heatmap.height=2, cex.main=0.7
)
tcgsa_sim_2grp <- TcGSA.LR(expr=expr_2grp, gmt=gmt_sim, design=design,
subject_name="Patient_ID", time_name="TimePoint",
time_func="linear", crossedRandom=FALSE,
group_name="group.var")
summary(tcgsa_sim_2grp)
plot(x=tcgsa_sim_2grp, expr=expr_2grp,
Subject_ID=design$Patient_ID, TimePoint=design$TimePoint,
B=100,
time_unit="H",
)
}
# }
Run the code above in your browser using DataLab