Order nodes in descending order of weighted degree and order modules by the similarity of their summary vectors.
nodeOrder(
  network,
  data,
  correlation,
  moduleAssignments = NULL,
  modules = NULL,
  backgroundLabel = "0",
  discovery = NULL,
  test = NULL,
  na.rm = FALSE,
  orderModules = TRUE,
  mean = FALSE,
  simplify = TRUE,
  verbose = TRUE
)
A nested list structure. At the top level, the list has one element per 'discovery' dataset. Each of these elements is a list that has one element per 'test' dataset analysed for that 'discovery' dataset. Each of these elements is a list that has one element per module specified in 'modules', containing a vector of node names for the requested module. When simplify = TRUE, the simplest possible structure is returned. E.g. if the node ordering is requested for module(s) in only one dataset, then a single vector of node labels will be returned.
When simplify = FALSE, a nested list of datasets will always be returned, i.e. each element at the top level and second level corresponds to a dataset, and each element at the third level corresponds to a module discovered in the dataset specified at the top level, if module labels are provided in the corresponding moduleAssignments list element. E.g. results[["Dataset1"]][["Dataset2"]][["module1"]] will contain the order of nodes calculated in "Dataset2", where "module1" was identified in "Dataset1". Modules and datasets for which calculation of the node order has not been requested will contain NULL.
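The nesting described above can be pictured with plain R lists. The following is a minimal sketch that builds the structure by hand rather than calling nodeOrder; the dataset, module, and node names are all made up for illustration.

```r
# Hand-built stand-in for an unsimplified (simplify = FALSE) result.
# Dataset, module, and node names are hypothetical.
results <- list(
  Dataset1 = list(
    Dataset1 = NULL,                      # not requested
    Dataset2 = list(
      module1 = c("n3", "n1", "n2"),      # node order for module1
      module2 = c("n5", "n4")             # node order for module2
    )
  ),
  Dataset2 = list(
    Dataset1 = NULL,                      # not requested
    Dataset2 = NULL                       # not requested
  )
)

# Top level: discovery dataset; second level: test dataset;
# third level: module discovered in the discovery dataset.
module1_order <- results[["Dataset1"]][["Dataset2"]][["module1"]]
print(module1_order)

# Combinations that were not requested hold NULL.
print(is.null(results[["Dataset2"]][["Dataset1"]]))
```

If only that one module in that one pair of datasets had been requested with simplify = TRUE, the description above implies the return value would be just the vector stored in module1_order.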
network: a list of interaction networks, one for each dataset. Each entry of the list should be an \(n * n\) matrix where each element contains the edge weight between nodes \(i\) and \(j\) in the inferred network for that dataset.
data: a list of matrices, one for each dataset. Each entry of the list should be the data used to infer the interaction network for that dataset. The columns should correspond to variables in the data (nodes in the network) and rows to samples in that dataset.
correlation: a list of matrices, one for each dataset. Each entry of the list should be an \(n * n\) matrix where each element contains the correlation coefficient between nodes \(i\) and \(j\) in the data used to infer the interaction network for that dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one for each discovery dataset, of modules to perform the analysis on. If unspecified, all modules in each discovery dataset will be analysed, with the exception of those specified in the backgroundLabel argument.
backgroundLabel: a single label given to nodes that do not belong to any module in the moduleAssignments argument. Defaults to "0". Set to NULL if you do not want to skip the network background module.
discovery: a vector of names or indices denoting the discovery dataset(s) in the data, correlation, network, moduleAssignments, modules, and test lists.
test: a list of vectors, one for each discovery dataset, of names or indices denoting the test dataset(s) in the data, correlation, and network lists.
na.rm: logical; if TRUE, nodes and modules present in the discovery dataset but missing from the test dataset are excluded. If FALSE, missing nodes and modules are put last in the ordering.
orderModules: logical; if TRUE, modules are ordered by clustering their summary vectors. If FALSE, modules are returned in the order provided.
mean: logical; if TRUE, node order will be calculated for each discovery dataset by averaging the weighted degree and pooling module summary vectors across the specified test datasets. If FALSE, the node order is calculated separately in each test dataset.
simplify: logical; if TRUE, simplify the structure of the output list if possible (see Return Value).
verbose: logical; should progress be reported? Default is TRUE.
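As a rough illustration of ordering modules by the similarity of their summary vectors, the sketch below hierarchically clusters three toy modules. The summary vectors are simulated, and both the distance (one minus the Pearson correlation) and the average linkage are assumptions made for this example; the text here does not state which clustering settings NetRep itself uses.

```r
set.seed(1)

# Simulated summary vectors: one column per module, one row per sample.
# M1 and M3 share a common signal; M2 is unrelated noise.
signal <- rnorm(20)
summaries <- cbind(
  M1 = signal + rnorm(20, sd = 0.1),
  M2 = rnorm(20),
  M3 = signal + rnorm(20, sd = 0.1)
)

# Assumed dissimilarity: 1 - correlation between summary vectors.
d <- as.dist(1 - cor(summaries))

# Assumed linkage: average. The leaf order of the dendrogram places
# similar modules next to each other.
hc <- hclust(d, method = "average")
module_order <- colnames(summaries)[hc$order]
print(module_order)
```

Because M1 and M3 are merged first, they end up adjacent in module_order, with M2 at one end.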
The preservation of network modules in a second dataset is quantified by measuring the preservation of topological properties between the discovery and test datasets. These properties are calculated not only from the interaction networks inferred in each dataset, but also from the data used to infer those networks (e.g. gene expression data) as well as the correlation structure between variables/nodes. Thus, all functions in the NetRep package have the following arguments:
network: a list of interaction networks, one for each dataset.
data: a list of data matrices used to infer those networks, one for each dataset.
correlation: a list of matrices containing the pairwise correlation coefficients between variables/nodes in each dataset.
moduleAssignments: a list of vectors, one for each discovery dataset, containing the module assignments for each node in that dataset.
modules: a list of vectors, one for each discovery dataset, containing the names of the modules from that dataset to analyse.
discovery: a vector indicating the names or indices of the previous arguments' lists to use as the discovery dataset(s) for the analyses.
test: a list of vectors, one vector for each discovery dataset, containing the names or indices of the network, data, and correlation argument lists to use as the test dataset(s) for the analysis of each discovery dataset.
The formatting of these arguments is not strict: each function will attempt to make sense of the user input. For example, if there is only one discovery dataset, then input to the moduleAssignments and test arguments may be vectors, rather than lists. If the node order is being calculated within the discovery or test datasets, then the discovery and test arguments do not need to be specified, and the input matrices for the network, data, and correlation arguments do not need to be wrapped in a list.
Matrices in the network, data, and correlation lists can be supplied as disk.matrix objects. This class allows matrix data to be kept on disk and loaded as required by NetRep. This dramatically decreases memory usage: the matrices for only one dataset will be kept in RAM at any point in time.
When multiple 'test' datasets are specified and 'mean' is TRUE, the order of nodes is determined by the average of each node's weighted degree across datasets. The weighted degree in each dataset is scaled to the node with the maximum weighted degree in that module in that dataset: this prevents differences in average edge weight across datasets from influencing the outcome (otherwise the mean would be weighted by the overall density of connections in the module). Thus, the mean weighted degree is a robust measure of a node's relative importance to a module across datasets. The mean is calculated with 'na.rm=TRUE': where a node is missing, it does not contribute to the mean.
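The averaging described above can be sketched in base R. This is an illustrative reimplementation on two toy networks, not NetRep's internal code: the helper scaled_degree and the node names are hypothetical, and the weighted degree is assumed to be the sum of a node's edge weights to the other nodes of the module, excluding the diagonal.

```r
# Hypothetical helper: weighted degree of each module node within one
# network, scaled to the maximum in that module. Nodes absent from the
# network get NA.
scaled_degree <- function(network, module_nodes) {
  present <- intersect(module_nodes, rownames(network))
  sub <- network[present, present, drop = FALSE]
  diag(sub) <- 0                       # ignore self-edges
  deg <- rowSums(sub)                  # weighted degree within the module
  out <- setNames(rep(NA_real_, length(module_nodes)), module_nodes)
  out[present] <- deg / max(deg)       # scale to the most connected node
  out
}

module_nodes <- c("a", "b", "c", "d")

# Toy test network 1: all four nodes present, fairly strong edges.
net1 <- matrix(c(
  1.0, 0.8, 0.6, 0.2,
  0.8, 1.0, 0.4, 0.2,
  0.6, 0.4, 1.0, 0.2,
  0.2, 0.2, 0.2, 1.0
), nrow = 4, dimnames = list(module_nodes, module_nodes))

# Toy test network 2: node "d" is missing and edges are weaker overall.
net2 <- matrix(c(
  1.0, 0.1, 0.3,
  0.1, 1.0, 0.3,
  0.3, 0.3, 1.0
), nrow = 3, dimnames = list(module_nodes[1:3], module_nodes[1:3]))

scaled <- cbind(
  test1 = scaled_degree(net1, module_nodes),
  test2 = scaled_degree(net2, module_nodes)
)

# Missing nodes (NA) do not contribute to the mean.
mean_degree <- rowMeans(scaled, na.rm = TRUE)

# Nodes in descending order of mean scaled weighted degree.
node_order <- names(mean_degree)[order(mean_degree, decreasing = TRUE)]
print(node_order)
```

Scaling within each dataset before averaging means that the weaker edges in the second toy network do not reduce its influence on the final order.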
Langfelder, P., Mischel, P. S. & Horvath, S. When is hub gene selection better than standard meta-analysis? PLoS One 8, e61505 (2013).
networkProperties
# load in example data, correlation, and network matrices for a discovery
# and test dataset:
data("NetRep")
# Set up input lists for each input matrix type across datasets. The list
# elements can have any names, so long as they are consistent between the
# inputs.
network_list <- list(discovery=discovery_network, test=test_network)
data_list <- list(discovery=discovery_data, test=test_data)
correlation_list <- list(discovery=discovery_correlation, test=test_correlation)
labels_list <- list(discovery=module_labels)
# Sort modules by similarity and nodes within each module by their weighted
# degree
nodes <- nodeOrder(
  network=network_list, data=data_list, correlation=correlation_list,
  moduleAssignments=labels_list
)