Simulation of expression data in several sets with related module structure.
simulateMultiExpr(eigengenes,
nGenes,
modProportions,
minCor = 0.5, maxCor = 1,
corPower = 1,
backgroundNoise = 0.1,
leaveOut = NULL,
signed = FALSE,
propNegativeCor = 0.3,
geneMeans = NULL,
nSubmoduleLayers = 0,
nScatteredModuleLayers = 0,
averageNGenesInSubmodule = 10,
averageExprInSubmodule = 0.2,
submoduleSpacing = 2,
verbose = 1, indent = 0)
A list with the following components:
simulated expression data in multi-set format analogous to that of the input eigengenes. A list with one component per set. Each component is again a list that contains a component data. This is a data frame of expression data for the corresponding data set. Columns correspond to genes, rows to samples.
setLabels: a matrix of dimensions (number of genes) times (number of sets) that contains module labels for each gene in each simulated data set.
allLabels: a matrix of dimensions (number of genes) times (number of sets) that contains the module labels that would be simulated if no module were left out using leaveOut. This means that all columns of the matrix are equal; the columns are repeated for convenience so that allLabels has the same dimensions as setLabels.
a matrix of dimensions (number of modules) times (number of sets) that contains the order in which module labels were assigned to genes in each set. The first label is assigned to genes 1 through (size of the module with the first label), the second label to the following block of genes, and so on.
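The block-wise assignment described in the last item can be sketched in base R. This is a minimal illustration, not code taken from simulateMultiExpr: the helper name assignLabelsInOrder is hypothetical, and moduleSizes[i] is assumed to be the size of the module whose label is order[i].

```r
# Hypothetical helper: expand one column of the label-order matrix into a
# per-gene label vector. The first label covers genes 1..moduleSizes[1],
# the second label the next moduleSizes[2] genes, and so on.
assignLabelsInOrder <- function(moduleSizes, order) {
  stopifnot(length(moduleSizes) == length(order))
  rep(order, times = moduleSizes)
}

# Three modules of sizes 3, 2 and 4, assigned in the order 3, 1, 2
labels <- assignLabelsInOrder(moduleSizes = c(3, 2, 4), order = c(3, 1, 2))
print(labels)
# [1] 3 3 3 1 1 2 2 2 2
```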
eigengenes: the seed eigengenes for the simulated modules in multi-set format. A list with one component per set. Each component is again a list that must contain a component data. This is a data frame of seed eigengenes for the corresponding data set. Columns correspond to modules, rows to samples. The number of samples in the simulated data is determined from the number of samples of the eigengenes.
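For concreteness, the following base-R snippet builds a random input with the multi-set structure just described. Only the structure matters here: random normal columns are not realistic eigengenes, and the set names, the "ME" column names, and the different sample numbers per set are illustrative assumptions.

```r
set.seed(1)
nModules <- 3
nSamples <- c(Set1 = 20, Set2 = 25)   # samples per set (illustrative values)

eigengenes <- lapply(nSamples, function(n) {
  me <- as.data.frame(matrix(rnorm(n * nModules), nrow = n, ncol = nModules))
  colnames(me) <- paste0("ME", seq_len(nModules))
  list(data = me)   # each set is a list with a component named `data`
})

# Rows = samples, columns = modules (seed eigengenes) in each set
sapply(eigengenes, function(set) dim(set$data))
```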
nGenes: integer specifying the number of simulated genes.
modProportions: a numeric vector with length equal to the number of eigengenes in eigengenes plus one, containing fractions of the total number of genes to be put into each of the modules and into the "grey module", which means genes not related to any of the modules. See details.
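A possible way of turning modProportions into integer module sizes is sketched below. The rounding rule (round each module size down and let grey absorb the leftover genes) and the requirement that the fractions sum to 1 are assumptions of this sketch, not documented behavior of simulateMultiExpr; proportionsToSizes is a hypothetical helper.

```r
# Hypothetical helper: convert fractions (one per module plus a final one
# for grey) into integer module sizes. Assumes the fractions sum to 1;
# module sizes are rounded down and grey absorbs the remainder.
proportionsToSizes <- function(nGenes, modProportions) {
  stopifnot(abs(sum(modProportions) - 1) < 1e-8)
  nModules <- length(modProportions) - 1
  modSizes <- floor(nGenes * modProportions[seq_len(nModules)])
  c(modSizes, grey = nGenes - sum(modSizes))
}

sizes <- proportionsToSizes(1000, c(0.15, 0.08, 0.05, 0.72))
print(sizes)
```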
minCor: minimum correlation of module genes with the corresponding eigengene. See details.
maxCor: maximum correlation of module genes with the corresponding eigengene. See details.
corPower: controls the dropoff of gene-eigengene correlation. See details.
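The sketch below shows one way minCor, maxCor and corPower could shape a module: target correlations fall from maxCor to minCor along a power curve, and each gene is constructed to have its target correlation with the seed eigengene. The curve formula and the helper names targetCors and simulateGene are assumptions; the actual rule is documented with simulateDatExpr and simulateModule. The construction r * E + sqrt(1 - r^2) * noise is the standard way of obtaining a variable with correlation r to a standardized E.

```r
# Hypothetical correlation profile: falls from maxCor (first gene) to minCor
# (last gene); corPower > 1 keeps correlations near maxCor for longer.
targetCors <- function(nGenes, minCor = 0.5, maxCor = 1, corPower = 1) {
  x <- (seq_len(nGenes) - 1) / max(nGenes - 1, 1)  # runs from 0 to 1
  maxCor - (maxCor - minCor) * x^corPower
}

# Gene with correlation r to eigengene E: mix the standardized eigengene
# with independent standard normal noise in proportions r and sqrt(1 - r^2).
simulateGene <- function(E, r) {
  z <- as.numeric(scale(E))
  r * z + sqrt(1 - r^2) * rnorm(length(z))
}

set.seed(2)
E <- rnorm(5000)
cors <- targetCors(nGenes = 5, minCor = 0.5, maxCor = 0.9, corPower = 1)
genes <- sapply(cors, function(r) simulateGene(E, r))   # 5000 x 5 matrix
round(cor(E, genes), 2)   # sample correlations, close to cors
```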
backgroundNoise: amount of background noise to be added to the simulated expression data.
leaveOut: optional specification of modules that should be left out of the simulation, that is, their genes will be simulated as unrelated ("grey"). A logical matrix in which columns correspond to sets and rows to modules. Wherever TRUE, the corresponding module in the corresponding data set will not be simulated, that is, its genes will be simulated independently of the eigengene.
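The relationship between leaveOut, the full labels (allLabels) and the per-set labels (setLabels) can be sketched as follows. The sketch assumes that row k of leaveOut refers to the module with label k and that genes of a dissolved module receive the grey label 0 in setLabels; both are assumptions for illustration.

```r
# Module labels of 9 genes as they would be without leaving anything out (0 = grey)
allLabelsVec <- c(1, 1, 1, 2, 2, 3, 3, 0, 0)
nSets <- 2
# allLabels repeats the same column once per set
allLabels <- matrix(allLabelsVec, nrow = length(allLabelsVec), ncol = nSets)

# leaveOut: rows = modules, columns = sets; TRUE dissolves a module in a set
leaveOut <- matrix(FALSE, nrow = 3, ncol = nSets)
leaveOut[2, 2] <- TRUE   # dissolve module 2 in set 2

# Assumed rule: genes of a dissolved module become grey (0) in that set
setLabels <- allLabels
for (s in seq_len(nSets)) {
  dissolved <- which(leaveOut[, s])
  setLabels[allLabels[, s] %in% dissolved, s] <- 0
}
setLabels
```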
signed: logical: should the genes be simulated as belonging to a signed network? If TRUE, all genes will be simulated to have positive correlation with the eigengene. If FALSE, a proportion given by propNegativeCor will be simulated with negative correlations of the same absolute values.
propNegativeCor: proportion of genes to be simulated with negative gene-eigengene correlations. Only effective if signed is FALSE.
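In the unsigned case, the effect of propNegativeCor can be sketched by negating a randomly chosen fraction of the module genes, which flips the sign of their eigengene correlation while keeping its absolute value. flipSigns is a hypothetical helper, and rounding the number of negated genes with round() is an assumption.

```r
# Hypothetical helper: negate a random proportion of module genes (columns)
# unless the network is signed.
flipSigns <- function(moduleGenes, propNegativeCor = 0.3, signed = FALSE) {
  if (signed) return(moduleGenes)                      # signed: keep all positive
  nNeg <- round(propNegativeCor * ncol(moduleGenes))   # rounding is an assumption
  neg <- sample(ncol(moduleGenes), nNeg)
  moduleGenes[, neg] <- -moduleGenes[, neg]
  moduleGenes
}

set.seed(3)
E <- rnorm(200)
moduleGenes <- sapply(rep(0.8, 10), function(r) r * E + sqrt(1 - r^2) * rnorm(200))
flipped <- flipSigns(moduleGenes, propNegativeCor = 0.3)
sum(cor(E, flipped) < 0)   # number of negatively correlated genes
```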
geneMeans: optional vector of length nGenes giving the desired mean expression for each gene. If not given, the returned expression profiles will have mean zero.
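Shifting zero-mean profiles to desired gene means amounts to adding one constant per gene (column). A minimal sketch with made-up means, using base R's sweep(); this illustrates the idea and is not necessarily how simulateMultiExpr applies geneMeans internally.

```r
set.seed(4)
# Zero-mean expression profiles: rows = samples, columns = genes
expr <- scale(matrix(rnorm(30 * 4), nrow = 30), center = TRUE, scale = FALSE)
geneMeans <- c(5, 7.5, 10, 2)   # illustrative target means

# Add geneMeans[j] to every entry of column j
exprShifted <- sweep(expr, 2, geneMeans, FUN = "+")
round(colMeans(exprShifted), 10)
```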
nSubmoduleLayers: number of layers of ordered submodules to be added. See details.
nScatteredModuleLayers: number of layers of scattered submodules to be added. See details.
averageNGenesInSubmodule: average number of genes in a submodule. See details.
averageExprInSubmodule: average strength of submodule expression vectors.
submoduleSpacing: a number giving the submodule spacing: this multiple of the submodule size will lie between a submodule and the next one.
verbose: integer level of verbosity. Zero means silent; higher values make the output progressively more verbose.
indent: indentation for diagnostic messages. Zero means no indentation; each unit adds two spaces.
Peter Langfelder
For details of the simulation of individual data sets and the meaning of individual set simulation arguments, see simulateDatExpr. This function simulates several data sets at a time and puts the result in a multi-set format. The number of genes is the same for all data sets. Module memberships are also the same, but modules can optionally be "dissolved", that is, their genes will be simulated as unassigned. Such "dissolved", or left-out, modules can be specified in the matrix leaveOut.
A short description of the simulation method can also be found in the Supplementary Material to the article
Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 1:54.
The material is posted at http://horvath.genetics.ucla.edu/html/CoexpressionNetwork/EigengeneNetwork/SupplementSimulations.pdf.
simulateEigengeneNetwork for a simulation of eigengenes with a given causal structure;
simulateDatExpr for simulation of individual data sets;
simulateDatExpr5Modules for a simple simulation of a data set consisting of 5 modules;
simulateModule for simulations of individual modules.