simulateMultiExpr: Simulate multi-set expression data

Description

Simulation of expression data in several sets with related module structure.

Usage

simulateMultiExpr(eigengenes, 
                  nGenes, 
                  modProportions, 
                  minCor = 0.5, maxCor = 1, 
                  corPower = 1, 
                  backgroundNoise = 0.1, 
                  leaveOut = NULL, 
                  signed = FALSE, 
                  propNegativeCor = 0.3, 
                  geneMeans = NULL,
                  nSubmoduleLayers = 0, 
                  nScatteredModuleLayers = 0, 
                  averageNGenesInSubmodule = 10, 
                  averageExprInSubmodule = 0.2, 
                  submoduleSpacing = 2, 
                  verbose = 1, indent = 0)

Arguments

eigengenes

the seed eigengenes for the simulated modules in a multi-set format. A list with one component per set. Each component is again a list that must contain a component data. This is a data frame of seed eigengenes for the corresponding data set. Columns correspond to modules, rows to samples. Number of samples in the simulated data is determined from the number of samples of the eigengenes.
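As an illustration of the multi-set format described above, the following base R sketch builds a two-set eigengenes list from random numbers. The set count, sample counts, module count, and the "ME" column names are assumptions chosen for the example, not requirements of the function.

```r
# Hypothetical sizes, chosen only for illustration.
set.seed(1)
nSets    <- 2
nSamples <- c(20, 30)   # number of samples in each set
nModules <- 3

# One list element per set; each element is itself a list with a `data`
# component holding a data frame (rows = samples, columns = modules).
eigengenes <- lapply(seq_len(nSets), function(set) {
  me <- matrix(rnorm(nSamples[set] * nModules), nrow = nSamples[set])
  colnames(me) <- paste0("ME", seq_len(nModules))
  list(data = as.data.frame(me))
})

# Dimensions (samples x modules) of the seed eigengenes in each set.
sapply(eigengenes, function(x) dim(x$data))
```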

nGenes

integer specifying the number of simulated genes.

modProportions

a numeric vector with length equal to the number of eigengenes in eigengenes plus one, containing fractions of the total number of genes to be put into each of the modules and into the "grey module", which means genes not related to any of the modules. See details.
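A minimal sketch of a valid modProportions vector for three modules follows. The numbers are hypothetical, the check that the module fractions sum to less than one is an assumed sanity check, and the rounding shown is only a rough guide, since the exact allocation of genes is handled by simulateDatExpr.

```r
nModules <- 3      # number of seed eigengenes (hypothetical)
nGenes   <- 1000   # hypothetical number of simulated genes

# One fraction per module, plus a final entry for the "grey" module.
modProportions <- c(0.20, 0.15, 0.10, 0.30)

# The length must be the number of eigengenes plus one.
stopifnot(length(modProportions) == nModules + 1)
# Assumed sanity check: the module fractions alone should stay below 1.
stopifnot(sum(modProportions[seq_len(nModules)]) < 1)

# Approximate gene counts implied by the fractions (illustrative rounding).
approxCounts <- round(nGenes * modProportions)
names(approxCounts) <- c(paste0("module", seq_len(nModules)), "grey")
approxCounts
```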

minCor

minimum correlation of module genes with the corresponding eigengene. See details.

maxCor

maximum correlation of module genes with the corresponding eigengene. See details.

corPower

controls the dropoff of gene-eigengene correlation. See details.

backgroundNoise

amount of background noise to be added to the simulated expression data.

leaveOut

optional specification of modules that should be left out of the simulation, that is, their genes will be simulated as unrelated ("grey"). A logical matrix in which columns correspond to sets and rows to modules. Wherever an entry is TRUE, the corresponding module in the corresponding data set will not be simulated, that is, its genes will be simulated independently of the eigengene.
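For example, a leaveOut matrix for three modules and two sets could be built as in the sketch below. The sizes and the choice of which module to leave out are hypothetical.

```r
nModules <- 3   # hypothetical
nSets    <- 2   # hypothetical

# Rows correspond to modules, columns to sets; start with nothing left out.
leaveOut <- matrix(FALSE, nrow = nModules, ncol = nSets)

# Leave out module 2 in set 2 only; it is still simulated in set 1.
leaveOut[2, 2] <- TRUE
leaveOut
```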

signed

logical: should the genes be simulated as belonging to a signed network? If TRUE, all genes will be simulated to have positive correlation with the eigengene. If FALSE, a proportion given by propNegativeCor will be simulated with negative correlations of the same absolute values.

propNegativeCor

proportion of genes to be simulated with negative gene-eigengene correlations. Only effective if signed is FALSE.

geneMeans

optional vector of length nGenes giving desired mean expression for each gene. If not given, the returned expression profiles will have mean zero.

nSubmoduleLayers

number of layers of ordered submodules to be added. See details.

nScatteredModuleLayers

number of layers of scattered submodules to be added. See details.

averageNGenesInSubmodule

average number of genes in a submodule. See details.

averageExprInSubmodule

average strength of submodule expression vectors.

submoduleSpacing

a number giving submodule spacing: this multiple of the submodule size will lie between the submodule and the next one.

verbose

integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose.

indent

indentation for diagnostic messages. Zero means no indentation; each unit adds two spaces.

Value

A list with the following components:

multiExpr

simulated expression data in multi-set format analogous to that of the input eigengenes. A list with one component per set. Each component is again a list that contains a component data. This is a data frame of expression data for the corresponding data set. Columns correspond to genes, rows to samples.

setLabels

a matrix of dimensions (number of genes) times (number of sets) that contains module labels for each gene in each simulated data set.

allLabels

a matrix of dimensions (number of genes) times (number of sets) that contains the module labels that would be simulated if no module were left out using leaveOut. This means that all columns of the matrix are equal; the columns are repeated for convenience so allLabels has the same dimensions as setLabels.

labelOrder

a matrix of dimensions (number of modules) times (number of sets) that contains the order in which module labels were assigned to genes in each set. The first label is assigned to genes 1...(module size of module labeled by first label), the second label to the following batch of genes etc.

Details

For details of simulation of individual data sets and the meaning of individual set simulation arguments, see simulateDatExpr. This function simulates several data sets at a time and puts the result in a multi-set format. The number of genes is the same for all data sets. Module memberships are also the same, but modules can optionally be "dissolved", that is their genes will be simulated as unassigned. Such "dissolved", or left out, modules can be specified in the matrix leaveOut.
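The following sketch puts the pieces together: random seed eigengenes for two sets, one module dissolved in the second set, and a call using the signature from the Usage section. All sizes and proportions are hypothetical, the seed eigengenes are plain random numbers rather than biologically meaningful profiles, and the call is guarded so that it only runs when the WGCNA package is installed.

```r
set.seed(42)
nSets <- 2; nSamples <- 25; nModules <- 3; nGenes <- 500   # hypothetical

# Seed eigengenes in multi-set format: list of lists with a `data` data frame.
eigengenes <- lapply(seq_len(nSets), function(set) {
  me <- matrix(rnorm(nSamples * nModules), nrow = nSamples)
  colnames(me) <- paste0("ME", seq_len(nModules))
  list(data = as.data.frame(me))
})

# Fractions for the three modules plus the grey module.
modProportions <- c(0.20, 0.15, 0.10, 0.30)

# Dissolve module 3 in set 2 (rows = modules, columns = sets).
leaveOut <- matrix(FALSE, nrow = nModules, ncol = nSets)
leaveOut[3, 2] <- TRUE

if (requireNamespace("WGCNA", quietly = TRUE)) {
  sim <- WGCNA::simulateMultiExpr(eigengenes, nGenes, modProportions,
                                  leaveOut = leaveOut, verbose = 0)
  # Expression data for set 1: rows are samples, columns are genes.
  print(dim(sim$multiExpr[[1]]$data))
  # Module labels in set 2, where module 3 was left out.
  print(table(sim$setLabels[, 2]))
}
```

Comparing sim$setLabels with sim$allLabels column by column shows which genes lost their module assignment because of leaveOut.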

References

A short description of the simulation method can also be found in the Supplementary Material to the article

Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 1:54.

The material is posted at http://horvath.genetics.ucla.edu/html/CoexpressionNetwork/EigengeneNetwork/SupplementSimulations.pdf.