simulateDatExpr: Simulation of expression data

Description

Simulation of expression data with a customizable modular structure and several different types of noise.

Usage

simulateDatExpr(
  eigengenes, 
  nGenes, 
  modProportions, 
  minCor = 0.3, 
  maxCor = 1, 
  corPower = 1, 
  signed = FALSE, 
  propNegativeCor = 0.3, 
  geneMeans = NULL,
  backgroundNoise = 0.1, 
  leaveOut = NULL, 
  nSubmoduleLayers = 0, 
  nScatteredModuleLayers = 0, 
  averageNGenesInSubmodule = 10, 
  averageExprInSubmodule = 0.2, 
  submoduleSpacing = 2, 
  verbose = 1, indent = 0)

Value

A list with the following components:

datExpr: simulated expression data in a data frame whose columns correspond to genes and rows to samples.
setLabels: simulated module assignment. Module labels are numeric, starting from 1. Genes simulated to be outside of proper modules have label 0. Genes in modules that are left out (specified in leaveOut) are also labeled 0 here.
allLabels: simulated module assignment in which genes belonging to left-out modules (specified in leaveOut) are indicated by their would-be assignment.
labelOrder: a vector specifying the order in which labels correspond to the given eigengenes; that is, labelOrder[1] is the label assigned to the module whose seed is eigengenes[, 1], and so on.
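The relationship between allLabels and setLabels can be illustrated with a small hypothetical example. This sketch assumes, for illustration only, that leaveOut is a logical vector with one entry per module (TRUE meaning the module is left out); the exact form leaveOut takes in the package may differ.

```r
# Hypothetical labels: module 2 is "left out" (assumption: leaveOut is a logical vector,
# one entry per module). allLabels keeps the would-be assignment of every gene.
allLabels <- c(1, 1, 2, 2, 2, 3, 0, 0)
leaveOut  <- c(FALSE, TRUE, FALSE)

# setLabels reports genes of left-out modules as unrelated ("grey", label 0)
setLabels <- allLabels
setLabels[allLabels %in% which(leaveOut)] <- 0
setLabels
```

Here genes 3 to 5 keep label 2 in allLabels but are reported as 0 in setLabels.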

Arguments

eigengenes: a data frame containing the seed eigengenes for the simulated modules. Rows correspond to samples and columns to modules.
nGenes: total number of genes to be simulated.
modProportions: a numeric vector with length equal to the number of eigengenes (columns of eigengenes) plus one, containing the fractions of the total number of genes to be put into each of the modules and into the "grey module", that is, genes not related to any of the modules. See details.
minCor: minimum correlation of module genes with the corresponding eigengene. See details.
maxCor: maximum correlation of module genes with the corresponding eigengene. See details.
corPower: controls the dropoff of gene-eigengene correlation. See details.
signed: logical: should the genes be simulated as belonging to a signed network? If TRUE, all genes will be simulated to have positive correlation with the eigengene. If FALSE, a proportion given by propNegativeCor will be simulated with negative correlations of the same absolute values.
propNegativeCor: proportion of genes to be simulated with negative gene-eigengene correlations. Only effective if signed is FALSE.
geneMeans: optional vector of length nGenes giving desired mean expression for each gene. If not given, the returned expression profiles will have mean zero.
backgroundNoise: amount of background noise to be added to the simulated expression data.
leaveOut: optional specification of modules that should be left out of the simulation, that is, their genes will be simulated as unrelated ("grey"). This can be useful when simulating several sets, in some of which a module is present while in others it is absent.
nSubmoduleLayers: number of layers of ordered submodules to be added. See details.
nScatteredModuleLayers: number of layers of scattered submodules to be added. See details.
averageNGenesInSubmodule: average number of genes in a submodule. See details.
averageExprInSubmodule: average strength of submodule expression vectors.
submoduleSpacing: submodule spacing, expressed as a multiple of the submodule size: a gap of this many submodule sizes lies between a submodule and the next one.
verbose: integer level of verbosity. Zero means silent; higher values make the output progressively more verbose.
indent: indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces.
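The shape constraints stated above (and in the details below) can be checked before a call. The following is a minimal sketch: checkSimulationArgs is a hypothetical helper, not part of the package, and the example eigengenes (two correlated seeds and one unrelated seed, with column names ME1 to ME3) are arbitrary choices for illustration.

```r
# Hypothetical helper (not part of the package): checks the argument constraints
# described in this help page before they would be passed to simulateDatExpr.
checkSimulationArgs <- function(eigengenes, nGenes, modProportions,
                                minCor = 0.3, maxCor = 1, geneMeans = NULL) {
  stopifnot(
    is.data.frame(eigengenes),
    length(modProportions) == ncol(eigengenes) + 1,   # one entry per module plus "grey"
    sum(modProportions) <= 1,                         # proportions may not exceed 1
    minCor > 0, minCor < 1, maxCor <= 1, minCor < maxCor,
    is.null(geneMeans) || length(geneMeans) == nGenes
  )
  invisible(TRUE)
}

# Example seed eigengenes: rows = samples, columns = modules
set.seed(1)
nSamples <- 50
me1 <- rnorm(nSamples)
me2 <- 0.6 * me1 + sqrt(1 - 0.6^2) * rnorm(nSamples)  # population correlation 0.6 with me1
me3 <- rnorm(nSamples)                                 # unrelated to the other seeds
eigengenes <- data.frame(ME1 = me1, ME2 = me2, ME3 = me3)

checkSimulationArgs(eigengenes, nGenes = 1000,
                    modProportions = c(0.3, 0.2, 0.1, 0.3),
                    geneMeans = runif(1000, min = 4, max = 12))
```

With three eigengenes, modProportions needs four entries; passing only two would make the check fail.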

Author

Peter Langfelder

Details

The given eigengenes can be unrelated, or they can exhibit non-trivial correlations. Each module is simulated separately from the others. The expression profiles are chosen such that their correlations with the eigengene run from just below maxCor down to minCor (hence minCor must lie strictly between 0 and 1). The parameter corPower controls how the simulated correlation changes with the gene index: values higher than 1 make the correlation approach minCor faster, and values lower than 1 make it approach minCor more slowly.
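The correlation dropoff, and the noise level needed to reach a target correlation, can be sketched in base R. This is a minimal sketch, not the package's code: it assumes a power-law profile r_i = maxCor - (i/nGenes)^(1/corPower) * (maxCor - minCor) for gene index i, and genes built as the eigengene plus independent Gaussian noise. The functions targetCor and simulateModuleSketch are hypothetical names.

```r
# Assumed power-law dropoff: starts just below maxCor (i = 1) and ends at minCor (i = nGenes)
targetCor <- function(nGenes, minCor = 0.3, maxCor = 1, corPower = 1) {
  maxCor - ((1:nGenes) / nGenes)^(1 / corPower) * (maxCor - minCor)
}

# Assumed construction: gene = eigengene + N(0, s^2) noise.
# Then cor(gene, ME) = sd(ME) / sqrt(var(ME) + s^2), so s^2 = var(ME) * (1 - r^2) / r^2,
# which is undefined for r = 0 (one reason minCor must be positive).
simulateModuleSketch <- function(eigengene, nGenes, minCor = 0.3, maxCor = 1, corPower = 1) {
  r <- targetCor(nGenes, minCor, maxCor, corPower)
  noiseSd <- sqrt(var(eigengene) * (1 - r^2) / r^2)
  # one column per gene, one row per sample
  sapply(noiseSd, function(s) eigengene + rnorm(length(eigengene), sd = s))
}

set.seed(3)
me <- rnorm(20000)
x <- simulateModuleSketch(me, nGenes = 5)
targetCor(5)              # target correlations 0.86, 0.72, 0.58, 0.44, 0.30
round(cor(x, me)[, 1], 2) # sample correlations, close to the targets
```

With corPower = 2 the first target correlation drops to about 0.69 instead of 0.86, so the profile approaches minCor faster, in line with the behaviour described above.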

The numbers of genes in each module are specified by modProportions, as fractions of the total number of genes nGenes. The last entry in modProportions corresponds to the genes that will be simulated as unrelated to anything else ("grey" genes). The proportions must add up to 1 or less. If the sum is less than 1, the remaining genes are partitioned into groups and simulated to be "close" to the proper modules, that is, with small but non-zero correlations (between 0 and minCor) with the module eigengene.
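A minimal sketch of turning modProportions into gene counts follows. The rounding rule (round down, with every remaining gene counted as a "near-module" gene) is an assumption made for illustration, and moduleSizes is a hypothetical helper.

```r
# Hypothetical helper: split nGenes according to modProportions
# (one entry per module, last entry = "grey" genes).
moduleSizes <- function(nGenes, modProportions) {
  stopifnot(all(modProportions >= 0), sum(modProportions) <= 1)
  sizes <- floor(nGenes * modProportions)   # assumption: round down
  nModules <- length(modProportions) - 1
  list(
    module = sizes[seq_len(nModules)],      # genes in proper modules
    grey   = sizes[nModules + 1],           # genes unrelated to any module
    near   = nGenes - sum(sizes)            # remaining genes, simulated "close" to modules
  )
}

moduleSizes(1000, c(0.3, 0.2, 0.1, 0.3))
```

Here the proportions sum to 0.9, so modules of 300, 200 and 100 genes and 300 grey genes leave 100 genes to be simulated close to the proper modules.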

If signed is set to FALSE, the correlation is chosen to be negative for some of the module genes (the absolute values remain the same as they would be for positively correlated genes). To ensure consistency across simulations of multiple sets, the indices of the negatively correlated genes are fixed and evenly distributed.
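One deterministic way to obtain fixed, evenly spaced indices is to place them on a regular grid with step 1/propNegativeCor. This is a sketch under that assumption, not the package's exact rule; negativeIndices is a hypothetical helper. Because no random numbers are involved, the same genes are flipped in every simulated set.

```r
# Hypothetical helper: evenly spaced, fixed indices of negatively correlated genes
negativeIndices <- function(nGenes, propNegativeCor) {
  if (propNegativeCor <= 0) return(integer(0))
  step <- 1 / propNegativeCor
  n <- floor(nGenes * propNegativeCor)
  idx <- as.integer(round(seq(from = step, by = step, length.out = n)))
  idx[idx <= nGenes]
}

# Sign applied to each gene's correlation with the eigengene
signs <- rep(1, 10)
signs[negativeIndices(10, 0.3)] <- -1   # genes 3, 7 and 10 become negatively correlated
signs
```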

In addition to the primary module structure, a secondary structure can optionally be simulated. Modules in the secondary structure have sizes chosen from an exponential distribution with mean equal to averageNGenesInSubmodule. Expression vectors in the secondary structure are simulated with expected standard deviation chosen from an exponential distribution with mean equal to averageExprInSubmodule; the higher this coefficient, the more pronounced the submodules will be within the main modules. The secondary structure can be simulated in several layers; their number is given by nSubmoduleLayers. Genes in these submodules are ordered in the same order as in the main modules.
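One layer of ordered submodules can be sketched as follows. This is a minimal sketch under stated assumptions, not the package's implementation: sizes are exponential draws rounded up, each submodule adds one shared Gaussian vector (with an exponentially distributed standard deviation) to a run of consecutive genes, and a gap of submoduleSpacing times the submodule size follows each submodule. addSubmoduleLayer is a hypothetical name.

```r
# Hypothetical sketch of one ordered submodule layer (rows = samples, columns = genes
# in module order). Size, strength and spacing rules are assumptions for illustration.
addSubmoduleLayer <- function(datExpr, averageNGenesInSubmodule = 10,
                              averageExprInSubmodule = 0.2, submoduleSpacing = 2) {
  nSamples <- nrow(datExpr)
  nGenes <- ncol(datExpr)
  pos <- 1
  while (pos <= nGenes) {
    size <- max(1, ceiling(rexp(1, rate = 1 / averageNGenesInSubmodule)))
    end <- min(pos + size - 1, nGenes)
    sdSub <- rexp(1, rate = 1 / averageExprInSubmodule)
    subVector <- rnorm(nSamples, sd = sdSub)   # shared by all genes in this submodule
    # the vector is recycled down each column, so every gene in pos:end gets it
    datExpr[, pos:end] <- datExpr[, pos:end] + subVector
    pos <- end + 1 + ceiling(submoduleSpacing * size)   # skip the gap before the next one
  }
  datExpr
}

set.seed(4)
module <- matrix(rnorm(30 * 100), nrow = 30)   # 30 samples, 100 genes of one module
layered <- addSubmoduleLayer(module)
```

Calling the function repeatedly on its own output would correspond to adding several layers, as controlled by nSubmoduleLayers.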

In addition to the ordered submodule structure, a scattered submodule structure can be simulated as well. This structure can be viewed as noise that tends to correlate random groups of genes. The size and effect parameters are the same as for the ordered submodules, and the number of layers added is controlled by nScatteredModuleLayers.
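A scattered layer differs from an ordered one only in how genes are grouped. In this minimal sketch (an assumption, not the package's code), each submodule's genes are drawn at random without replacement until every gene in the layer has been used; size and strength are drawn as in the ordered case. addScatteredLayer is a hypothetical name.

```r
# Hypothetical sketch of one scattered submodule layer: random groups of genes
# share a common noise vector, which correlates them with each other.
addScatteredLayer <- function(datExpr, averageNGenesInSubmodule = 10,
                              averageExprInSubmodule = 0.2) {
  nSamples <- nrow(datExpr)
  available <- seq_len(ncol(datExpr))
  while (length(available) > 0) {
    size <- min(length(available),
                max(1, ceiling(rexp(1, rate = 1 / averageNGenesInSubmodule))))
    genes <- available[sample.int(length(available), size)]   # random, non-contiguous genes
    available <- setdiff(available, genes)
    subVector <- rnorm(nSamples, sd = rexp(1, rate = 1 / averageExprInSubmodule))
    datExpr[, genes] <- datExpr[, genes] + subVector
  }
  datExpr
}

set.seed(5)
scattered <- addScatteredLayer(matrix(0, nrow = 10, ncol = 50))
```

Starting from a zero matrix makes the structure visible: genes placed in the same random group end up with identical columns.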

References

A short description of the simulation method can also be found in the Supplementary Material to the article

Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between co-expression modules. BMC Systems Biology 1:54.

The material is posted at http://horvath.genetics.ucla.edu/html/CoexpressionNetwork/EigengeneNetwork/SupplementSimulations.pdf.