individualTOMs: Calculate individual correlation network matrices

Description

This function calculates correlation network matrices (adjacencies or topological overlaps), after optionally first pre-clustering input data into blocks.

Usage

individualTOMs(
   multiExpr,
   multiWeights = NULL,
   multiExpr.imputed = NULL,  
   # Data checking options
   checkMissingData = TRUE,
   # Blocking options
   blocks = NULL,
   maxBlockSize = 5000,
   blockSizePenaltyPower = 5,
   nPreclusteringCenters = NULL,
   randomSeed = 54321,
   # Network construction options
   networkOptions,
   # Save individual TOMs? 
   saveTOMs = TRUE,
   individualTOMFileNames = "individualTOM-Set%s-Block%b.RData",
   # Behaviour options
   collectGarbage = TRUE,
   verbose = 2, indent = 0)
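
A minimal call might look like the sketch below. The toy data, the set names SetA and SetB, and the helper makeSet are illustrative assumptions; the network options are built with newNetworkOptions, and the power and network type shown are arbitrary choices rather than recommendations. The call is skipped when WGCNA is not installed.

```r
# Hypothetical toy data: two sets sharing the same 50 genes,
# with 20 and 25 samples respectively.
set.seed(1)
nGenes <- 50
geneNames <- paste0("Gene", seq_len(nGenes))

# makeSet is an illustrative helper, not part of WGCNA.
makeSet <- function(nSamples) {
  x <- matrix(rnorm(nSamples * nGenes), nrow = nSamples, ncol = nGenes)
  colnames(x) <- geneNames
  list(data = as.data.frame(x))  # samples in rows, genes in columns
}
multiExpr <- list(SetA = makeSet(20), SetB = makeSet(25))

# Run the network calculation only when WGCNA is available.
if (requireNamespace("WGCNA", quietly = TRUE)) {
  library(WGCNA)
  # Arbitrary example settings; choose these to suit your own data.
  netOpts <- newNetworkOptions(power = 6, networkType = "signed hybrid")
  res <- individualTOMs(
    multiExpr,
    networkOptions = netOpts,
    saveTOMs = FALSE,   # return the matrices instead of writing files
    verbose = 0)
  print(names(res))
  print(res$nSets)
}
```

With saveTOMs = FALSE nothing is written to disk, which keeps the sketch free of side effects; for large data sets saving to disk is usually the more practical choice.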

Value

A list with the following components:

blockwiseAdjacencies: A multiData structure containing (possibly blockwise) network matrices for each input data set. The network matrices are stored as BlockwiseData objects.
setNames: A copy of names(multiExpr).
nSets: Number of sets in multiExpr.
blockInfo: A list of class BlockInformation, giving information about blocks and gene and sample filtering.
networkOptions: The input networkOptions, returned as a multiData structure with one entry per input data set.

Arguments

multiExpr: expression data in the multi-set format (see checkSets). A vector of lists, one per set. Each set must contain a component data that contains the expression data, with rows corresponding to samples and columns to genes or probes.
multiWeights: optional observation weights in the same format (and dimensions) as multiExpr. These weights are used for correlation calculations with data in multiExpr.
multiExpr.imputed: Optional version of multiExpr with missing data imputed. If not given and multiExpr contains missing data, they will be imputed using the function impute.knn.
checkMissingData: logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See details.
blocks: optional specification of blocks in which hierarchical clustering and module detection should be performed. If given, must be a numeric vector with one entry per gene of multiExpr giving the number of the block to which the corresponding gene belongs.
maxBlockSize: integer giving the maximum block size for module detection. Ignored if blocks above is non-NULL. Otherwise, if the number of genes in multiExpr exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize.
blockSizePenaltyPower: number specifying how strongly blocks should be penalized for exceeding the maximum size. Set to a large number or Inf if not exceeding the maximum block size is very important.
nPreclusteringCenters: number of centers to be used in the preclustering. Defaults to the smaller of nGenes/20 and 100*nGenes/maxBlockSize, where nGenes is the number of genes (variables) in multiExpr.
randomSeed: integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit. If NULL is given, the function will not save and restore the seed.
networkOptions: A single list of class NetworkOptions giving options for network calculation for all of the networks, or a multiData structure containing one such list for each input data set.
saveTOMs: logical: should individual TOMs be saved to disk (TRUE) or returned directly in the return value (FALSE)?
individualTOMFileNames: character string giving the file names to save individual TOMs into. The following tags should be used to make the file names unique for each set and block: %s will be replaced by the set number; %N will be replaced by the set name (taken from names(multiExpr)) if it exists, otherwise by set number; %b will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
collectGarbage: Logical: should garbage collection be called after each block calculation? This can be useful when the data are large, but could unnecessarily slow down calculation with small data.
verbose: Integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose.
indent: Indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces.
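
To make the multi-set format and the file name tags concrete, the following sketch builds a small multiExpr by hand and expands the default individualTOMFileNames template. The set names Liver and Brain, the number of blocks, and the helper expandTOMFileName are hypothetical and only mimic the tag rules described above; the helper is not a WGCNA function.

```r
# Multi-set format: one list per set, each with a component named "data"
# holding samples in rows and genes in columns. Toy values only.
set.seed(2)
genes <- paste0("g", 1:4)
multiExpr <- list(
  Liver = list(data = matrix(rnorm(12), nrow = 3, dimnames = list(NULL, genes))),
  Brain = list(data = matrix(rnorm(20), nrow = 5, dimnames = list(NULL, genes))))

# Hypothetical helper that applies the documented tag rules:
# %s -> set number, %b -> block number,
# %N -> set name if available, otherwise set number.
expandTOMFileName <- function(template, setNumber, setName, blockNumber) {
  hasName <- !is.null(setName) && !is.na(setName) && nzchar(setName)
  nameOrNumber <- if (hasName) setName else setNumber
  out <- gsub("%s", as.character(setNumber), template, fixed = TRUE)
  out <- gsub("%b", as.character(blockNumber), out, fixed = TRUE)
  gsub("%N", as.character(nameOrNumber), out, fixed = TRUE)
}

template <- "individualTOM-Set%s-Block%b.RData"
nBlocks <- 2  # assumed number of blocks, for illustration
fileNames <- unlist(lapply(seq_along(multiExpr), function(s) {
  vapply(seq_len(nBlocks), function(b) {
    expandTOMFileName(template, s, names(multiExpr)[s], b)
  }, character(1))
}))
print(fileNames)

# File names must be unique across sets and blocks.
stopifnot(!anyDuplicated(fileNames))
```

A template that omits both %s and %N, or omits %b when there are several blocks, would map several matrices to the same file, which is the situation the uniqueness check guards against.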

Author

Peter Langfelder

Details

The function starts by optionally filtering out samples that have too many missing entries and genes that have either too many missing entries or zero variance in at least one set. Genes that are filtered out are excluded from the network calculations.
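
The gene filtering rule can be illustrated in base R as follows. This is a sketch of the idea only, not the function's actual code: goodGenesAcrossSets is a hypothetical helper, and the 50% missing-data threshold is an assumption made for illustration, since the exact criteria are not stated here.

```r
# Hypothetical helper: a gene is kept only if, in every set, it has an
# acceptable fraction of missing values and non-zero variance.
goodGenesAcrossSets <- function(multiExpr, maxMissingFraction = 0.5) {
  perSet <- lapply(multiExpr, function(set) {
    x <- as.matrix(set$data)
    missingFraction <- colMeans(is.na(x))
    variance <- apply(x, 2, var, na.rm = TRUE)
    missingFraction <= maxMissingFraction & !is.na(variance) & variance > 0
  })
  # Failing the criteria in at least one set removes the gene.
  Reduce(`&`, perSet)
}

# Toy data: g2 is constant and g3 is mostly missing in the first set.
set1 <- cbind(g1 = c(1, 2, 3, 4), g2 = c(5, 5, 5, 5),
              g3 = c(NA, NA, NA, 1), g4 = c(2, 4, 1, 3))
set2 <- cbind(g1 = c(4, 3, 2, 1, 0), g2 = c(1, 2, 3, 4, 5),
              g3 = c(1, 3, 2, 5, 4), g4 = c(9, 8, 7, 7, 6))
toyExpr <- list(list(data = set1), list(data = set2))

keep <- goodGenesAcrossSets(toyExpr)
print(keep)
```

Here g2 and g3 are dropped even though both look fine in the second set, because a gene must pass in every set to be retained.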

If blocks is not given and the number of genes (columns) in multiExpr exceeds maxBlockSize, genes are pre-clustered into blocks using the function consensusProjectiveKMeans; otherwise all genes are treated in a single block. Any missing data in multiExpr will be imputed; if imputed data are already available, they can be supplied separately.
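
The blocking decision and the documented default for nPreclusteringCenters amount to simple arithmetic, sketched below. Both helpers are hypothetical, and truncating the default to an integer is an assumption; the documentation only states which of the two quantities is used.

```r
# Hypothetical helper: pre-clustering happens only when no blocks are
# supplied and the number of genes exceeds maxBlockSize.
needsPreclustering <- function(nGenes, maxBlockSize, blocks = NULL) {
  is.null(blocks) && nGenes > maxBlockSize
}

# Hypothetical helper: smaller of nGenes/20 and 100*nGenes/maxBlockSize.
# Truncation to an integer is assumed here.
defaultPreclusteringCenters <- function(nGenes, maxBlockSize) {
  as.integer(min(nGenes / 20, 100 * nGenes / maxBlockSize))
}

nGenes <- 20000
maxBlockSize <- 5000

usePreclustering <- needsPreclustering(nGenes, maxBlockSize)
nCenters <- defaultPreclusteringCenters(nGenes, maxBlockSize)

print(usePreclustering)  # TRUE, because 20000 > 5000
print(nCenters)          # 400, the smaller of 1000 and 400

# With user-supplied blocks, maxBlockSize is ignored.
userBlocks <- rep(1:4, each = 5000)
print(needsPreclustering(nGenes, maxBlockSize, blocks = userBlocks))  # FALSE
```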