individualTOMs: Calculate individual correlation network matrices

Description

This function calculates correlation network matrices (adjacencies or topological overlaps), after optionally first pre-clustering input data into blocks.

Usage

individualTOMs(
   multiExpr,
   multiWeights = NULL,
   multiExpr.imputed = NULL,  
   # Data checking options
   checkMissingData = TRUE,
   # Blocking options
   blocks = NULL,
   maxBlockSize = 5000,
   blockSizePenaltyPower = 5,
   nPreclusteringCenters = NULL,
   randomSeed = 12345,
   # Network construction options
   networkOptions,
   # Save individual TOMs? 
   saveTOMs = TRUE,
   individualTOMFileNames = "individualTOM-Set%s-Block%b.RData",
   # Behaviour options
   collectGarbage = TRUE,
   verbose = 2, indent = 0)

Arguments

multiExpr

expression data in the multi-set format (see checkSets). A vector of lists, one per set. Each set must contain a component data that contains the expression data, with rows corresponding to samples and columns to genes or probes.
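As an illustration of the multi-set format described above, the following minimal sketch builds a two-set input in base R. The data are simulated, and the set names, gene names and dimensions are assumptions chosen for the example only.

```r
set.seed(1)
nGenes <- 50

# Two hypothetical sets: samples in rows, genes in columns.
# The sets may have different numbers of samples but share the same genes.
expr1 <- matrix(rnorm(20 * nGenes), nrow = 20, ncol = nGenes)
expr2 <- matrix(rnorm(30 * nGenes), nrow = 30, ncol = nGenes)
colnames(expr1) <- colnames(expr2) <- paste0("Gene", seq_len(nGenes))

# One list per set, each with a 'data' component holding the expression data
multiExpr <- list(Set1 = list(data = expr1), Set2 = list(data = expr2))

# Dimensions (samples, genes) of each set
sapply(multiExpr, function(set) dim(set$data))
```

A structure built this way can be passed to checkSets to confirm that the sets are consistent before calling individualTOMs.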

multiWeights

optional observation weights in the same format (and dimensions) as multiExpr. These weights are used for correlation calculations with data in multiExpr.

multiExpr.imputed

Optional version of multiExpr with missing data imputed. If not given and multiExpr contains missing data, they will be imputed using the function impute.knn.

checkMissingData

logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See details.

blocks

optional specification of blocks in which hierarchical clustering and module detection should be performed. If given, must be a numeric vector with one entry per gene of multiExpr giving the number of the block to which the corresponding gene belongs.
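The expected shape of blocks can be sketched in base R as follows. The gene count is hypothetical, and splitting genes by column order is an assumption made only to show the format; it is not a recommended way to choose blocks.

```r
nGenes <- 12000        # hypothetical number of genes (columns) in multiExpr
maxBlockSize <- 5000

# One block label per gene: consecutive runs of at most maxBlockSize genes
nBlocks <- ceiling(nGenes / maxBlockSize)
blocks <- rep(seq_len(nBlocks), each = maxBlockSize, length.out = nGenes)

# Number of genes assigned to each block
table(blocks)
```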

maxBlockSize

integer giving maximum block size for module detection. Ignored if blocks above is non-NULL. Otherwise, if the number of genes in multiExpr exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize.

blockSizePenaltyPower

number specifying how strongly blocks should be penalized for exceeding the maximum size. Set to a large number or Inf if not exceeding the maximum block size is very important.

nPreclusteringCenters

number of centers to be used in the preclustering. Defaults to the smaller of nGenes/20 and 100*nGenes/maxBlockSize, where nGenes is the number of genes (variables) in multiExpr.
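The default can be worked through for a hypothetical data set as follows. The gene count is an assumption, and the conversion to an integer is an assumption of this sketch rather than documented behaviour.

```r
nGenes <- 20000        # hypothetical number of genes in multiExpr
maxBlockSize <- 5000

# Smaller of nGenes/20 and 100*nGenes/maxBlockSize
candidates <- c(nGenes / 20, 100 * nGenes / maxBlockSize)
nPreclusteringCenters <- as.integer(min(candidates))  # rounding is assumed here
nPreclusteringCenters
```

Here the two candidates are 1000 and 400, so the second one determines the default.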

randomSeed

integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit. If NULL is given, the function will not save and restore the seed.
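The save-and-restore behaviour described above can be sketched in base R as follows. The helper withSeed is hypothetical and only illustrates the pattern; it is not the function's actual implementation.

```r
# Hypothetical helper: evaluate 'expr' under a fixed seed and, if a seed
# already existed, restore it on exit. A NULL seed leaves the RNG untouched.
withSeed <- function(seed, expr) {
  if (!is.null(seed)) {
    if (exists(".Random.seed", envir = .GlobalEnv)) {
      saved <- get(".Random.seed", envir = .GlobalEnv)
      on.exit(assign(".Random.seed", saved, envir = .GlobalEnv), add = TRUE)
    }
    set.seed(seed)
  }
  force(expr)  # evaluated lazily, i.e. after the seed has been set
}

set.seed(42)                        # the caller's own RNG state
x <- withSeed(12345, runif(3))
y <- withSeed(12345, runif(3))
identical(x, y)                     # same seed, same draws
```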

networkOptions

A single list of class NetworkOptions giving options for network calculation for all of the networks, or a multiData structure containing one such list for each input data set.

saveTOMs

logical: should individual TOMs be saved to disk (TRUE) or returned directly in the return value (FALSE)?

individualTOMFileNames

character string giving the file names to save individual TOMs into. The following tags should be used to make the file names unique for each set and block: %s will be replaced by the set number; %N will be replaced by the set name (taken from names(multiExpr)) if it exists, otherwise by set number; %b will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
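The tag substitution can be illustrated with a small base R sketch. The helper expandTOMFileName is hypothetical and written only for this example; the function's internal handling of the tags may differ.

```r
# Hypothetical helper that expands the %s, %N and %b tags in a template
expandTOMFileName <- function(template, setNumber, blockNumber, setNames = NULL) {
  # %N falls back to the set number when no set names are available
  setName <- if (is.null(setNames)) as.character(setNumber) else setNames[setNumber]
  out <- gsub("%s", as.character(setNumber), template, fixed = TRUE)
  out <- gsub("%N", setName, out, fixed = TRUE)
  gsub("%b", as.character(blockNumber), out, fixed = TRUE)
}

# File names for 2 sets and 3 blocks using the default template
grid <- expand.grid(set = 1:2, block = 1:3)
fileNames <- mapply(expandTOMFileName,
                    setNumber = grid$set, blockNumber = grid$block,
                    MoreArgs = list(template = "individualTOM-Set%s-Block%b.RData"))

# A template without both tags would produce duplicates here
anyDuplicated(fileNames) == 0
```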

collectGarbage

Logical: should garbage collection be called after each block calculation? This can be useful when the data are large, but could unnecessarily slow down calculation with small data.

verbose

Integer level of verbosity. Zero means silent, higher values make the output progressively more and more verbose.

indent

Indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces.

Value

A list with the following components:

blockwiseAdjacencies

A multiData structure containing (possibly blockwise) network matrices for each input data set. The network matrices are stored as BlockwiseData objects.

setNames

A copy of names(multiExpr).

nSets

Number of sets in multiExpr.

blockInfo

A list of class BlockInformation, giving information about blocks and gene and sample filtering.

networkOptions

The input networkOptions, returned as a multiData structure with one entry per input data set.

Details

The function starts by optionally filtering out samples that have too many missing entries and genes that have either too many missing entries or zero variance in at least one set. Genes that are filtered out are excluded from the network calculations.

If blocks is not given and the number of genes (columns) in multiExpr exceeds maxBlockSize, genes are pre-clustered into blocks using the function consensusProjectiveKMeans; otherwise all genes are treated in a single block. Any missing data in multiExpr will be imputed; if imputed data are already available, they can be supplied separately.

For each block of genes, the network adjacency is constructed and (if requested) topological overlap is calculated in each set. The topological overlaps can be saved to disk as RData files, or returned directly within the return value (see the Value section). Note that the matrices can be big and returning them within the return value can quickly exhaust the system's memory. In particular, if the block-wise calculation is necessary, it is usually impossible to return all matrices in the return value.
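To get a rough sense of the memory involved, the sketch below estimates the size of dense double-precision matrices. The helper tomMemoryGiB is hypothetical, and the assumption of a full nGenes by nGenes matrix at 8 bytes per entry may overstate the footprint if a more compact representation (for example, storing only one triangle) is used.

```r
# Hypothetical helper: approximate memory (in GiB) needed to hold one dense
# nGenes x nGenes matrix of doubles for each of nSets sets
tomMemoryGiB <- function(nGenes, nSets = 1, bytesPerEntry = 8) {
  nSets * nGenes^2 * bytesPerEntry / 1024^3
}

tomMemoryGiB(5000)               # a single block at the default maxBlockSize
tomMemoryGiB(20000, nSets = 3)   # all genes at once, three sets
```

Under these assumptions a single 5000-gene block needs under 0.2 GiB per set, whereas holding full matrices for 20000 genes in three sets would need close to 9 GiB.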
