hierarchicalConsensusTOM: Calculation of hierarchical consensus topological overlap matrix

Description

This function calculates consensus topological overlap in a hierarchical manner.

Usage

hierarchicalConsensusTOM(
      # ... information needed to calculate individual TOMs
      multiExpr,
      multiWeights = NULL,
      # Data checking options
      checkMissingData = TRUE,
      # Blocking options
      blocks = NULL,
      maxBlockSize = 20000,
      blockSizePenaltyPower = 5,
      nPreclusteringCenters = NULL,
      randomSeed = 12345,
      # Network construction options
      networkOptions,
      # Save individual TOMs?
      keepIndividualTOMs = TRUE,
      individualTOMFileNames = "individualTOM-Set%s-Block%b.RData",
      # ... or information about individual (more precisely, input) TOMs
      individualTOMInfo = NULL,
      # Consensus calculation options 
      consensusTree,
      useBlocks = NULL,
      # Save calibrated TOMs?
      saveCalibratedIndividualTOMs = FALSE,
      calibratedIndividualTOMFilePattern = "calibratedIndividualTOM-Set%s-Block%b.RData",
      # Return options
      saveConsensusTOM = TRUE,
      consensusTOMFilePattern = "consensusTOM-%a-Block%b.RData",
      getCalibrationSamples = FALSE,
      # Return the intermediate results as well?  
      keepIntermediateResults = saveConsensusTOM,
      # Internal handling of TOMs
      useDiskCache = NULL, 
      chunkSize = NULL,
      cacheDir = ".",
      cacheBase = ".blockConsModsCache",
      # Behavior
      collectGarbage = TRUE,
      verbose = 1,
      indent = 0)
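A minimal usage sketch, assuming the WGCNA package is installed and that its helpers multiData, newNetworkOptions, newConsensusOptions and newConsensusTree behave as in recent WGCNA releases. The simulated data, the power of 6, the "signed" network and TOM types, and the analysis name "Consensus" are arbitrary choices for illustration. Output files are written to a temporary directory.

```r
library(WGCNA)

# Two small simulated data sets with identical genes (samples in rows, genes in columns).
set.seed(1)
nGenes <- 100
geneNames <- paste0("Gene", seq_len(nGenes))
simulateSet <- function(nSamples) {
  matrix(rnorm(nSamples * nGenes), nSamples, nGenes,
         dimnames = list(NULL, geneNames))
}
multiExpr <- multiData(Set1 = simulateSet(30), Set2 = simulateSet(40))

# One set of network options shared by both data sets (illustrative values).
netOpts <- newNetworkOptions(power = 6, networkType = "signed", TOMType = "signed")

# A flat consensus tree over the two sets, referenced by name.
consTree <- newConsensusTree(consensusOptions = newConsensusOptions(),
                             inputs = c("Set1", "Set2"),
                             analysisName = "Consensus")

# Keep all files produced by the calculation in a temporary directory.
outDir <- tempdir()
res <- hierarchicalConsensusTOM(
  multiExpr,
  networkOptions = netOpts,
  consensusTree = consTree,
  individualTOMFileNames = file.path(outDir, "individualTOM-Set%s-Block%b.RData"),
  consensusTOMFilePattern = file.path(outDir, "consensusTOM-%a-Block%b.RData"),
  cacheDir = outDir,
  verbose = 0)

names(res)  # includes individualTOMInfo and consensusTree (see Value)
```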

Value

A list that contains the output of hierarchicalConsensusCalculation and two extra components:

individualTOMInfo: A copy of the input individualTOMInfo if it was non-NULL, or the result of individualTOMs.
consensusTree: A copy of the input consensusTree.

Arguments

multiExpr: Expression data in the multi-set format (see checkSets). A vector of lists, one per set. Each set must contain a component data that contains the expression data, with rows corresponding to samples and columns to genes or probes.
multiWeights: Optional observation weights in the same format (and with the same dimensions) as multiExpr. These weights are used in the correlation calculations on the data in multiExpr.
checkMissingData: Logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See details.
blocks: Optional specification of blocks in which hierarchical clustering and module detection should be performed. If given, must be a numeric vector with one entry per gene of multiExpr giving the number of the block to which the corresponding gene belongs.
maxBlockSize: Integer giving the maximum block size for module detection. Ignored if blocks above is non-NULL. Otherwise, if the number of genes in multiExpr exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize.
blockSizePenaltyPower: Number specifying how strongly blocks should be penalized for exceeding the maximum size. Set to a large number or Inf if not exceeding the maximum block size is very important.
nPreclusteringCenters: Number of centers to be used in the preclustering. Defaults to the smaller of nGenes/20 and 100*nGenes/maxBlockSize, where nGenes is the number of genes (variables) in multiExpr.
randomSeed: Integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit. If NULL is given, the function will not save and restore the seed.
networkOptions: A single list of class NetworkOptions giving options for network calculation for all of the networks, or a multiData structure containing one such list for each input data set.
keepIndividualTOMs: Logical: should individual TOMs be retained after the calculation is finished?
individualTOMFileNames: Character string giving the file names to save individual TOMs into. The following tags should be used to make the file names unique for each set and block: %s will be replaced by the set number; %N will be replaced by the set name (taken from names(multiExpr)) if it exists, otherwise by set number; %b will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
individualTOMInfo: A list, typically returned by individualTOMs, containing information about the topological overlap matrices in the individual data sets in multiExpr. See the output of individualTOMs for details on the content of the list.
consensusTree: A list specifying the consensus calculation. See details.
useBlocks: Optional vector giving the blocks that should be used for the calculations. If NULL, all blocks will be used.
saveCalibratedIndividualTOMs: Logical: should the calibrated individual TOMs be saved?
calibratedIndividualTOMFilePattern: Specification of file names in which calibrated individual TOMs should be saved.
saveConsensusTOM: Logical: should the consensus TOM be saved to disk?
consensusTOMFilePattern: Character string giving the file names to save consensus TOMs into. The following tags should be used to make the file names unique for each set and block: %s will be replaced by the set number; %N will be replaced by the set name (taken from names(multiExpr)) if it exists, otherwise by set number; %b will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
getCalibrationSamples: Logical: should the sampled values used for network calibration be returned?
keepIntermediateResults: Logical: should intermediate consensus TOMs be saved as well?
useDiskCache: Logical: should disk cache be used for consensus calculations? The disk cache can be used to store chunks of calibrated data that are small enough to fit one chunk from each set into memory (blocks may be small enough to fit one block of one set into memory, but not small enough to fit one block from all sets in a consensus calculation into memory at the same time). Using disk cache is slower but lessens the memory footprint of the calculation. As a general guide, if individual data are split into blocks, we recommend setting this argument to TRUE. If this argument is NULL, the function will decide whether to use disk cache based on the number of sets and block sizes.
chunkSize: Network similarities are saved in smaller chunks of size chunkSize. If NULL, an appropriate chunk size will be determined from an estimate of available memory. Note that if the chunk size is greater than the memory required for storing intermediate results, disk cache use will automatically be disabled.
cacheDir: Character string containing the directory into which cache files should be written. The user should make sure that the file system has enough free space to hold the cache files, which can get quite large.
cacheBase: Character string containing the desired name for the cache files. The actual file names will consist of cacheBase and a suffix that makes the file names unique.
collectGarbage: Logical: should garbage be collected after memory-intensive operations?
verbose: Integer level of verbosity. Zero means silent; higher values make the output progressively more verbose.
indent: indentation for diagnostic messages. Zero means no indentation, each unit adds two spaces.
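The blocking rule under maxBlockSize and the documented default for nPreclusteringCenters can be written out as two small base-R helpers. Both helpers are hypothetical and not part of WGCNA; truncating the default to an integer is our assumption, since the text above does not say how fractional values are handled.

```r
# Hypothetical helper: the documented default for nPreclusteringCenters,
# i.e. the smaller of nGenes/20 and 100*nGenes/maxBlockSize.
# Truncation to an integer is an assumption.
defaultPreclusteringCenters <- function(nGenes, maxBlockSize = 20000) {
  as.integer(min(nGenes / 20, 100 * nGenes / maxBlockSize))
}

# Hypothetical helper: pre-clustering into blocks happens only when the
# number of genes exceeds maxBlockSize (and blocks is NULL).
needsBlocking <- function(nGenes, maxBlockSize = 20000) {
  nGenes > maxBlockSize
}

defaultPreclusteringCenters(30000)        # min(1500, 150)  -> 150
defaultPreclusteringCenters(30000, 1000)  # min(1500, 3000) -> 1500
needsBlocking(30000)                      # TRUE
```

With the default maxBlockSize of 20000, the second term (100*nGenes/maxBlockSize) is the smaller one for any nGenes, so the default works out to roughly 100 centers per maximum-size block.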
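The seed handling described under randomSeed follows a common base-R idiom: save .Random.seed if it exists, set the requested seed, and restore the saved state on exit. The sketch below wraps that idiom around an arbitrary function; withSeedRestored is a hypothetical name, not a WGCNA function.

```r
# Hypothetical helper mirroring the documented randomSeed behaviour.
# If seed is NULL, the global random number generator state is left untouched.
withSeedRestored <- function(seed, fun) {
  if (!is.null(seed)) {
    if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      savedSeed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
      # Restore the caller's generator state when this function exits.
      on.exit(assign(".Random.seed", savedSeed, envir = globalenv()), add = TRUE)
    }
    set.seed(seed)
  }
  fun()
}

set.seed(1)
stateBefore <- .Random.seed
draws <- withSeedRestored(12345, function() runif(3))
identical(.Random.seed, stateBefore)  # TRUE: the caller's RNG state is unchanged
```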
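The tag substitution described for individualTOMFileNames (%s, %N, %b) and the uniqueness check can be sketched as follows. expandTOMFileNames is a hypothetical helper written for illustration; it is not how WGCNA implements the substitution, and it does not handle the %a tag that appears in the default consensusTOMFilePattern.

```r
# Hypothetical helper: expand a file name pattern for every set and block.
# %s -> set number; %N -> set name (or set number if the set is unnamed);
# %b -> block number. Non-unique results raise an error.
expandTOMFileNames <- function(pattern, nSets, nBlocks, setLabels = NULL) {
  if (is.null(setLabels)) setLabels <- rep("", nSets)
  grid <- expand.grid(set = seq_len(nSets), block = seq_len(nBlocks))
  fileNames <- character(nrow(grid))
  for (i in seq_len(nrow(grid))) {
    s <- grid$set[i]
    label <- if (is.na(setLabels[s]) || setLabels[s] == "") as.character(s) else setLabels[s]
    name <- gsub("%s", as.character(s), pattern, fixed = TRUE)
    name <- gsub("%N", label, name, fixed = TRUE)
    fileNames[i] <- gsub("%b", as.character(grid$block[i]), name, fixed = TRUE)
  }
  if (anyDuplicated(fileNames)) {
    stop("File names are not unique; include %s or %N together with %b.")
  }
  fileNames
}

expandTOMFileNames("individualTOM-Set%s-Block%b.RData", nSets = 2, nBlocks = 2)
expandTOMFileNames("TOM-%N-Block%b.RData", nSets = 2, nBlocks = 1,
                   setLabels = c("Female", ""))  # second set is unnamed
```

A pattern that omits both %s and %N, such as "TOM-Block%b.RData", produces the same name for every set and therefore triggers the error.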
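The idea behind chunkSize and useDiskCache is that an element-wise consensus can be computed one chunk at a time, so only one chunk per set has to be held in memory. The sketch below shows this on plain numeric vectors in memory; chunkedConsensus is hypothetical, it uses the element-wise minimum as a stand-in for the actual consensus, and it performs no calibration or disk I/O.

```r
# Hypothetical sketch of chunked element-wise consensus (not WGCNA's implementation).
# simList: list of equal-length numeric vectors, one per set (e.g. vectorized similarities).
chunkedConsensus <- function(simList, chunkSize) {
  n <- length(simList[[1]])
  result <- numeric(n)
  if (n == 0) return(result)
  for (start in seq(1, n, by = chunkSize)) {
    idx <- start:min(start + chunkSize - 1, n)
    # With a disk cache, each set's chunk would be read from a file at this point.
    chunks <- lapply(simList, function(x) x[idx])
    result[idx] <- Reduce(pmin, chunks)  # element-wise minimum across sets
  }
  result
}

set.seed(2)
sims <- list(runif(1000), runif(1000), runif(1000))
identical(chunkedConsensus(sims, 128), do.call(pmin, sims))  # TRUE
```

Because the consensus is computed element by element, the chunk size affects only memory use and not the result.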

Author

Peter Langfelder

Details

This function is essentially a wrapper for hierarchicalConsensusCalculation, with a few additional operations specific to calculations of topological overlaps.
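To illustrate what "hierarchical" means for the consensus tree, the sketch below evaluates a nested tree recursively: leaves name individual TOMs and each internal node combines the results of its children. This is a deliberate simplification and not WGCNA code. consensusOfTree and its node fields (inputs, combine) are hypothetical, calibration is ignored, and the default combination is the element-wise minimum (which, as we understand it, corresponds to a consensus quantile of 0). The averaging node in the usage example only serves to show that the grouping can change the result.

```r
# Hypothetical sketch of a hierarchical consensus.
# A node is either a character string naming one matrix in tomList, or a list
# with component `inputs` (child nodes) and optional `combine`, a function that
# turns a list of matrices into one matrix. Default: element-wise minimum.
consensusOfTree <- function(node, tomList) {
  if (is.character(node)) {
    return(tomList[[node]])  # leaf: the TOM of a single data set
  }
  # Compute the consensus of every child node first.
  childResults <- lapply(node$inputs, consensusOfTree, tomList = tomList)
  combine <- node$combine
  if (is.null(combine)) {
    combine <- function(x) Reduce(pmin, x)
  }
  combine(childResults)
}

# Usage: three toy 2 x 2 "TOMs"; A and B are averaged first, then combined with C.
tomList <- list(
  A = matrix(c(1, 0.2, 0.2, 1), 2),
  B = matrix(c(1, 0.5, 0.5, 1), 2),
  C = matrix(c(1, 0.3, 0.3, 1), 2)
)
meanOf <- function(x) Reduce(`+`, x) / length(x)
nestedTree <- list(inputs = list(list(inputs = list("A", "B"), combine = meanOf), "C"))
flatTree <- list(inputs = list("A", "B", "C"))
consensusOfTree(nestedTree, tomList)  # off-diagonal 0.3 = min(mean(0.2, 0.5), 0.3)
consensusOfTree(flatTree, tomList)    # off-diagonal 0.2 = min(0.2, 0.5, 0.3)
```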
