This function calculates consensus topological overlap in a hierarchical manner.
hierarchicalConsensusTOM(
   # ... information needed to calculate individual TOMs
   multiExpr,
   multiWeights = NULL,
   # Data checking options
   checkMissingData = TRUE,
   # Blocking options
   blocks = NULL,
   maxBlockSize = 20000,
   blockSizePenaltyPower = 5,
   nPreclusteringCenters = NULL,
   randomSeed = 12345,
   # Network construction options
   networkOptions,
   # Save individual TOMs?
   keepIndividualTOMs = TRUE,
   individualTOMFileNames = "individualTOM-Set%s-Block%b.RData",
   # ... or information about individual (more precisely, input) TOMs
   individualTOMInfo = NULL,
   # Consensus calculation options
   consensusTree,
   useBlocks = NULL,
   # Save calibrated TOMs?
   saveCalibratedIndividualTOMs = FALSE,
   calibratedIndividualTOMFilePattern = "calibratedIndividualTOM-Set%s-Block%b.RData",
   # Return options
   saveConsensusTOM = TRUE,
   consensusTOMFilePattern = "consensusTOM-%a-Block%b.RData",
   getCalibrationSamples = FALSE,
   # Return the intermediate results as well?
   keepIntermediateResults = saveConsensusTOM,
   # Internal handling of TOMs
   useDiskCache = NULL,
   chunkSize = NULL,
   cacheDir = ".",
   cacheBase = ".blockConsModsCache",
   # Behavior
   collectGarbage = TRUE,
   verbose = 1,
   indent = 0)
A list that contains the output of hierarchicalConsensusCalculation and two extra components:
a copy of the input individualTOMInfo if it was non-NULL, or the result of individualTOMs otherwise;
and a copy of the input consensusTree.
multiExpr: Expression data in the multi-set format (see checkSets). A vector of lists, one per set. Each set must contain a component data that contains the expression data, with rows corresponding to samples and columns to genes or probes.
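A minimal sketch of the multi-set format described above, built from simulated data with plain base R lists (no WGCNA helpers are used). The set names set1 and set2 are illustrative; all sets share the same genes as columns, while the number of samples may differ between sets.

```r
# Simulated two-set input in the multi-set format: a list with one element
# per set, each element a list whose component `data` is a
# samples x genes matrix.
set.seed(1)
nGenes <- 50
geneNames <- paste0("gene", seq_len(nGenes))

makeSet <- function(nSamples) {
  x <- matrix(rnorm(nSamples * nGenes), nrow = nSamples, ncol = nGenes)
  colnames(x) <- geneNames
  list(data = x)
}

multiExpr <- list(set1 = makeSet(20), set2 = makeSet(30))

# Columns (genes) correspond across sets; rows (samples) need not.
sapply(multiExpr, function(s) dim(s$data))
```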
multiWeights: Optional observation weights in the same format (and dimensions) as multiExpr. These weights are used for correlation calculations with data in multiExpr.
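Because the weights must mirror multiExpr set by set, a quick dimension check can catch mistakes before a long run. The helper weightsMatchExpr below is hypothetical (not part of WGCNA) and only compares dimensions; the all-ones weights are just a placeholder.

```r
# Hypothetical helper: TRUE if multiWeights has one set per set of multiExpr
# and each weights matrix has the same dimensions as the matching data.
weightsMatchExpr <- function(multiExpr, multiWeights) {
  if (length(multiExpr) != length(multiWeights)) return(FALSE)
  all(mapply(function(e, w) identical(dim(e$data), dim(w$data)),
             multiExpr, multiWeights))
}

# Two small sets with 4 genes each; placeholder weights of 1.
multiExpr <- list(list(data = matrix(rnorm(12), 3, 4)),
                  list(data = matrix(rnorm(20), 5, 4)))
multiWeights <- lapply(multiExpr, function(s)
  list(data = matrix(1, nrow(s$data), ncol(s$data))))

weightsMatchExpr(multiExpr, multiWeights)
```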
checkMissingData: Logical: should data be checked for excessive numbers of missing entries in genes and samples, and for genes with zero variance? See details.
blocks: Optional specification of blocks in which hierarchical clustering and module detection should be performed. If given, must be a numeric vector with one entry per gene of multiExpr giving the number of the block to which the corresponding gene belongs.
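To make the expected shape of blocks concrete, the sketch below builds a vector with one block number per gene. The helper naiveBlocks is hypothetical and simply splits genes into consecutive runs of at most maxBlockSize; it ignores correlation structure and is not the preclustering the function performs when blocks is NULL.

```r
# Hypothetical helper: consecutive blocks of at most maxBlockSize genes.
# Illustrates the format of `blocks` only, not the function's preclustering.
naiveBlocks <- function(nGenes, maxBlockSize) {
  ceiling(seq_len(nGenes) / maxBlockSize)
}

blocks <- naiveBlocks(nGenes = 10, maxBlockSize = 4)
blocks          # one block number per gene
table(blocks)   # number of genes in each block
```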
maxBlockSize: Integer giving the maximum block size for module detection. Ignored if blocks above is non-NULL. Otherwise, if the number of genes in multiExpr exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize.
blockSizePenaltyPower: Number specifying how strongly blocks should be penalized for exceeding the maximum size. Set to a large number or Inf if not exceeding the maximum block size is very important.
nPreclusteringCenters: Number of centers to be used in the preclustering. Defaults to the smaller of nGenes/20 and 100*nGenes/maxBlockSize, where nGenes is the number of genes (variables) in multiExpr.
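The default number of preclustering centers can be worked out directly from the formula above. This sketch returns the unrounded value; whether the function rounds the result is not stated here, so that detail is left open. The name defaultPreclusteringCenters is hypothetical.

```r
# Default number of preclustering centers, per the formula above:
# the smaller of nGenes/20 and 100*nGenes/maxBlockSize (no rounding applied).
defaultPreclusteringCenters <- function(nGenes, maxBlockSize = 20000) {
  min(nGenes / 20, 100 * nGenes / maxBlockSize)
}

defaultPreclusteringCenters(30000)                      # min(1500, 150) = 150
defaultPreclusteringCenters(1000, maxBlockSize = 1000)  # min(50, 100) = 50
```

With the default maxBlockSize of 20000, the second term is the smaller one, so the default is 100*nGenes/20000.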
randomSeed: Integer to be used as seed for the random number generator before the function starts. If a current seed exists, it is saved and restored upon exit. If NULL is given, the function will not save and restore the seed.
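The seed handling described above follows a common R pattern: save .Random.seed if it exists, set the requested seed, and restore the saved state with on.exit. The wrapper withSeed is a hypothetical sketch of that pattern, not the function's internal code.

```r
# Hypothetical wrapper: evaluate `expr` with a fixed seed, then restore the
# caller's RNG state (if one existed). With randomSeed = NULL the seed is
# left untouched.
withSeed <- function(randomSeed, expr) {
  if (!is.null(randomSeed)) {
    if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
      savedSeed <- get(".Random.seed", envir = .GlobalEnv)
      on.exit(assign(".Random.seed", savedSeed, envir = .GlobalEnv))
    }
    set.seed(randomSeed)
  }
  expr   # lazily evaluated here, after the seed has been set
}

set.seed(1)
a <- withSeed(12345, runif(1))
b <- runif(1)             # continues the stream started by set.seed(1)
set.seed(1); b2 <- runif(1)
identical(b, b2)          # TRUE: the caller's RNG state was restored
```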
networkOptions: A single list of class NetworkOptions giving options for network calculation for all of the networks, or a multiData structure containing one such list for each input data set.
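Since networkOptions may be either one shared list or one list per set, code that consumes it typically normalizes it to the per-set form first. The sketch below does that with a hypothetical helper expandPerSet. It assumes that a single list is recognizable by its class NetworkOptions and that the per-set form is a list of lists with a data component, as in multiExpr. The toy field power is a placeholder, not a complete options object.

```r
# Hypothetical helper: turn a single NetworkOptions list into one entry per
# set (each wrapped as list(data = ...)); pass per-set input through unchanged.
expandPerSet <- function(networkOptions, nSets) {
  if (inherits(networkOptions, "NetworkOptions")) {
    rep(list(list(data = networkOptions)), nSets)   # same options for all sets
  } else {
    if (length(networkOptions) != nSets)
      stop("networkOptions must contain one options list per set")
    networkOptions
  }
}

# Toy options object; real ones carry many more fields.
opts <- structure(list(power = 6), class = "NetworkOptions")
perSet <- expandPerSet(opts, nSets = 3)
length(perSet)
```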
keepIndividualTOMs: Logical: should individual TOMs be retained after the calculation is finished?
individualTOMFileNames: Character string giving the file names to save individual TOMs into. The following tags should be used to make the file names unique for each set and block: %s will be replaced by the set number; %N will be replaced by the set name (taken from names(multiExpr)) if it exists, otherwise by the set number; %b will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
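The tag substitution can be illustrated with a short sketch. The helper expandFileName is hypothetical and uses fixed-string gsub; it expands the default pattern for every set and block and then checks that the resulting names are unique, which is the condition the function enforces.

```r
# Hypothetical helper: expand %s (set number), %N (set name, or set number
# if the set is unnamed) and %b (block number) in a file name pattern.
expandFileName <- function(pattern, setNumber, setName = NULL, block) {
  if (is.null(setName) || is.na(setName) || setName == "")
    setName <- as.character(setNumber)
  out <- gsub("%s", as.character(setNumber), pattern, fixed = TRUE)
  out <- gsub("%N", setName, out, fixed = TRUE)
  gsub("%b", as.character(block), out, fixed = TRUE)
}

pattern <- "individualTOM-Set%s-Block%b.RData"
files <- unlist(lapply(1:2, function(set)
  sapply(1:3, function(b) expandFileName(pattern, set, block = b))))
files[1]                   # "individualTOM-Set1-Block1.RData"
anyDuplicated(files) == 0  # TRUE: one distinct file per set and block
```

A pattern without %s or %N would produce the same name for every set, which is exactly the non-unique case that triggers an error.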
individualTOMInfo: A list, typically returned by individualTOMs, containing information about the topological overlap matrices in the individual data sets in multiExpr. See the output of individualTOMs for details on the content of the list.
consensusTree: A list specifying the consensus calculation. See details.
useBlocks: Optional vector giving the blocks that should be used for the calculations. If NULL, all blocks will be used.
saveCalibratedIndividualTOMs: Logical: should the calibrated individual TOMs be saved?
calibratedIndividualTOMFilePattern: Specification of file names in which calibrated individual TOMs should be saved.
saveConsensusTOM: Logical: should the consensus TOM be saved to disk?
consensusTOMFilePattern: Character string giving the file names to save consensus TOMs into. The following tags should be used to make the file names unique for each set and block: %s will be replaced by the set number; %N will be replaced by the set name (taken from names(multiExpr)) if it exists, otherwise by the set number; %b will be replaced by the block number. If the file names turn out to be non-unique, an error will be generated.
getCalibrationSamples: Logical: should the sampled values used for network calibration be returned?
keepIntermediateResults: Logical: should intermediate consensus TOMs be saved as well?
useDiskCache: Logical: should disk cache be used for consensus calculations? The disk cache can be used to store chunks of calibrated data that are small enough to fit one chunk from each set into memory (blocks may be small enough to fit one block of one set into memory, but not small enough to fit one block from all sets in a consensus calculation into memory at the same time). Using disk cache is slower but lessens the memory footprint of the calculation.
As a general guide, if individual data are split into blocks, we recommend setting this argument to TRUE. If this argument is NULL, the function will decide whether to use disk cache based on the number of sets and block sizes.
chunkSize: Network similarities are saved in smaller chunks of size chunkSize. If NULL, an appropriate chunk size will be determined from an estimate of available memory. Note that if the chunk size is greater than the memory required for storing intermediate results, disk cache use will automatically be disabled.
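To illustrate what splitting similarities into chunks of size chunkSize means, the sketch below stores the distinct off-diagonal entries of a small symmetric similarity matrix as a vector and splits it into pieces of at most chunkSize entries. The helper chunkIndices and the upper-triangle layout are assumptions for illustration; the function's actual storage layout is not described here.

```r
# Hypothetical helper: index sets for consecutive chunks of at most
# chunkSize elements out of n.
chunkIndices <- function(n, chunkSize) {
  split(seq_len(n), ceiling(seq_len(n) / chunkSize))
}

set.seed(2)
nGenes <- 6
sim <- cor(matrix(rnorm(60), 10, nGenes))  # toy 6 x 6 similarity matrix
vec <- sim[upper.tri(sim)]                 # 15 distinct off-diagonal entries
chunks <- lapply(chunkIndices(length(vec), chunkSize = 4), function(i) vec[i])
lengths(chunks)                            # chunk sizes 4, 4, 4, 3
```

Only one chunk per set needs to be in memory at a time, which is why smaller chunks lower the memory footprint at the cost of more disk traffic.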
cacheDir: Character string containing the directory into which cache files should be written. The user should make sure that the file system has enough free space to hold the cache files, which can become quite large.
cacheBase: Character string containing the desired name for the cache files. The actual file names will consist of cacheBase and a suffix to make the file names unique.
collectGarbage: Logical: should garbage be collected after memory-intensive operations?
verbose: Integer level of verbosity. Zero means silent; higher values make the output progressively more verbose.
indent: Indentation for diagnostic messages. Zero means no indentation; each unit adds two spaces.
Peter Langfelder
This function is essentially a wrapper for hierarchicalConsensusCalculation, with a few additional operations specific to calculations of topological overlaps.
hierarchicalConsensusCalculation for the actual hierarchical consensus calculation;
individualTOMs for the calculation of individual TOMs in a format suitable for consensus calculation.