consensusTOM(
# Supply either ...
# ... information needed to calculate individual TOMs
multiExpr,
# Data checking options
checkMissingData = TRUE,
# Blocking options
blocks = NULL,
maxBlockSize = 5000,
randomSeed = 12345,
# Network construction arguments: correlation options
corType = "pearson",
maxPOutliers = 1,
quickCor = 0,
pearsonFallback = "individual",
cosineCorrelation = FALSE,
# Adjacency function options
power = 6,
networkType = "unsigned",
checkPower = TRUE,
# Topological overlap options
TOMType = "unsigned",
TOMDenom = "min",
# Save individual TOMs?
saveIndividualTOMs = TRUE,
individualTOMFileNames = "individualTOM-Set%s-Block%b.RData",
# ... or individual TOM information
individualTOMInfo = NULL,
useIndivTOMSubset = NULL,
##### Consensus calculation options
useBlocks = NULL,
networkCalibration = c("single quantile", "full quantile", "none"),
# Save calibrated TOMs?
saveCalibratedIndividualTOMs = FALSE,
calibratedIndividualTOMFilePattern = "calibratedIndividualTOM-Set%s-Block%b.RData",
# Simple quantile calibration options
calibrationQuantile = 0.95,
sampleForCalibration = TRUE, sampleForCalibrationFactor = 1000,
getNetworkCalibrationSamples = FALSE,
# Consensus definition
consensusQuantile = 0,
useMean = FALSE,
setWeights = NULL,
# Return options
saveConsensusTOMs = TRUE,
consensusTOMFileNames = "consensusTOM-Block%b.RData",
returnTOMs = FALSE,
# Internal handling of TOMs
useDiskCache = TRUE, chunkSize = NULL,
cacheDir = ".",
cacheBase = ".blockConsModsCache",
nThreads = 1,
# Diagnostic messages
verbose = 1,
indent = 0)
Arguments

multiExpr: ... (see checkSets). A vector of lists, one per set. Each set must contain a component data that contains the expression data, with rows corresponding to samples and columns to genes.

blocks: ... multiExpr giving the number of the block to which the corresponding gene belongs.

maxBlockSize: Ignored if blocks above is non-NULL. Otherwise, if the number of genes in multiExpr exceeds maxBlockSize, genes will be pre-clustered into blocks whose size should not exceed maxBlockSize.

randomSeed: ... If NULL is given, the function will not save and restore the seed.

corType: ... "pearson" and "bicor", corresponding to Pearson and biweight midcorrelation, respectively. Missing values are handled using the ...

maxPOutliers: Only used for corType == "bicor". Specifies the maximum percentile of data that can be considered outliers on either side of the median separately. For each side of the median, if a higher percentile than maxPOutliers is considered a ...

pearsonFallback: ... "none", "individual", "all". If set to "none", zero mad will result ...

networkType: One of "unsigned", "signed", "signed hybrid". See adjacency.

checkPower: Logical: should a basic sanity check be performed on the supplied power? If you would like to experiment with unusual powers, set the argument to FALSE and proceed with caution.

TOMType: One of "none", "unsigned", "signed". If "none", adjacency will be used for clustering. If "unsigned", the standard TOM will be used (more generally, the TOM function will receive the adjacency as input) ...

TOMDenom: Either "min", giving the standard TOM described in Zhang and Horvath (2005), or "mean", in which the min function in the denominator is replaced by the mean ...

individualTOMFileNames: ... %s will be replaced by the set number; %N will be replaced by the set name ...

individualTOMInfo: Optional individual TOM information, as returned by blockwiseIndividualTOMs. If not given, appropriate topological overlaps will be calculated using ...

useIndivTOMSubset: If individualTOMInfo is given, this argument allows selecting only a subset of the individual set networks contained in individualTOMInfo. It should be a numeric vector giving the indices of the individual sets to be used. Note that ...

calibrationQuantile: If networkCalibration is "single quantile", topological overlaps (or adjacencies if TOMs are not computed) will be scaled such that their calibrationQuantile quantiles agree.

sampleForCalibration: If TRUE, calibration quantiles will be determined from a sample of network similarities. Note that using all data can double the memory footprint of the function, and the function may fail.

sampleForCalibrationFactor: Determines the number of similarities sampled for calibration, which is 1/calibrationQuantile * sampleForCalibrationFactor. Should be set well above 1 to ensure accuracy of the sampled quantile.

setWeights: Optional weights for the weighted mean consensus. Only used when useMean above is TRUE.

consensusTOMFileNames: ... %b will be replaced by the block number. If the resulting file names are non-unique (for example, because the user gives a file name without ...

chunkSize: ... chunkSize. If NULL, an appropriate chunk size will be determined from an estimate of available memory. Note that if the chunk size is greater than the memory required for ...

cacheBase: Base name for the cache files. The actual file names consist of cacheBase and a suffix to make the file names unique.

Value

If returnTOMs is TRUE: a list containing the consensus TOM for each block, stored as a distance structure.

If saveConsensusTOMs is TRUE: a vector of file names, one for each block, in which the TOM for the corresponding block is stored. The TOM is saved as a distance structure to save space.

The input individualTOMInfo if given; otherwise the result of calling blockwiseIndividualTOMs. See blockwiseIndividualTOMs for details.

... useIndivTOMSubset.

... See goodSamplesGenesMS for details.

... goodSamplesGenes above.

... saveCalibratedIndividualTOMs.

If saveCalibratedIndividualTOMs is TRUE, this component contains the file names of the calibrated individual networks. The file names are arranged in a character matrix with each row corresponding to one input set and each column to one block.

If getNetworkCalibrationSamples is TRUE: a list with one component per block. Each component is in turn a list with two components: sampleIndex is a vector containing the indices of the TOM samples (the indices refer to a flattened distance structure), and TOMSamples is a matrix of TOM samples with each row corresponding to a sample in sampleIndex and each column to one input set.

... consensusQuantile.

originCount: if consensusQuantile equals zero, originCount contains the number of entries in the consensus TOM that come from each set (i.e., the number of times the TOM in the set was the minimum). When consensusQuantile is not zero or the "mean" consensus is used, this vector contains zeroes.

... NA in entries corresponding to filtered-out samples.

Details

If blocks is not given and the number of genes exceeds maxBlockSize, genes are pre-clustered into blocks using the function consensusProjectiveKMeans; otherwise all genes are treated in a single block.
For each block of genes, the network is constructed and (if requested) topological overlap is calculated in each set. To minimize memory usage, calculated topological overlaps are optionally saved to disk in chunks until they are needed again for the calculation of the consensus network topological overlap.
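The following sketch illustrates the per-set step. It builds an unsigned adjacency, a_ij = |cor(x_i, x_j)|^power, and the standard topological overlap for one block of genes:

TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)

Here l_ij = sum_u a_iu a_uj, summed over u other than i and j, and k_i is the connectivity of gene i. Setting TOMDenom = "mean" replaces the minimum in the denominator by the mean. The sketch makes these assumptions:

- The data are complete, with samples in rows and genes in columns.
- Only Pearson correlation is used.
- The helper tomFromExpr is hypothetical, not a WGCNA function.
- It omits bicor, signed networks, missing-data handling and disk caching.

```r
# Hypothetical helper (not part of WGCNA): unsigned adjacency and TOM for one
# block of genes in one set. Assumes datExpr has samples in rows, genes in
# columns, and no missing values.
tomFromExpr <- function(datExpr, power = 6, TOMDenom = c("min", "mean")) {
  TOMDenom <- match.arg(TOMDenom)
  adj <- abs(cor(datExpr))^power        # unsigned adjacency a_ij = |cor|^power
  diag(adj) <- 0                        # no self-connections
  k <- rowSums(adj)                     # connectivity k_i
  shared <- adj %*% adj                 # l_ij = sum_u a_iu * a_uj (zero diagonal)
  denomK <- if (TOMDenom == "min") outer(k, k, pmin) else outer(k, k, "+") / 2
  tom <- (shared + adj) / (denomK + 1 - adj)
  diag(tom) <- 1
  tom
}

set.seed(1)
datExpr <- matrix(rnorm(20 * 8), nrow = 20)   # 20 simulated samples, 8 genes
tom <- tomFromExpr(datExpr, power = 6)
```

Because the "mean" denominator is never smaller than the "min" one, the "mean" variant gives TOM values that are less than or equal to the standard ones.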
Before the consensus topological overlap is calculated, individual TOMs are optionally calibrated. Calibration methods include single quantile scaling and full quantile normalization.
Single quantile scaling raises the individual TOMs in sets 2, 3, ... to a power such that the quantiles given by calibrationQuantile agree with the quantile in set 1. Since high TOM values are usually the most important for module identification, calibrationQuantile is typically chosen close to (but not equal to) 1. To speed up quantile calculation, the quantiles can be determined on a randomly chosen subset of the TOM matrix entries.
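Single quantile scaling can be sketched in base R as shown below. The helper singleQuantileCalibrate is hypothetical and is not WGCNA's implementation. It works in three steps:

1. Sample the same off-diagonal gene pairs in every set.
2. Compute each set's calibrationQuantile quantile q_s.
3. Raise set s to the power log(q_1)/log(q_s), so that its quantile matches set 1.

The sample size follows the formula given for sampleForCalibrationFactor. The sketch assumes all quantiles lie strictly between 0 and 1.

```r
# Hypothetical helper (not WGCNA's code): single quantile calibration of a list
# of same-sized symmetric TOM matrices, one per set, for a single block.
# Assumes the calibration quantiles lie strictly between 0 and 1.
singleQuantileCalibrate <- function(tomList, calibrationQuantile = 0.95,
                                    sampleForCalibration = TRUE,
                                    sampleForCalibrationFactor = 1000) {
  upper <- upper.tri(tomList[[1]])
  nEntries <- sum(upper)
  # Sample size as given in the sampleForCalibrationFactor description
  nSample <- min(nEntries, ceiling(sampleForCalibrationFactor / calibrationQuantile))
  idx <- if (sampleForCalibration) sample.int(nEntries, nSample) else seq_len(nEntries)
  # The same sampled gene pairs are used in every set
  q <- vapply(tomList, function(tom) {
    quantile(tom[upper][idx], probs = calibrationQuantile, names = FALSE)
  }, numeric(1))
  # Set s is raised to p_s = log(q_1) / log(q_s), so that q_s^p_s equals q_1
  lapply(seq_along(tomList), function(s) tomList[[s]]^(log(q[1]) / log(q[s])))
}

set.seed(2)
m <- matrix(runif(49), nrow = 7)
tom1 <- (m + t(m)) / 2; diag(tom1) <- 1     # simulated "TOM" for set 1
tom2 <- tom1^2                              # set 2 with systematically lower values
calibrated <- singleQuantileCalibrate(list(tom1, tom2), sampleForCalibration = FALSE)
```

A 7 x 7 matrix has 21 off-diagonal pairs, so the 0.95 quantile (R's default type 7) falls exactly on an order statistic. As a result, the calibrated quantile of set 2 matches that of set 1 up to rounding.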
Full quantile normalization, implemented in normalize.quantiles, adjusts the TOM matrices such that all quantiles equal each other (and equal the quantiles of the component-wise average of the individual TOM matrices).
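For comparison, below is a minimal base-R stand-in for full quantile normalization. The helper fullQuantileNormalize is hypothetical and does not call normalize.quantiles. It follows the standard construction from Bolstad et al. (2003):

1. Sort the off-diagonal entries of each set.
2. Average the sorted columns to obtain a common target distribution.
3. Replace each entry by the target value at its rank.

Ties are broken by order of appearance, which is a simplification.

```r
# Hypothetical base-R stand-in for normalize.quantiles (not the preprocessCore
# code): full quantile normalization of the off-diagonal TOM entries.
# Ties are broken by order of appearance.
fullQuantileNormalize <- function(tomList) {
  upper <- upper.tri(tomList[[1]])
  vals <- sapply(tomList, function(tom) tom[upper])   # gene pairs x sets
  target <- rowMeans(apply(vals, 2, sort))            # mean of sorted columns
  normVals <- apply(vals, 2, function(v) target[rank(v, ties.method = "first")])
  lapply(seq_along(tomList), function(s) {
    tom <- tomList[[s]]
    tom[upper] <- normVals[, s]
    tom[lower.tri(tom)] <- t(tom)[lower.tri(tom)]     # mirror to keep symmetry
    tom
  })
}

set.seed(3)
makeTOM <- function(n) {
  m <- matrix(runif(n * n), n)
  m <- (m + t(m)) / 2
  diag(m) <- 1
  m
}
normalized <- fullQuantileNormalize(list(makeTOM(6), makeTOM(6)^3))
```

After normalization, both sets share exactly the same sorted off-diagonal values, while each gene pair keeps its rank within its own set.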
Note that network calibration is performed separately in each block, i.e., the normalizing transformation may differ between blocks. This is necessary to avoid manipulating a full TOM in memory.
The consensus TOM is calculated as the component-wise consensusQuantile quantile of the individual (set) TOMs; that is, for each gene pair (TOM entry), the consensus is the consensusQuantile quantile of that entry across all input sets. Alternatively, one can use the (weighted) component-wise mean across all input data sets.
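The consensus step can be sketched as a component-wise quantile (or weighted mean) over a genes x genes x sets array. Both helpers below are hypothetical:

- consensusFromTOMs computes the component-wise consensus.
- originCountMin counts, for consensusQuantile = 0, how many gene pairs take their minimum from each set, in the spirit of the originCount component.

The sketch assumes the calibrated TOMs of one block are already in memory. The function described here, by contrast, can keep them on disk in chunks.

```r
# Hypothetical helpers (not WGCNA's code): component-wise consensus of a list
# of same-sized TOM matrices (one per set) for a single block.
consensusFromTOMs <- function(tomList, consensusQuantile = 0, useMean = FALSE,
                              setWeights = NULL) {
  arr <- simplify2array(tomList)            # genes x genes x sets
  if (useMean) {
    w <- if (is.null(setWeights)) rep(1, length(tomList)) else setWeights
    return(apply(arr, c(1, 2), weighted.mean, w = w))
  }
  apply(arr, c(1, 2), quantile, probs = consensusQuantile, names = FALSE)
}

# For consensusQuantile = 0 (the minimum): how many gene pairs take their
# consensus value from each set; ties are credited to the first such set.
originCountMin <- function(tomList) {
  upper <- upper.tri(tomList[[1]])
  vals <- sapply(tomList, function(tom) tom[upper])
  tabulate(apply(vals, 1, which.min), nbins = length(tomList))
}

tomA <- matrix(c(1, 0.2, 0.6,  0.2, 1, 0.5,  0.6, 0.5, 1), nrow = 3)
tomB <- matrix(c(1, 0.4, 0.3,  0.4, 1, 0.1,  0.3, 0.1, 1), nrow = 3)
cons <- consensusFromTOMs(list(tomA, tomB))            # component-wise minimum
weighted <- consensusFromTOMs(list(tomA, tomB), useMean = TRUE, setWeights = c(3, 1))
counts <- originCountMin(list(tomA, tomB))             # c(1, 2)
```

Here set 1 supplies the minimum for gene pair (1, 2), and set 2 supplies it for pairs (1, 3) and (2, 3).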
If requested, the consensus topological overlaps are saved to disk for later use.
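Storing a TOM "as a distance structure" keeps only its n(n - 1)/2 off-diagonal entries. The sketch below does this in base R with as.dist(), save() and load(); it is not the function's own saving code. Notes on the sketch:

- The file name mimics the default consensusTOMFileNames pattern with %b replaced by 1, and is only illustrative.
- It assumes "distance structure" refers to R's dist layout.
- Under that assumption, the comment on entry order shows how indices into a flattened distance structure (such as sampleIndex) map to gene pairs.

```r
# Sketch: keep only the lower triangle of a symmetric consensus TOM as a "dist"
# object and save it with base R. The file name is illustrative only.
consTOM <- matrix(c(1, 0.2, 0.4,
                    0.2, 1, 0.7,
                    0.4, 0.7, 1), nrow = 3)
consTOMDist <- as.dist(consTOM)     # n * (n - 1) / 2 entries, stored column-wise
                                    # over the lower triangle: (2,1), (3,1), (3,2)
tomFile <- file.path(tempdir(), "consensusTOM-Block1.RData")
save(consTOMDist, file = tomFile)
rm(consTOMDist)
load(tomFile)                       # restores the object consTOMDist
restored <- as.matrix(consTOMDist)
diag(restored) <- 1                 # as.matrix() puts zeros on the diagonal
```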
References

Zhang B, Horvath S (2005) "A General Framework for Weighted Gene Co-Expression Network Analysis", Statistical Applications in Genetics and Molecular Biology 4(1), Article 17. PMID: 16646834
The original reference for the WGCNA package is:
Langfelder P, Horvath S (2008) "WGCNA: an R package for weighted correlation network analysis", BMC Bioinformatics 9:559. PMID: 19114008
For consensus modules, see:
Langfelder P, Horvath S (2007) "Eigengene networks for studying the relationships between co-expression modules", BMC Systems Biology 1:54
This function uses quantile normalization described, for example, in:
Bolstad BM, Irizarry RA, Astrand M, Speed TP (2003) "A comparison of normalization methods for high density oligonucleotide array data based on variance and bias", Bioinformatics. 2003 Jan 22;19(2):1
See also: blockwiseIndividualTOMs for calculation of topological overlaps across multiple sets.