modulePreservation(
multiData,
multiColor,
dataIsExpr = TRUE,
networkType = "unsigned",
corFnc = "cor",
corOptions = "use = 'p'",
referenceNetworks = 1,
nPermutations = 100,
includekMEallInSummary = FALSE,
restrictSummaryForGeneralNetworks = TRUE,
calculateQvalue = FALSE,
randomSeed = 12345,
maxGoldModuleSize = 1000,
maxModuleSize = 1000,
quickCor = 1,
ccTupletSize = 2,
calculateCor.kIMall = FALSE,
calculateClusterCoeff = FALSE,
useInterpolation = FALSE,
checkData = TRUE,
greyName = NULL,
savePermutedStatistics = TRUE,
loadPermutedStatistics = FALSE,
permutedStatisticsFile = if (useInterpolation) "permutedStats-intrModules.RData"
else "permutedStats-actualModules.RData",
plotInterpolation = TRUE,
interpolationPlotFile = "modulePreservationInterpolationPlots.pdf",
discardInvalidOutput = TRUE,
verbose = 1, indent = 0)
multiData (see checkSets): a vector of lists, one per set. Each set must contain a component data that contains the expression or adjacency data.
multiColor: module labels for the sets in multiData. The components must be named using the same names that are used in multiData; these names are used to match labels to expression data sets.
dataIsExpr: if TRUE, multiData will be interpreted as expression data; if FALSE, multiData will be interpreted as adjacencies.
networkType: one of "unsigned", "signed", "signed hybrid". See adjacency.
corFnc: name of the correlation function, for example cor or bicor. More generally, any function returning values between -1 and 1 can be used.
corOptions: additional arguments passed to corFnc. Use "use = 'p', method = 'spearman'"
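to obtain Spearman correlation.
referenceNetworks: reference networks must have their module labels given in multiColor.
restrictSummaryForGeneralNetworks: the default TRUE corresponds to published work.
randomSeed: if NULL, the seed will not be set. If non-NULL and the random generator has been initialized prior to the function call, the latter's state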
is saved and restored upon exit.
maxModuleSize: modules larger than maxModuleSize will be reduced by randomly sampling maxModuleSize genes.
calculateCor.kIMall: if FALSE, cor.kIMall will not be calculated, potentially saving a significant amount of time if the input adjacencies are large and contain many modules.
checkData: see goodSamplesGenesMS for details.
greyName: label used for unassigned genes. When NULL, the default is "grey" or 0 if multiColor contains character or numeric vectors, respectively.
plotInterpolation: if interpolation is used (see useInterpolation above), the function can optionally generate diagnostic plots that can be used to
assess whether the interpolation makes sense.
discardInvalidOutput: useful when dataIsExpr is FALSE and some of the output statistics cannot be calculated. This option causes such statistics to be dropped.
The lists quality, preservation, referenceSeparability, and testSeparability each contain 4 or 5 components: observed
contains observed values, Z contains the corresponding Z scores, log.p contains base 10 logarithms of the p-values, log.pBonf contains base 10 logarithms of the Bonferroni corrected p-values, and optionally q contains the associated q-values. The list accuracy contains observed, Z, log.p, log.pBonf, optionally q, and additional components observedOverlapCounts and observedFisherPvalues that contain the observed matrices of overlap counts and Fisher test p-values.
Each of the lists observed, Z, log.p, log.pBonf, optionally q, observedOverlapCounts and observedFisherPvalues is structured as a 2-level list where the outer components correspond to reference sets and the inner components to test sets. As an example, preservation$observed[[1]][[2]] contains the density and connectivity preservation statistics for the preservation of set 1 modules in set 2, that is, set 1 is the reference set and set 2 is the test set. preservation$observed[[1]][[2]] is a data frame in which each row corresponds to a module in the reference network 1, plus one row for the unassigned objects and one row for a "module" that contains randomly sampled objects and that represents a whole-network average. Each column corresponds to a statistic as indicated by the column name.
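The nesting described above can be illustrated with a small mock object. This is a minimal sketch in base R: the statistic column names, the row labels, and all values are invented for illustration and are not output of modulePreservation; only the preservation$observed[[reference]][[test]] indexing follows the description.

```r
# Mock data frame standing in for one reference-test result (values invented).
# Row labels are assumptions: "grey" for unassigned objects, "gold" for the
# randomly sampled "module".
stats12 <- data.frame(
  moduleSize = c(120, 80, 40, 100),
  medianRank.pres = c(1, 2, 4, 3),
  row.names = c("blue", "brown", "grey", "gold")
)

# Outer list: reference sets; inner list: test sets.
mp <- list(preservation = list(observed = list(
  list(NULL, stats12)  # reference set 1: no self-comparison, then test set 2
)))

ref <- 1
test <- 2
obs <- mp$preservation$observed[[ref]][[test]]

# One row per module, one column per statistic.
print(rownames(obs))
print(obs["blue", "medianRank.pres"])
```

Indexing by reference first and test second keeps every reference-test pair addressable with the same two subscripts, whichever statistic list is being read.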
Reference sets must have their corresponding module assignment specified in multiColor; module assignment is optional for test sets. Individual expression sets and their module labels are matched using the names of the corresponding components in multiData and multiColor. For each reference-test pair, the function calculates module preservation statistics that measure how well the modules of the reference set are preserved in the test set. If multiColor also contains a module assignment for the test set, the calculated statistics also include cross-tabulation statistics that make use of the test module assignment.
For each reference-test pair, the function only uses genes (columns of the data component of each component of multiData) that are in common between the reference and test set. Columns are matched by column names, so column names must be valid.
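A minimal base-R sketch of the input layout and the gene matching described above. The set names (ref, test), gene names, module labels, and the helper make_set are all made up for illustration; the intersect step only mimics the described matching by column names and is not the function's internal code.

```r
set.seed(1)

# Hypothetical helper: a random matrix with genes in columns.
make_set <- function(genes, n_samples = 5) {
  m <- matrix(rnorm(n_samples * length(genes)), nrow = n_samples)
  colnames(m) <- genes
  m
}

# Multi-set layout: a named list with one list per set, each holding `data`.
multiData <- list(
  ref  = list(data = make_set(paste0("gene", 1:6))),
  test = list(data = make_set(paste0("gene", 3:8)))
)

# Module labels for the reference set, named like the matching data component.
multiColor <- list(
  ref = c("blue", "blue", "brown", "brown", "grey", "grey")
)

# Components are matched by name, genes by column name.
stopifnot(all(names(multiColor) %in% names(multiData)))
common <- intersect(colnames(multiData$ref$data), colnames(multiData$test$data))
ref_common <- multiData$ref$data[, common, drop = FALSE]
test_common <- multiData$test$data[, common, drop = FALSE]
labels_common <- multiColor$ref[match(common, colnames(multiData$ref$data))]

print(common)
print(labels_common)
```

Lists shaped like these match the first two arguments in the usage, so a call would take the form modulePreservation(multiData, multiColor, referenceNetworks = 1).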
In addition to preservation statistics, the function also calculates several statistics of module quality, that is, measures of how well-defined modules are in the reference set. The quality statistics are calculated with respect to genes in common with a test set; thus the function calculates a set of quality statistics for each reference-test pair. This may be somewhat counter-intuitive, but it allows a direct comparison of corresponding quality and preservation statistics.
The calculated p-values are determined from the Z scores of individual measures under the assumption of normality. No p-value is calculated for the Zsummary measures. The Bonferroni correction is applied with respect to the number of tested modules. Because the p-values for strongly preserved modules are often extremely low, the function reports natural logarithms (base e) of the p-values. However, q-values are reported untransformed since they are calculated that way in the package qvalue.
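As a rough illustration of the conversion described above, the following base-R sketch turns Z scores into log p-values under a normal approximation and applies a Bonferroni correction on the log scale. The Z values and module names are invented, the use of the upper tail is an assumption, and this is not the package's internal code.

```r
# Hypothetical Z scores for three modules.
Z <- c(blue = 12.4, brown = 3.1, grey = 0.2)
n_modules <- length(Z)

# Natural-log p-values from the upper normal tail; log.p = TRUE avoids
# underflow for very large Z.
log_p <- pnorm(Z, lower.tail = FALSE, log.p = TRUE)

# Bonferroni on the log scale: multiply p by the number of tests, cap at p = 1.
log_p_bonf <- pmin(log_p + log(n_modules), 0)

# The same values on a base 10 scale, if that is preferred.
log10_p <- log_p / log(10)

print(round(cbind(log_p, log_p_bonf, log10_p), 2))
```

Working on the log scale keeps very small p-values representable, which is the reason given above for reporting logarithms rather than raw p-values.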
Missing data are removed (but see quickCor above).
See also adjacency, blockwiseModules; rudimentary cleaning in goodSamplesGenesMS; the WGCNA implementation of correlation in cor.