Given a multiData structure, this function calculates, for each variable (column), the proportion of present (non-missing) data in each data set, forms the consensus of these proportions across the data sets and, for each group, selects variables whose consensus proportion of present data is at least minProportionPresent (see usage below).
selectFewestConsensusMissing(
mdx,
colID,
group,
minProportionPresent = 1,
consensusQuantile = 0,
verbose = 0,
...)
mdx: A multiData structure. All sets must have the same columns.
colID: Character vector of column identifiers. This must include all the column names from mdx, but can include other values as well. Its entries must be unique (no duplicates) and no missing values are permitted.
group: Character vector whose components contain the group label (e.g. a character string) for each entry of colID. This vector must be of the same length as the vector colID. In gene expression applications, this vector could contain the gene symbol (or a co-expression module label).
minProportionPresent: A numeric value between 0 and 1 (logical values will be coerced to numeric). Denotes the minimum consensus fraction of present data in each column that will result in the column being retained.
consensusQuantile: A number between 0 and 1 giving the quantile probability for consensus calculation. 0 means the minimum value (true consensus) will be used.
verbose: Level of verbosity; 0 means silent, larger values will cause progress messages to be printed.
...: Other arguments that should be considered undocumented and subject to change.
A logical vector with one element per variable in mdx, giving TRUE for the retained variables.
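As an illustration of how such a logical vector might be used, the following base R sketch drops the non-retained columns from every set. It assumes the usual multiData layout (a list of sets, each a list with a data component); the toy matrices, the vector keep (a stand-in for the value returned by the function) and the name mdxFiltered are hypothetical.

```r
# Toy multiData-like list: two sets with the same three columns.
# All values are made up for illustration.
mdx <- list(
  set1 = list(data = matrix(1:6, nrow = 2,
                            dimnames = list(NULL, c("g1", "g2", "g3")))),
  set2 = list(data = matrix(7:15, nrow = 3,
                            dimnames = list(NULL, c("g1", "g2", "g3"))))
)

# Stand-in for the logical vector returned by the function:
# one element per column, TRUE for retained columns.
keep <- c(TRUE, TRUE, FALSE)

# Keep only the retained columns in every set; drop = FALSE preserves
# the matrix shape even if a single column remains.
mdxFiltered <- lapply(mdx, function(set) {
  set$data <- set$data[, keep, drop = FALSE]
  set
})

print(colnames(mdxFiltered$set1$data))
```

Note that lapply returns a plain list, so any class attribute on the original object would need to be restored separately.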
The 'consensus' of a vector x is defined as the quantile of x with probability consensusQuantile. For each variable in mdx, this function calculates the proportion of present (i.e., non-NA and non-NaN) values in each of the data sets in mdx and forms the consensus of these proportions across the sets. Only variables whose consensus proportion of present data is at least minProportionPresent are retained.
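The calculation described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name consensusPresentSketch and the toy data are hypothetical, the multiData structure is assumed to be a list of sets each holding a data matrix, and the group argument is ignored.

```r
# Hypothetical helper: per-variable consensus proportion of present data.
consensusPresentSketch <- function(mdx, consensusQuantile = 0) {
  # Proportion of present values per column in each set.
  # is.na() is TRUE for both NA and NaN, so both count as missing.
  propPresent <- sapply(mdx, function(set) colMeans(!is.na(set$data)))
  # Consensus across sets: the quantile with the given probability
  # (probability 0 gives the minimum across sets).
  apply(propPresent, 1, quantile, probs = consensusQuantile, names = FALSE)
}

# Toy data (made up): matrices are filled column by column.
set1 <- matrix(c(1, NA, 3, 4,
                 1, 2, 3, 4,
                 NA, NA, 3, NaN),
               nrow = 4, dimnames = list(NULL, c("g1", "g2", "g3")))
set2 <- matrix(c(5, 6,
                 7, 8,
                 NA, 9),
               nrow = 2, dimnames = list(NULL, c("g1", "g2", "g3")))
mdx <- list(set1 = list(data = set1), set2 = list(data = set2))

# Proportions present: g1 = (0.75, 1), g2 = (1, 1), g3 = (0.25, 0.5).
consensusPresent <- consensusPresentSketch(mdx, consensusQuantile = 0)

# Retain variables whose consensus proportion reaches the threshold.
minProportionPresent <- 0.75
keep <- consensusPresent >= minProportionPresent
print(keep)
```

With consensusQuantile = 0 the consensus is the worst case across sets, so a variable is retained only if every set meets the threshold; a larger quantile such as 0.5 relaxes this by using the median proportion instead.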