selectFewestConsensusMissing: Select columns with the lowest consensus number of missing data

Description

Given a multiData structure, this function calculates the amount of present (non-missing) data for each variable (column) in each data set, forms the consensus of these values across the data sets, and for each group selects the variables whose consensus proportion of present data is at least minProportionPresent (see Usage below).
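The calculation described above can be illustrated with a minimal base-R sketch. It is not the package's implementation: the toy matrices set1 and set2, the plain list standing in for a multiData structure, and the threshold of 0.75 are all assumptions made for illustration, and the per-group handling is left out.

```r
# Two toy data sets (rows = samples, columns = variables) with identical columns.
# These matrices are hypothetical and stand in for the sets of a multiData structure.
set1 <- cbind(a = c(1, 2, 3, 4), b = c(1, NA, 3, 4), c = c(NA, NA, 3, 4))
set2 <- cbind(a = c(1, 2, NA, 4), b = c(1, 2, 3, 4), c = c(1, NA, 3, 4))
sets <- list(set1, set2)

# Fraction of present (non-missing) values per column; one row per set.
propPresent <- t(sapply(sets, function(x) colMeans(!is.na(x))))

# Consensus across sets: the quantile with probability 0 is the minimum.
consensusPresent <- apply(propPresent, 2, quantile, probs = 0, names = FALSE)

# Retain columns whose consensus fraction reaches the (assumed) threshold.
keep <- consensusPresent >= 0.75
print(keep)
```

In this sketch, columns a and b are retained because each has at least 75% of its values present in every set, while column c is dropped because only half of its values are present in the first set.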

Usage

selectFewestConsensusMissing(
    mdx, 
    colID, 
    group, 
    minProportionPresent = 1, 
    consensusQuantile = 0, 
    verbose = 0,
    ...)

Value

A logical vector with one element per variable in mdx, giving TRUE for the retained variables.

Arguments

mdx: A multiData structure. All sets must have the same columns.
colID: Character vector of column identifiers. This must include all the column names from mdx, but can include other values as well. Its entries must be unique (no duplicates) and no missing values are permitted.
group: Character vector whose components contain the group label (e.g. a character string) for each entry of colID. This vector must be of the same length as the vector colID. In gene expression applications, this vector could contain the gene symbol (or a co-expression module label).
minProportionPresent: A numeric value between 0 and 1 (logical values will be coerced to numeric). Denotes the minimum consensus fraction of present data in each column that will result in the column being retained.
consensusQuantile: A number between 0 and 1 giving the quantile probability for consensus calculation. 0 means the minimum value (true consensus) will be used.
verbose: Level of verbosity; 0 means silent, larger values will cause progress messages to be printed.
...: Other arguments that should be considered undocumented and subject to change.
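
The effect of consensusQuantile can be sketched in base R as follows. The vector fractionPresent is hypothetical, and the use of R's default quantile type is an assumption; the package may compute the quantile differently.

```r
# Hypothetical fractions of present data for one variable across five data sets.
fractionPresent <- c(0.6, 0.8, 0.9, 1.0, 1.0)

# Probability 0 gives the minimum across sets, the strictest consensus.
strict <- quantile(fractionPresent, probs = 0, names = FALSE)

# Probability 0.5 gives the median, which tolerates a few sets with more missing data.
relaxed <- quantile(fractionPresent, probs = 0.5, names = FALSE)

print(c(strict = strict, relaxed = relaxed))
```

With a threshold such as minProportionPresent = 0.85, this variable would fail under the strict consensus but pass under the median-based one.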

Author

Jeremy Miller and Peter Langfelder

Details