selectFewestConsensusMissing: Select columns with the lowest consensus number of missing data

Description

Given a multiData structure, this function calculates the consensus number of present (non-missing) data for each variable (column) across the data sets, forms the consensus and for each group selects variables whose consensus proportion of present data is at least selectFewestMissing (see usage below).

Usage

selectFewestConsensusMissing(
    mdx, 
    colID, 
    group, 
    minProportionPresent = 1, 
    consensusQuantile = 0, 
    verbose = 0,
    ...)

Arguments

mdx

A multiData structure. All sets must have the same columns.

colID

Character vector of column identifiers. This must include all the column names from mdx, but can include other values as well. Its entries must be unique (no duplicates) and no missing values are permitted.

group

Character vector whose components contain the group label (e.g. a character string) for each entry of colID. This vector must be of the same length as the vector colID. In gene expression applications, this vector could contain the gene symbol (or a co-expression module label).

minProportionPresent

A numeric value between 0 and 1 (logical values will be coerced to numeric). Denotes the minimum consensus fraction of present data in each column that will result in the column being retained.

consensusQuantile

A number between 0 and 1 giving the quantile probability for consensus calculation. 0 means the minimum value (true consensus) will be used.

verbose

Level of verbosity; 0 means silent, larger values will cause progress messages to be printed.

...

Other arguments that should be considered undocumented and subject to change.

Value

A logical vector with one element per variable in mdx, giving TRUE for the retained variables.

Details

A 'consensus' of a vector (say 'x') is simply defined as the quantile with probability consensusQuantile of the vector x. This function calculates, for each variable in mdx, its proportion of present (i.e., non-NA and non-NaN) values in each of the data sets in mdx, and forms the consensus. Only variables whose consensus proportion of present data is at least selectFewestMissing are retained.

Description

Usage

Arguments

Value

Details

See Also