The method starts with the feature having maximal mutual information with the decision \(Y\). It then greedily adds the feature \(X\) maximising the criterion: $$J(X)=I(X;Y)-\frac{1}{|S|}\sum_{W\in S} I(X;W),$$ where \(S\) is the set of already selected features.
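To make the criterion concrete, the greedy loop can be sketched in plain R. This is a toy illustration, not the package's optimised implementation: the `mi` helper and the `mrmr_sketch` name are hypothetical, and mutual information is estimated naively from contingency tables of categorical features.

```r
# Naive mutual information estimate I(a;b) from a contingency table
mi <- function(a, b) {
  p <- table(a, b) / length(a)      # joint probabilities
  px <- rowSums(p); py <- colSums(p)
  nz <- p > 0                       # skip empty cells (0*log 0 = 0)
  sum(p[nz] * log(p[nz] / outer(px, py)[nz]))
}

# Greedy mRMR selection of k features from a data frame of factors
mrmr_sketch <- function(X, Y, k) {
  rel <- sapply(X, mi, b = Y)       # relevance term I(X;Y)
  sel <- which.max(rel)             # start from max mutual information
  while (length(sel) < k) {
    cand <- setdiff(seq_along(X), sel)
    J <- sapply(cand, function(i)   # J(X) = I(X;Y) - mean redundancy
      rel[i] - mean(sapply(sel, function(j) mi(X[[i]], X[[j]]))))
    sel <- c(sel, cand[which.max(J)])
  }
  sel
}
```

For instance, given a feature, an exact duplicate of it, and an independent feature, the sketch selects the original first and then skips the duplicate in favour of the non-redundant one.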
MRMR(X, Y, k = if (positive) ncol(X) else 3, positive = FALSE, threads = 0)
Attribute table, given as a data frame with either factors (preferred), booleans, integers (treated as categorical) or reals (which undergo automatic categorisation; see below for details). A single vector will be interpreted as a data frame with one column. NAs are not allowed.
Decision attribute; should be given as a factor, but other options are accepted, exactly as for attributes. NAs are not allowed.
Number of attributes to select. Must not exceed ncol(X).
If true, the algorithm won't return features with negative scores (i.e., with a redundancy term higher than the relevance term). In that case, k controls the maximal number of returned features, and is set to ncol(X) by default.
Number of threads to use; default value, 0, means all available to OpenMP.
A list with two elements: selection, a vector of indices of the selected features in the selection order, and score, a vector of the corresponding feature scores. Names of both vectors will correspond to the names of features in X. Both vectors will be at most of length k, as the selection may stop sooner, even during the initial selection, in which case both vectors will be empty.
"Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy", H. Peng et al., IEEE Transactions on Pattern Analysis and Machine Intelligence (2005)
data(MadelonD)
MRMR(MadelonD$X,MadelonD$Y,20)