filterFeatures: Filter features by thresholding filter values.

Description

First calls generateFilterValuesData to compute filter values (unless fval is supplied). Features are then selected according to whichever of perc, abs, threshold or fun is set.
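The two steps can also be run separately, which is useful when the same filter values should be reused with different selection rules. A minimal sketch, assuming the mlr and FSelectorRcpp packages are installed; the variable names fv and task2 are illustrative only.

```r
library(mlr)  # assumes mlr and FSelectorRcpp are installed

# Step 1: compute the filter values once
fv = generateFilterValuesData(iris.task, method = "FSelectorRcpp_information.gain")

# Step 2: select features from the precomputed values;
# method and ... are ignored because fval is supplied
task2 = filterFeatures(iris.task, fval = fv, abs = 2)
getTaskFeatureNames(task2)
```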

Usage

filterFeatures(
  task,
  method = "randomForestSRC_importance",
  fval = NULL,
  perc = NULL,
  abs = NULL,
  threshold = NULL,
  fun = NULL,
  fun.args = NULL,
  mandatory.feat = NULL,
  select.method = NULL,
  base.methods = NULL,
  cache = FALSE,
  ...
)

Value

Task.

Arguments

task: (Task)
The task.
method: (character(1))
See listFilterMethods for the available methods. Default is “randomForestSRC_importance”.
fval: (FilterValues)
Result of generateFilterValuesData. If passed, the filter values in this object are used for feature filtering, and method and ... are ignored. Default is NULL (not used).
perc: (numeric(1))
If set, select the top-scoring perc * 100 percent of features; perc = 1 selects all features. Mutually exclusive with arguments abs, threshold and fun.
abs: (numeric(1))
If set, select abs top scoring features. Mutually exclusive with arguments perc, threshold and fun.
threshold: (numeric(1))
If set, select features whose score exceeds threshold. Mutually exclusive with arguments perc, abs and fun.
fun: (function)
If set, select features via a custom thresholding function, which must return the number of top scoring features to select. Mutually exclusive with arguments perc, abs and threshold.
fun.args: (any)
Arguments passed to the custom thresholding function.
mandatory.feat: (character)
Mandatory features which are always included, regardless of their scores.
select.method:
If multiple methods are supplied in argument method, specify the method that is used for the final subsetting.
base.methods:
If method is an ensemble filter, specify the base filter methods that the ensemble method will use.
cache: (character(1) | logical)
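Whether to use caching during filter value creation. TRUE uses the default cache directory; a character string is taken as the path of a custom cache directory. See the Caching section.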
...: (any)
Passed down to selected filter method.
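To make the selection arguments concrete, the following base-R sketch shows how perc, abs, threshold, fun and mandatory.feat could turn a vector of filter scores into a feature subset. It is a hypothetical illustration, not mlr's internal code: the helper select_by_scores, the rounding of perc * n, the strict reading of "exceeds", the positional call of fun, and forcing mandatory features in via an infinite score are all assumptions. The scores are made up.

```r
# Hypothetical helper, NOT mlr's implementation: illustrates the selection rules.
select_by_scores = function(scores, perc = NULL, abs = NULL, threshold = NULL,
                            fun = NULL, fun.args = list(), mandatory.feat = NULL) {
  # The four rules are mutually exclusive: exactly one must be set
  rules = list(perc = perc, abs = abs, threshold = threshold, fun = fun)
  given = names(Filter(Negate(is.null), rules))
  if (length(given) != 1L) stop("Set exactly one of perc, abs, threshold, fun")

  # Assumption: mandatory features are always kept by giving them an infinite score
  scores[names(scores) %in% mandatory.feat] = Inf
  ranked = sort(scores, decreasing = TRUE)
  n = length(ranked)

  # Number of top-scoring features to keep
  k = switch(given,
    perc = round(perc * n),                        # assumed rounding rule
    abs = min(abs, n),
    threshold = sum(ranked > threshold),           # "exceeds" read as strictly greater
    fun = do.call(fun, c(list(ranked), fun.args))  # fun must return a count
  )
  names(ranked)[seq_len(k)]
}

# Made-up scores for four features
scores = c(a = 0.9, b = 0.8, c = 0.4, d = 0.1)
select_by_scores(scores, abs = 2)
select_by_scores(scores, abs = 2, mandatory.feat = "d")
```

With abs = 2 this keeps "a" and "b"; adding mandatory.feat = "d" keeps "d" and "a", because the mandatory feature takes one of the two slots in this sketch.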

Caching

If cache = TRUE, the default mlr cache directory is used to cache filter values. The directory is operating system dependent and can be checked with getCacheDir().
The default cache can be cleared with deleteCacheDir(). Alternatively, the path of a custom cache directory can be passed as cache.

Note that caching is not thread safe. It may work for parallel computation on many systems, but this is not guaranteed.
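A short usage sketch of the caching workflow, assuming mlr and FSelectorRcpp are installed (caching may additionally require the memoise package). The second call with identical inputs is expected to reuse the cached filter values. Note that deleteCacheDir() removes the default cache directory.

```r
library(mlr)  # assumes mlr and FSelectorRcpp are installed

# First call computes the filter values and stores them in the default cache
filtered = filterFeatures(iris.task, method = "FSelectorRcpp_information.gain",
  abs = 2, cache = TRUE)
# An identical second call can reuse the cached values
filtered = filterFeatures(iris.task, method = "FSelectorRcpp_information.gain",
  abs = 2, cache = TRUE)

getCacheDir()     # operating-system-dependent location of the default cache
deleteCacheDir()  # clear the default cache
```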

Simple and ensemble filters

Besides passing one or more simple filter methods, you can also pass an ensemble filter method (in a list). The ensemble method uses the simple methods to calculate its ranking. See listFilterEnsembleMethods() for the available ensemble methods.
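To illustrate how an ensemble can combine base filters, here is a base-R sketch of a minimum-rank aggregation. The helper min_rank_ensemble is hypothetical, and the semantics (rank each base filter's scores so that a higher score gives a higher rank, then take each feature's lowest rank across base filters) are an assumed reading of "E-min"; check listFilterEnsembleMethods() for the package's exact definition. The scores are made up.

```r
# Hypothetical sketch of a minimum-rank ensemble; assumed semantics, not mlr's code.
min_rank_ensemble = function(score_list) {
  # score_list: named list of named numeric vectors (one per base filter),
  # all covering the same features
  feats = names(score_list[[1]])
  # features x base-filters matrix of ranks; higher score -> higher rank
  ranks = sapply(score_list, function(s) rank(s[feats], ties.method = "first"))
  # Each feature keeps its worst (lowest) rank across the base filters
  ens = apply(ranks, 1, min)
  sort(ens, decreasing = TRUE)
}

gain = c(a = 0.9, b = 0.2, c = 0.5)  # made-up scores from base filter 1
info = c(a = 0.3, b = 0.1, c = 0.7)  # made-up scores from base filter 2
ens = min_rank_ensemble(list(gain = gain, info = info))
ens
```

Here "a" and "c" each have a worst rank of 2 and "b" a worst rank of 1, so "b" is ranked last. Taking the minimum rewards features that score reasonably well under every base filter.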

Examples


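library(mlr)  # the FSelectorRcpp filters below also need the FSelectorRcpp package
# simple filter: keep the 2 top-scoring features by gain ratio
filterFeatures(iris.task, method = "FSelectorRcpp_gain.ratio", abs = 2)
# ensemble filter: combine two base filters with "E-min", keep 2 features
filterFeatures(iris.task, method = "E-min",
  base.methods = c("FSelectorRcpp_gain.ratio",
    "FSelectorRcpp_information.gain"), abs = 2)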
