First calls generateFilterValuesData(), then selects features using whichever of perc, abs, threshold or fun is set.
filterFeatures(
  task,
  method = "randomForestSRC_importance",
  fval = NULL,
  perc = NULL,
  abs = NULL,
  threshold = NULL,
  fun = NULL,
  fun.args = NULL,
  mandatory.feat = NULL,
  select.method = NULL,
  base.methods = NULL,
  cache = FALSE,
  ...
)
Returns: Task, restricted to the selected features.
task (Task): The task.
method (character(1)): See listFilterMethods(). Default is "randomForestSRC_importance".
fval (FilterValues): Result of generateFilterValuesData(). If you pass this, the filter values stored in the object are used for feature filtering, and method and ... are ignored. Default is NULL (not used).
perc (numeric(1)): If set, select the top perc * 100 percent of features by score; for example, perc = 0.25 keeps the top-scoring quarter, and perc = 1 selects all features. Mutually exclusive with abs, threshold and fun.
abs (numeric(1)): If set, select the abs top-scoring features. Mutually exclusive with perc, threshold and fun.
threshold (numeric(1)): If set, select features whose score exceeds threshold. Mutually exclusive with perc, abs and fun.
fun (function): If set, select features via a custom thresholding function, which must return the number of top-scoring features to select. Mutually exclusive with perc, abs and threshold.
fun.args (any): Arguments passed to the custom thresholding function.
mandatory.feat (character): Mandatory features, which are always included regardless of their scores.
select.method: If multiple methods are supplied in argument method, specify the method that is used for the final subsetting.
base.methods: If method is an ensemble filter, specify the base filter methods that the ensemble method will use.
cache (character(1) | logical): Whether to use caching during filter value creation. See details.
... (any): Passed down to the selected filter method.
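The selection rules described for perc, abs, threshold, fun and mandatory.feat can be illustrated with a small base-R sketch. The helper select_by_scores() and the toy scores are hypothetical and not part of mlr; the rounding used for perc, the strict ">" for threshold, and the way mandatory features are removed from the ranking before counting are assumptions about how the rules might be applied, not a description of mlr's internals.

```r
# Hypothetical helper, NOT part of mlr: turns a named vector of filter
# scores into a selected feature set using exactly one selection rule.
select_by_scores <- function(scores, perc = NULL, abs = NULL, threshold = NULL,
                             fun = NULL, fun.args = NULL, mandatory.feat = NULL) {
  rules <- list(perc = perc, abs = abs, threshold = threshold, fun = fun)
  if (sum(!vapply(rules, is.null, logical(1))) != 1L) {
    stop("set exactly one of perc, abs, threshold or fun")
  }
  # Assumption: mandatory features are taken out of the ranking and
  # always prepended to the result.
  scores <- scores[!names(scores) %in% mandatory.feat]
  scores <- sort(scores, decreasing = TRUE)
  n <- length(scores)
  nselect <- if (!is.null(perc)) {
    round(perc * n)                  # rounding rule is an assumption
  } else if (!is.null(abs)) {
    abs
  } else if (!is.null(threshold)) {
    sum(scores > threshold)          # strict ">" mirrors "exceeds"
  } else {
    do.call(fun, c(list(scores), fun.args))  # must return a count
  }
  nselect <- min(nselect, n)         # never ask for more features than exist
  c(mandatory.feat, names(scores)[seq_len(nselect)])
}

# Toy scores for four features named a to d (made-up values).
scores <- c(a = 0.9, b = 0.7, c = 0.4, d = 0.1)
select_by_scores(scores, abs = 2)                        # "a" "b"
select_by_scores(scores, threshold = 0.3)                # "a" "b" "c"
select_by_scores(scores, abs = 1, mandatory.feat = "d")  # "d" "a"
```

A custom fun receives the sorted scores first, followed by anything in fun.args, for example select_by_scores(scores, fun = function(s, k) k, fun.args = list(k = 3)); this calling convention is also an assumption of the sketch.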
If cache = TRUE, the default mlr cache directory is used to cache filter values. This directory is operating-system dependent and can be checked with getCacheDir(). The default cache can be cleared with deleteCacheDir(). Alternatively, a custom directory can be passed to store the cache.
Note that caching is not thread safe: it works for parallel computation on many systems, but this is not guaranteed.
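The idea of caching filter values on disk can be sketched in base R. This is a simplified, hypothetical stand-in, not mlr's caching code: the helper cached_filter_values(), its key argument and the .rds file layout are all assumptions made for illustration. It stores the result of an expensive computation under a key and reuses it on later calls.

```r
# Hypothetical helper, NOT mlr's caching code: stores the result of an
# expensive filter computation in cache.dir as <key>.rds and reuses it.
cached_filter_values <- function(key, compute, cache.dir = tempdir()) {
  dir.create(cache.dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache.dir, paste0(key, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: skip the computation
  }
  values <- compute()      # cache miss: compute once ...
  saveRDS(values, path)    # ... and store for later calls
  values
}

# Usage: the expensive function runs only once for the same key.
n_calls <- 0
expensive_filter <- function() {
  n_calls <<- n_calls + 1
  c(a = 0.9, b = 0.7)
}
demo_dir <- file.path(tempdir(), "filter-cache-demo")
unlink(demo_dir, recursive = TRUE)  # start from an empty cache
v1 <- cached_filter_values("gain_ratio", expensive_filter, demo_dir)
v2 <- cached_filter_values("gain_ratio", expensive_filter, demo_dir)
n_calls  # 1
```

The check-then-write pattern also shows one way such a cache can misbehave under parallelism: two workers can both miss the cache and write the same file at once, since nothing locks it between file.exists() and saveRDS().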
Besides passing (multiple) simple filter methods, you can also pass an ensemble filter method (in a list). The ensemble method uses the simple methods given in base.methods to calculate its ranking. See listFilterEnsembleMethods() for available ensemble methods.
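To illustrate how an ensemble can combine base rankings, here is a minimal base-R sketch of minimum-rank aggregation, in the spirit of the "E-min" method used in the examples. The function ensemble_min_rank() and the base methods m1 and m2 are hypothetical; the rank direction (a larger rank means a better score) and the use of the minimum are assumptions, so consult listFilterEnsembleMethods() for the actual definitions.

```r
# Hypothetical sketch of minimum-rank aggregation in the spirit of "E-min";
# NOT mlr's code. Assumption: within each base method a larger rank means a
# better score, so a feature's ensemble score is its worst (smallest) rank.
ensemble_min_rank <- function(score_list) {
  feats <- names(score_list[[1]])
  # Rank every base method's scores over the same feature order.
  ranks <- lapply(score_list, function(s) rank(s[feats]))
  # pmin keeps the names of its first argument.
  Reduce(pmin, ranks)
}

# Toy scores from two hypothetical base methods.
score_list <- list(
  m1 = c(a = 0.9, b = 0.7, c = 0.4),
  m2 = c(a = 0.1, b = 0.8, c = 0.5)
)
ens <- ensemble_min_rank(score_list)
ens                                     # a = 1, b = 2, c = 1
names(sort(ens, decreasing = TRUE))[1]  # "b"
```

Taking the minimum rewards features that every base method ranks reasonably well: a is best under m1 but worst under m2, so the consistently strong b comes out on top.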
Other filter: generateFilterValuesData(), getFilteredFeatures(), listFilterEnsembleMethods(), listFilterMethods(), makeFilterEnsemble(), makeFilterWrapper(), makeFilter(), plotFilterValues()
library(mlr)

# simple filter
filterFeatures(iris.task, method = "FSelectorRcpp_gain.ratio", abs = 2)

# ensemble filter
filterFeatures(iris.task, method = "E-min",
  base.methods = c("FSelectorRcpp_gain.ratio", "FSelectorRcpp_information.gain"),
  abs = 2)