Constant features can cause errors in some models and provide no information that can be learned from the training set. With the argument perc, it is also possible to remove features for which less than perc percent of the observations differ from the mode value.
removeConstantFeatures(
obj,
perc = 0,
dont.rm = character(0L),
na.ignore = FALSE,
tol = .Machine$double.eps^0.5,
show.info = getMlrOption("show.info")
)
obj
(data.frame | Task)
Input data.
perc
(numeric(1))
The percentage of a feature's values, in [0, 1), that must differ from the mode value.
Default is 0, which means only constant features with exactly one observed level are removed.
See the sketch after the argument descriptions for an illustration.
dont.rm
(character)
Names of the columns which must not be deleted.
Default is no columns.
na.ignore
(logical(1))
Should NAs be ignored in the percentage calculation?
(Or should they be treated as a single, extra level in the percentage calculation?)
Note that if the feature has only missing values, it is always removed.
Default is FALSE.
tol
(numeric(1))
Numerical tolerance used to treat two numbers as equal.
Variables stored as double are rounded accordingly before computing the mode.
Default is sqrt(.Machine$double.eps).
show.info
(logical(1))
Print verbose output on console?
Default is set via configureMlr.
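
A minimal sketch of how perc, na.ignore, and tol behave on a plain data.frame (assuming mlr is attached; the toy columns a to e are invented for illustration, and the exact rounding behavior of tol is an assumption based on the description above):

library(mlr)

# Toy data: 'a' is strictly constant, 'b' differs from its mode in only
# 1 of 10 rows, 'c' is informative, 'd' is constant apart from one NA.
df = data.frame(
  a = rep(1, 10),
  b = factor(c(rep("x", 9), "y")),
  c = 1:10,
  d = c(rep(2, 9), NA)
)

# Default perc = 0: only the strictly constant 'a' is dropped; with
# na.ignore = FALSE the NA in 'd' counts as an extra level, so 'd' stays.
removeConstantFeatures(df)

# perc = 0.2: 'b' and 'd' are dropped as well, since fewer than 20% of
# their observations differ from the mode value.
removeConstantFeatures(df, perc = 0.2)

# na.ignore = TRUE: the NA in 'd' is ignored, so 'd' counts as constant.
removeConstantFeatures(df, na.ignore = TRUE)

# tol: 'e' varies only by 1e-12, far below the default tolerance
# sqrt(.Machine$double.eps), so it is treated as constant and dropped.
df$e = c(rep(1, 9), 1 + 1e-12)
removeConstantFeatures(df)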
data.frame | Task. Same type as obj.
Other eda_and_preprocess: capLargeValues(), createDummyFeatures(), dropFeatures(), mergeSmallFactorLevels(), normalizeFeatures(), summarizeColumns(), summarizeLevels()
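
Applied to a Task, the function returns a Task of the same type, as noted above. A short sketch using the iris.task example task that ships with mlr (the constant dummy column is added here purely for illustration):

library(mlr)

df = getTaskData(iris.task)
df$const = 1  # hypothetical constant feature added for the example
task = makeClassifTask(data = df, target = "Species")

task = removeConstantFeatures(task)
getTaskFeatureNames(task)  # 'const' has been removed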