impute
This function performs the imputation on a data set and returns,
alongside the imputed data set, an “ImputationDesc” object
which can contain “learned” coefficients and other helpful data.
This object can then be passed, together with a new data set, to reimpute.
The imputation techniques can be specified for individual features or for feature classes;
see the function arguments. For each, you can provide an arbitrary object (used as a
constant replacement value), use a built-in imputation method listed
under imputations, or create one yourself using makeImputeMethod.
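As a sketch of the last option, the snippet below defines a custom method with makeImputeMethod. The helper name imputeMidrange and the learned argument name mid are hypothetical choices for illustration. The sketch assumes that mlr is installed and that, as the makeImputeMethod documentation describes, learn returns a named list whose elements are passed on to impute as extra arguments.

```r
library(mlr)

# Hypothetical custom method: replace NAs with the midrange of the observed values.
# `learn` runs once on the training data; its named list result is stored in the
# ImputationDesc and handed to `impute` as extra arguments (here: `mid`).
imputeMidrange = function() {
  makeImputeMethod(
    learn = function(data, target, col) {
      x = data[[col]]
      list(mid = (min(x, na.rm = TRUE) + max(x, na.rm = TRUE)) / 2)
    },
    impute = function(data, target, col, mid) {
      x = data[[col]]
      x[is.na(x)] = mid
      x
    }
  )
}

df = data.frame(x = c(2, NA, 10), y = c(1, 2, 3))
imp = impute(df, cols = list(x = imputeMidrange()))
print(imp$data$x)

# The midrange learned on the training data is reused on new data
new.imp = reimpute(data.frame(x = c(NA, 4), y = c(5, 6)), imp$desc)
print(new.imp$x)
```

The midrange is computed only once, at training time. This is what allows reimpute to apply the same replacement value to later data.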
impute(obj, target = character(0L), classes = list(), cols = list(),
  dummy.classes = character(0L), dummy.cols = character(0L),
  dummy.type = "factor", force.dummies = FALSE, impute.new.levels = TRUE,
  recode.factor.levels = TRUE)
obj [data.frame | Task]
Input data.
target [character]
Name of the column(s) specifying the response.
Default is character(0).
classes [named list]
Named list containing imputation techniques for classes of columns,
e.g. list(numeric = imputeMedian()).
cols [named list]
Named list containing imputation methods (or constant values) used to impute missing values
in the data column referenced by each list element's name. Overrules the imputation set via
classes.
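The following minimal sketch shows how classes and cols interact, assuming mlr is installed. The data frame is made up. The class-level default (imputeMean for numeric columns) applies to every numeric column except b, whose entry in cols takes precedence. Both imputeMean and imputeConstant are built-in methods listed under imputations.

```r
library(mlr)

df = data.frame(a = c(1, NA, 3), b = c(10, 20, NA))

imp = impute(df,
  classes = list(numeric = imputeMean()),  # default for all numeric columns
  cols = list(b = imputeConstant(0)))      # column-specific override for b

print(imp$data)
# a: NA replaced by mean(c(1, 3)) = 2
# b: NA replaced by the constant 0, not by the mean of b
```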
dummy.classes [character]
Classes of columns to create dummy columns for.
Default is character(0).
dummy.cols [character]
Column names to create dummy columns (containing a binary missing-value indicator) for.
Default is character(0).
dummy.type [character(1)]
How dummy columns are encoded: either as 0/1 with type “numeric”
or as “factor”.
Default is “factor”.
force.dummies [logical(1)]
Force dummy creation even if the respective data column does not
contain any NAs. Note that (a) most learners will complain about
constant columns created this way, but (b) your feature set might
be stochastic if you turn this off, because whether a dummy column is created
then depends on whether the particular training data happen to contain NAs.
Default is FALSE.
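A minimal sketch of the dummy arguments follows, assuming mlr is installed and using made-up data. In current mlr releases the indicator column is named after the source column with a .dummy suffix. Treat that exact name as an assumption.

```r
library(mlr)

df = data.frame(x = c(1, NA, 3), y = c(4, 5, 6))

# Request 0/1 numeric indicator columns for x and y.
# y has no NAs, so with force.dummies = FALSE no dummy is created for it.
imp = impute(df, cols = list(x = imputeMedian()),
  dummy.cols = c("x", "y"), dummy.type = "numeric")
print(names(imp$data))

# With force.dummies = TRUE, y also gets a (constant, all-zero) dummy column.
imp.forced = impute(df, cols = list(x = imputeMedian()),
  dummy.cols = c("x", "y"), dummy.type = "numeric", force.dummies = TRUE)
print(names(imp.forced$data))
```

The second call shows the trade-off described for force.dummies. The column set is now fixed regardless of the data, at the cost of a constant column.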
impute.new.levels [logical(1)]
If new, previously unencountered factor levels occur during reimputation,
should they be handled as NAs and then be imputed the same way?
Default is TRUE.
recode.factor.levels [logical(1)]
Recode factor levels after reimputation so that they match the respective element of
lvls (in the description object) and therefore the levels of the
feature factor in the training data after imputation?
Default is TRUE.
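The two reimputation flags are illustrated by the minimal sketch below, which assumes mlr is installed and uses made-up data with both flags left at their defaults. The level "c" does not occur in the training data, so it is treated as NA and imputed with the learned mode "a". The resulting factor is then recoded to the training levels.

```r
library(mlr)

train = data.frame(f = factor(c("a", "a", "b", NA)))
imp = impute(train, cols = list(f = imputeMode()))

# "c" is a new level: with impute.new.levels = TRUE it is imputed like an NA;
# recode.factor.levels = TRUE resets the levels to those seen in training.
newdata = data.frame(f = factor(c("c", NA, "b")))
re = reimpute(newdata, imp$desc)
print(re$f)
print(levels(re$f))
```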
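impute returns a [list] with the elements
data [data.frame]
The imputed data set.
desc [ImputationDesc]
The description object, which can be passed to reimpute.
Its entries comprise two character vectors (the second relating to the columns of data,
excluding target), three named lists, and two logical(1) flags.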
See also: imputations, makeImputeMethod, makeImputeWrapper, reimpute.
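library(mlr)
df = data.frame(x = c(1, 1, NA), y = factor(c("a", "a", "b")), z = 1:3)
# Impute x with the constant 99 and y with its mode; there is no target column
imputed = impute(df, target = character(0), cols = list(x = 99, y = imputeMode()))
print(imputed$data)
# Apply the learned imputation to new data via the description object
reimpute(data.frame(x = NA), imputed$desc)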