Prepare data for data analysis
preprocess(x, y = NULL, completeCases = FALSE,
removeCases.thres = NULL, removeFeatures.thres = NULL,
impute = FALSE, impute.type = c("missForest", "rfImpute",
"meanMode"), impute.niter = 10, impute.ntree = 500,
missForest.parallelize = c("no", "variables", "forests"),
impute.discrete = getMode, impute.numeric = mean,
integer2factor = FALSE, integer2numeric = FALSE,
logical2factor = FALSE, logical2numeric = FALSE,
numeric2factor = FALSE, numeric2factor.levels = NULL,
character2factor = FALSE, nonzeroFactors = FALSE, scale = FALSE,
center = FALSE, removeConstant = TRUE, oneHot = FALSE,
exclude = NULL, verbose = TRUE)
Input
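completeCases
Logical: If TRUE, only retain complete cases (cases with no missing data). Default = FALSE
removeCases.thres
Float: Remove cases (rows) in which the fraction of missing features is greater than or equal to this value. Default = NULL
removeFeatures.thres
Float: Remove features (columns) whose values are missing in a fraction of cases greater than or equal to this value. Default = NULL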
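As a minimal illustration of these three filters, the base-R sketch below assumes that "missing" means `NA` and that both thresholds are fractions in [0, 1]. `drop_missing_sketch` is a hypothetical helper written for this example. It is not the package's implementation.

```r
# Hypothetical helper illustrating completeCases, removeCases.thres and
# removeFeatures.thres on a data.frame; not the package implementation.
drop_missing_sketch <- function(x, completeCases = FALSE,
                                removeCases.thres = NULL,
                                removeFeatures.thres = NULL) {
  if (completeCases) {
    # keep only rows with no NA in any column
    x <- x[complete.cases(x), , drop = FALSE]
  }
  if (!is.null(removeCases.thres)) {
    # fraction of missing features in each case (row)
    na_row <- rowMeans(is.na(x))
    x <- x[na_row < removeCases.thres, , drop = FALSE]
  }
  if (!is.null(removeFeatures.thres)) {
    # fraction of missing cases in each feature (column)
    na_col <- colMeans(is.na(x))
    x <- x[, na_col < removeFeatures.thres, drop = FALSE]
  }
  x
}

df <- data.frame(a = c(1, NA, 3, 4),
                 b = c(NA, NA, 3, 4),
                 c = c(1, 2, 3, 4))
drop_missing_sketch(df, removeCases.thres = 0.5)    # drops row 2 (2 of 3 features missing)
drop_missing_sketch(df, removeFeatures.thres = 0.5) # drops column b (2 of 4 cases missing)
```

Because the comparison is `>=`, a case or feature whose missing fraction exactly equals the threshold is removed, as column `b` shows.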
impute
Logical: If TRUE, impute missing values using the method selected by impute.type (for "meanMode", see impute.discrete and impute.numeric). Default = FALSE
impute.type
String: How to impute data. "missForest" uses the package of the same name to impute by iterative random forest regression; "rfImpute" uses randomForest::rfImpute (see its documentation); "meanMode" uses the mean and mode by default, or any custom functions defined in impute.discrete and impute.numeric.
impute.discrete
Function that returns a single value: How to impute discrete variables for impute.type = "meanMode". Default = getMode
impute.numeric
Function that returns a single value: How to impute continuous variables for impute.type = "meanMode". Default = mean
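The sketch below shows how "meanMode"-style imputation with pluggable single-value functions can work. It assumes that numeric columns go to the numeric function and every other column type goes to the discrete function. `mode_sketch` is a hypothetical stand-in for getMode, because getMode's exact tie-breaking rule is not documented here. `impute_meanmode_sketch` is also hypothetical.

```r
# Hypothetical stand-in for getMode: most frequent non-missing value,
# ties broken by first appearance.
mode_sketch <- function(v) {
  v <- v[!is.na(v)]
  u <- unique(v)
  u[which.max(tabulate(match(v, u)))]
}

# Fill each column's NAs with one value: impute.numeric for numeric
# columns, impute.discrete for everything else (factors, characters, ...).
impute_meanmode_sketch <- function(x, impute.discrete = mode_sketch,
                                   impute.numeric = mean) {
  for (j in seq_along(x)) {
    miss <- is.na(x[[j]])
    if (!any(miss)) next
    if (is.numeric(x[[j]])) {
      x[[j]][miss] <- impute.numeric(x[[j]][!miss])
    } else {
      x[[j]][miss] <- impute.discrete(x[[j]][!miss])
    }
  }
  x
}

df <- data.frame(num = c(1, NA, 3, 8),
                 grp = factor(c("a", "b", NA, "b")))
impute_meanmode_sketch(df)                           # num NA -> 4 (mean of 1, 3, 8); grp NA -> "b"
impute_meanmode_sketch(df, impute.numeric = median)  # num NA -> 3
```

Any function that reduces a vector to one value (for example `median`) can be passed in the same way the argument descriptions allow.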
integer2factor
Logical: If TRUE, convert all integers to factors. Default = FALSE
integer2numeric
Logical: If TRUE, convert all integers to numeric. Only applies if integer2factor = FALSE. Default = FALSE
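The precedence rule between the two integer conversions can be sketched in base R as follows. `convert_integers_sketch` is a hypothetical helper. It only mirrors the documented rule that integer2numeric is ignored when integer2factor = TRUE.

```r
# Hypothetical sketch of integer2factor / integer2numeric on a data.frame.
# integer2factor wins when both are TRUE, matching the documented rule.
convert_integers_sketch <- function(x, integer2factor = FALSE,
                                    integer2numeric = FALSE) {
  int_cols <- vapply(x, is.integer, logical(1))
  if (integer2factor) {
    x[int_cols] <- lapply(x[int_cols], factor)
  } else if (integer2numeric) {
    x[int_cols] <- lapply(x[int_cols], as.numeric)
  }
  x
}

df <- data.frame(id = 1:3, score = c(0.5, 1.5, 2.5))
str(convert_integers_sketch(df, integer2factor = TRUE))   # id becomes a factor
str(convert_integers_sketch(df, integer2numeric = TRUE))  # id becomes double
```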
nonzeroFactors
Logical: If TRUE, shift factor values to exclude zeros. Default = FALSE
scale
Logical: If TRUE, scale columns of x. Default = FALSE
center
Logical: If TRUE, center columns of x. Default = FALSE
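The two flags map naturally onto base R's `scale(x, center, scale)`. The sketch below uses that function on numeric columns only. This mapping is an assumption, and the package may implement scaling differently. `scale_center_sketch` is hypothetical.

```r
# Hypothetical sketch of scale/center via base R's scale(), applied to
# numeric columns only; non-numeric columns are left untouched.
scale_center_sketch <- function(x, scale = FALSE, center = FALSE) {
  num_cols <- vapply(x, is.numeric, logical(1))
  if (any(num_cols) && (scale || center)) {
    # scale() returns a matrix; convert back to data.frame columns
    x[num_cols] <- as.data.frame(scale(as.matrix(x[num_cols]),
                                       center = center, scale = scale))
  }
  x
}

df <- data.frame(v = c(2, 4, 6), g = c("a", "b", "c"))
scale_center_sketch(df, center = TRUE)               # v: -2, 0, 2
scale_center_sketch(df, scale = TRUE, center = TRUE) # v: -1, 0, 1
```

Base `scale()` has one subtlety worth knowing. With `center = FALSE, scale = TRUE` it divides by the root mean square, sqrt(sum(x^2) / (n - 1)), not by the standard deviation.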
removeConstant
Logical: If TRUE, remove all columns with zero variance. Default = TRUE
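In the sketch below, a zero-variance column is taken to be one with a single distinct non-missing value, which also covers non-numeric columns. `remove_constant_sketch` is hypothetical. How the package treats all-NA columns is not documented here, so this sketch simply drops them as well.

```r
# Hypothetical sketch of removeConstant: keep only columns with more than
# one distinct non-missing value (all-NA columns are dropped too).
remove_constant_sketch <- function(x) {
  n_distinct <- vapply(x, function(col) length(unique(col[!is.na(col)])),
                       integer(1))
  x[, n_distinct > 1, drop = FALSE]
}

df <- data.frame(a = c(1, 2, 3), k = c(5, 5, 5), f = c("x", "x", "x"))
names(remove_constant_sketch(df))  # "a"
```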
verbose
Logical: If TRUE, write messages to console. Default = TRUE
By default, preprocess only removes constant features; every other operation must be enabled explicitly through its argument.
Order of operations:
* completeCases
* removeCases.thres
* removeFeatures.thres
* integer2factor
* nonzeroFactors
* impute
* scale/center
* removeConstant
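The sequence can be modeled as an ordered list of step functions folded over the data with `Reduce()`. The sketch below is a simplified, hypothetical model of that idea. It fills in only three of the steps, uses numeric-only mean imputation, and omits integer2factor and nonzeroFactors. The `steps` list is not part of the package.

```r
# Hypothetical model of the documented order: step functions applied left
# to right with Reduce(). Only three steps are implemented here.
steps <- list(
  removeFeatures.thres = function(x, thres = 0.5) {
    x[, colMeans(is.na(x)) < thres, drop = FALSE]
  },
  impute = function(x) {
    # simplified: mean imputation of numeric columns only
    for (j in seq_along(x)) {
      if (is.numeric(x[[j]])) {
        x[[j]][is.na(x[[j]])] <- mean(x[[j]], na.rm = TRUE)
      }
    }
    x
  },
  removeConstant = function(x) {
    # in this sketch NA counts as a distinct value
    x[, vapply(x, function(col) length(unique(col)) > 1, logical(1)),
      drop = FALSE]
  }
)

df <- data.frame(mostly_na = c(NA, NA, NA, 1),
                 a = c(1, NA, 3, 5),
                 k = c(2, 2, NA, 2))
out <- Reduce(function(x, f) f(x), steps, df)
out  # only column a remains: 1, 3, 3, 5
```

The example shows why the order matters. `mostly_na` is dropped before imputation rather than filled in. `k` only becomes constant once its `NA` is imputed, so a constant check placed before imputation would have kept it in this sketch.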