rtemis (version 0.79)

preprocess: Data preprocessing

Description

Prepare data for analysis

Usage

preprocess(x, y = NULL, completeCases = FALSE,
  removeCases.thres = NULL, removeFeatures.thres = NULL,
  impute = FALSE, impute.type = c("missForest", "rfImpute",
  "meanMode"), impute.niter = 10, impute.ntree = 500,
  missForest.parallelize = c("no", "variables", "forests"),
  impute.discrete = getMode, impute.numeric = mean,
  integer2factor = FALSE, integer2numeric = FALSE,
  logical2factor = FALSE, logical2numeric = FALSE,
  numeric2factor = FALSE, numeric2factor.levels = NULL,
  character2factor = FALSE, nonzeroFactors = FALSE, scale = FALSE,
  center = FALSE, removeConstant = TRUE, oneHot = FALSE,
  exclude = NULL, verbose = TRUE)

Arguments

x

Input

completeCases

Logical: If TRUE, only retain complete cases (no missing data). Default = FALSE

removeCases.thres

Float: Remove cases in which the fraction of missing features is >= this value. Default = NULL

removeFeatures.thres

Float: Remove features that have missing values in a fraction of cases >= this value. Default = NULL
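
The two thresholds above can be illustrated with a minimal base R sketch. It does not call rtemis; the toy data and the names `cases_thres` and `features_thres` are hypothetical, and recomputing the per-feature fractions after cases have been dropped is an assumption of the sketch rather than documented behavior.

```r
# Toy data frame with missing values (illustrative only)
dat <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, 3, 4),
  c = c(1, NA, 3, NA),
  d = c(1, 2, 3, 4)
)

# Hypothetical thresholds, playing the role of removeCases.thres
# and removeFeatures.thres
cases_thres <- 0.5
features_thres <- 0.5

# Fraction of missing features in each case (row)
case_na_frac <- rowMeans(is.na(dat))

# Drop cases whose missing fraction is >= the threshold
kept_cases <- dat[case_na_frac < cases_thres, , drop = FALSE]

# Fraction of cases with a missing value in each feature (column),
# recomputed on the remaining cases (an assumption of this sketch)
feature_na_frac <- colMeans(is.na(kept_cases))

# Drop features whose missing fraction is >= the threshold
kept <- kept_cases[, feature_na_frac < features_thres, drop = FALSE]

dim(kept)
```

Cases are filtered before features here because that matches the order of operations listed under Details.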

impute

Logical: If TRUE, impute missing values. See impute.type, impute.discrete and impute.numeric for how

impute.type

String: How to impute data. "missForest" uses the package of the same name to impute by iterative random forest regression. "rfImpute" uses randomForest::rfImpute (see its documentation). "meanMode" uses the mean and the mode by default, or any custom functions supplied in impute.numeric and impute.discrete

impute.discrete

Function that returns a single value: How to impute discrete variables for impute.type = "meanMode". Default = getMode

impute.numeric

Function that returns a single value: How to impute continuous variables for impute.type = "meanMode". Default = mean
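
As a rough illustration of "meanMode" imputation, the base R sketch below fills numeric columns with a mean and all other columns with a mode. It does not use rtemis: `get_mode` is a hypothetical stand-in for the package's getMode, `impute_column` is an illustrative helper, and passing `na.rm = TRUE` to `mean` is an assumption of the sketch, not something this page states.

```r
# Hypothetical stand-in for getMode: most frequent non-missing value
get_mode <- function(v) {
  v <- v[!is.na(v)]
  uniq <- unique(v)
  uniq[which.max(tabulate(match(v, uniq)))]
}

# Impute a single column: numeric columns use numeric_fn,
# all other columns use discrete_fn (illustrative helper)
impute_column <- function(v,
                          discrete_fn = get_mode,
                          numeric_fn = function(z) mean(z, na.rm = TRUE)) {
  if (!anyNA(v)) return(v)
  fill <- if (is.numeric(v)) numeric_fn(v) else discrete_fn(v)
  v[is.na(v)] <- fill
  v
}

dat <- data.frame(
  age   = c(20, NA, 40),
  group = factor(c("a", "a", NA))
)

# Apply column by column, keeping the data frame structure
dat[] <- lapply(dat, impute_column)
dat
```

Any function that returns a single value could be swapped in for either default, for example `function(z) median(z, na.rm = TRUE)` for numeric columns.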

integer2factor

Logical: If TRUE, convert all integers to factors

integer2numeric

Logical: If TRUE, convert all integers to numeric (will only work if integer2factor = FALSE)

nonzeroFactors

Logical: Shift factor values to exclude zeros. Default = FALSE

scale

Logical: If TRUE, scale columns of x

center

Logical: If TRUE, center columns of x

removeConstant

Logical: If TRUE, remove all columns with zero variance. Default = TRUE
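
The effect of scale, center and removeConstant can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy data are made up, and the sketch sets the constant column aside before scaling only because base `scale()` would otherwise divide by a standard deviation of zero.

```r
dat <- data.frame(
  x1 = c(1, 2, 3, 4),
  x2 = c(10, 20, 30, 40),
  k  = c(5, 5, 5, 5)  # constant column (zero variance)
)

# Flag columns with at most one distinct non-missing value
is_constant <- vapply(
  dat,
  function(v) length(unique(v[!is.na(v)])) <= 1,
  logical(1)
)

# Center (subtract column means) and scale (divide by column SDs)
# the non-constant columns
scaled <- scale(dat[, !is_constant, drop = FALSE],
                center = TRUE, scale = TRUE)

names(is_constant)[is_constant]  # names of the constant columns
colMeans(scaled)                 # approximately 0 after centering
apply(scaled, 2, sd)             # 1 after scaling
```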

verbose

Logical: If TRUE, write messages to console. Default = TRUE

Details

By default, only constant features are removed; every other step must be enabled explicitly.

Order of operations:

* completeCases
* removeCases.thres
* removeFeatures.thres
* integer2factor
* nonzeroFactors
* impute
* scale/center
* removeConstant
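
A possible call that combines several of these steps is sketched below. It uses only arguments that appear in the Usage section and assumes a version of rtemis matching that signature is installed. The toy data are hypothetical, and because this page does not describe the return value, the sketch only stores the result and inspects it with `str()`.

```r
# Hypothetical toy data with missing values in each column
dat <- data.frame(
  age   = c(20, NA, 40, 35, 35),
  score = c(1.5, 2.5, NA, 4.0, 3.0),
  group = factor(c("a", "b", "a", NA, "a"))
)

# Run only if rtemis is available; otherwise the sketch does nothing
if (requireNamespace("rtemis", quietly = TRUE)) {
  res <- rtemis::preprocess(
    dat,
    impute = TRUE,             # impute missing values ...
    impute.type = "meanMode",  # ... with the mean / mode defaults
    scale = TRUE,
    center = TRUE,
    verbose = FALSE
  )
  str(res)
}
```

Per the list above, imputation would run before scaling and centering, regardless of the order in which the arguments are written in the call.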