Sets NA as the reference level for factor variables and imputes missing values in numeric variables. This is useful for building model matrices for regularized regression and for handling missing values, as in Taddy (2019).
Usage
naref(x, impute=FALSE, pzero=0.5)
Value
A data frame where the factor and character columns have been converted to factors with reference level NA, and if impute=TRUE the missing values in numeric columns have been imputed and a flag for missingness has been added. See details.
Arguments
x
A data frame.
impute
Logical, whether to impute missing values in numeric columns.
pzero
Used only when impute=TRUE. If more than a fraction pzero of the values in a numeric column are zero, missing values are imputed as zero; otherwise they are imputed with the column mean.
Details
For every factor or character column in x, naref sets NA as the reference level of the resulting factor. Columns of class character are first converted to factors via factor(x). If impute=TRUE, each numeric column that contains missing values is replaced by two columns: one with the suffix .x that holds the imputed values, and one with the suffix .miss that is a binary indicator of whether the original value was missing. Numeric columns are returned unchanged if impute=FALSE or if they contain no missing values.
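The behaviour described above can be illustrated with a minimal base R sketch. It does not call naref and is not the package's implementation: the helper impute_numeric and the example column name num are hypothetical, and computing the share of zeros over the observed (non-missing) values is an assumption.

```r
# Factor side: make NA an explicit level and put it first,
# so that it acts as the reference level.
f <- c("a", NA, "b", "a")
f_ref <- factor(f, levels = c(NA, "a", "b"), exclude = NULL)
levels(f_ref)  # NA is the first level

# Numeric side: hypothetical helper mimicking the pzero rule.
impute_numeric <- function(v, pzero = 0.5) {
  miss <- is.na(v)
  obs <- v[!miss]
  # Assumption: the share of zeros is taken over the observed values.
  fill <- if (mean(obs == 0) > pzero) 0 else mean(obs)
  v[miss] <- fill
  data.frame(x = v, miss = as.integer(miss))
}

# Mostly non-zero column: the missing value gets the mean of 1 and 3.
mean_case <- impute_numeric(c(1, NA, 3))

# Mostly zero column: the missing value gets zero.
zero_case <- impute_numeric(c(0, 0, NA, 5))

# Name the two output columns with the .x and .miss suffixes.
names(mean_case) <- paste0("num", c(".x", ".miss"))
```

Keeping the .miss indicator alongside the imputed values lets a downstream regression estimate a separate effect for missingness instead of treating the imputed value as if it had been observed.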
References
Matt Taddy, 2019. "Business Data Science", McGraw-Hill