dataPreparation (version 0.4.3)

unFactor: Unfactor factors with too many values

Description

Unfactors every factor column that has more than a given number of distinct values. This function is useful after reading functions that convert every string column into a factor.
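As an illustration of the kind of reading function meant here, base R's read.csv converts strings to factors when stringsAsFactors = TRUE (the default before R 4.0.0). The sketch below uses a made-up inline CSV, not data from this package, to show the difference.

```r
# Made-up inline CSV used only for illustration
csv_text <- "id,label\n1,a\n2,b\n3,c"

# Strings read as factors (the default behaviour before R 4.0.0)
with_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Strings kept as character
without_factors <- read.csv(text = csv_text, stringsAsFactors = FALSE)

sapply(with_factors, class)
sapply(without_factors, class)
```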

Usage

unFactor(dataSet, cols = "auto", n_unfactor = 53, verbose = TRUE)

Arguments

dataSet

Matrix, data.frame or data.table

cols

List of column name(s) of dataSet to look into. To check all columns, set it to "auto". (characters, default to "auto")

n_unfactor

Maximum number of distinct values a factor may have and still be kept as a factor (numeric, default to 53)

verbose

Should the algorithm talk? (logical, default to TRUE)

Value

Same dataSet (as a data.table) with fewer factor columns.

Details

If a factor has strictly more than n_unfactor values, it is unfactored. It is recommended to use findAndTransformNumerics and findAndTransformDates after this function. If n_unfactor is set to -1, nothing is done. If many columns were transformed, you may want to check the documentation of your data reader to stop it from converting every string into a factor.
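The threshold rule above can be sketched in base R. This is not the package's implementation: unfactor_sketch is a hypothetical helper, and it assumes that "values" means factor levels and that unfactoring means conversion to character.

```r
# Hypothetical helper, not part of dataPreparation: illustrates the threshold rule only
unfactor_sketch <- function(df, n_unfactor = 53) {
  # -1 disables the transformation entirely
  if (n_unfactor == -1) return(df)
  for (col in names(df)) {
    x <- df[[col]]
    # Strict comparison: a factor with exactly n_unfactor levels stays a factor
    # (assumption: "values" are counted as factor levels)
    if (is.factor(x) && nlevels(x) > n_unfactor) {
      df[[col]] <- as.character(x)
    }
  }
  df
}

df <- data.frame(true_factor = factor(rep(c(1, 2), 13)),
                 false_factor = factor(LETTERS))

# false_factor (26 levels) is converted, true_factor (2 levels) is kept
sapply(unfactor_sketch(df, n_unfactor = 5), class)

# With a threshold of exactly 2, true_factor is still kept
sapply(unfactor_sketch(df, n_unfactor = 2), class)
```

Unlike unFactor, this sketch returns a plain data.frame rather than a data.table and ignores the cols and verbose arguments.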

Examples

# NOT RUN {
library(dataPreparation)

# Build a dataSet with one genuine factor (2 values) and one spurious factor (26 values)
dataSet <- data.frame(true_factor = factor(rep(c(1, 2), 13)),
                      false_factor = factor(LETTERS))

# Unfactor every factor that has more than 5 distinct values
dataSet <- unFactor(dataSet, n_unfactor = 5)
sapply(dataSet, class)

# Unfactor every factor that has more than 0 distinct values, i.e. all remaining factors
dataSet <- unFactor(dataSet, n_unfactor = 0)
sapply(dataSet, class)

# }
