build_encoding: Compute encoding

Description

Build a list of one hot encodings, one for each column in cols.

Usage

build_encoding(dataSet, cols = "auto", verbose = TRUE, min_frequency = 0, ...)

Arguments

dataSet

Matrix, data.frame or data.table

cols

Name(s) of the column(s) of dataSet to transform. To transform all character columns, set it to "auto". (character, default to "auto")

verbose

Should the function print messages about what it is doing? (Logical, default to TRUE)

min_frequency

The minimal share of lines that a category should represent (numeric, between 0 and 1, default to 0)

...

Other arguments, such as name_separator, which separates words in new column names (character, default to ".")

Value

A list in which each element is named after a column of the data set. Each element contains new_cols and values, which describe the new columns that will be built during encoding.
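As a rough illustration of that structure, the base R sketch below builds a similar list by hand for a toy data set. It does not call build_encoding: the toy data frame, the helper name sketch_encoding and the way new column names are formed (column name, separator, value) are all assumptions made for this example, so the real output may differ in its details.

```r
# Toy data, invented for this illustration.
toy <- data.frame(
  color = c("red", "blue", "red", "green"),
  size  = c("S", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): for each column, record
# the distinct values and the names the one hot columns would receive.
sketch_encoding <- function(data, cols, name_separator = ".") {
  result <- lapply(cols, function(col) {
    values <- sort(unique(data[[col]]))
    list(
      new_cols = paste(col, values, sep = name_separator),
      values   = values
    )
  })
  names(result) <- cols
  result
}

encoding_sketch <- sketch_encoding(toy, cols = c("color", "size"))
str(encoding_sketch)
```

Each element of encoding_sketch is named after a source column and holds one new column name per distinct value.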

Details

To avoid creating very large sparse matrices, use the min_frequency parameter so that only the most representative values are used to create new columns (and not outliers or mistakes in the data). Setting min_frequency to something greater than 0 may make the function slower (especially for a large dataSet).
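The effect of a frequency threshold can be sketched in base R as follows. This is only an illustration of the idea, not the package's implementation: the helper name frequent_values, the toy vector and the use of "at least min_frequency" as the cut-off rule are assumptions.

```r
# Toy column, invented for this illustration: "c" and "d" are rare values.
x <- c(rep("a", 12), rep("b", 6), "c", "d")

# Hypothetical helper (not part of the package): keep only the values whose
# share of rows is at least min_frequency.
frequent_values <- function(x, min_frequency = 0) {
  shares <- table(x) / length(x)
  names(shares)[as.vector(shares >= min_frequency)]
}

kept_all  <- frequent_values(x)                       # no filtering
kept_some <- frequent_values(x, min_frequency = 0.1)  # drops "c" and "d"
print(kept_all)
print(kept_some)
```

With a threshold of 0.1, the two values that each cover 5% of the rows would get no column of their own, which keeps the encoded table narrower.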

Examples

# NOT RUN {
# Get a data set
data(adult)
encoding <- build_encoding(adult, cols = "auto", verbose = TRUE)

print(encoding)

# To limit the number of generated columns, use the min_frequency parameter:
build_encoding(adult, cols = "auto", verbose = TRUE, min_frequency = 0.1)
# With min_frequency = 0.1, columns are created only for values that appear in at least 10% of the rows.
# }
