modellingTools (version 0.1.0)

simple_bin: Discretize variables in your training and test datasets

Description

Function to apply simple equal-width or equal-height binning to columns of a training dataset, and then optionally bin the columns of a test set into bins with the corresponding cutpoints.

Usage

simple_bin(train, test = NULL, exclude_vars = NULL, include_vars = NULL, bins, type = "height", na_include = TRUE)

Arguments

train
training set
test
test set
exclude_vars
variables to exclude (e.g. the target, or the row ID)
include_vars
if you only want certain variables binned, you may specify them directly instead of excluding all other variables
bins
single number specifying the number of bins to create on each variable, or a named list specifying cut-points for each variable
type
if bins is given as a number, this determines whether to create bins with an equal number of observations ("height") or bins of equal width ("width")
na_include
logical. Give missing values their own bin?
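
The difference between the two values of type can be illustrated with a short base-R sketch. This is not the package's internal code. The vectors train_x and test_x, the helper open_ends, and the choice to reuse cutpoints computed on the training data are all assumptions made for illustration.

```r
# Illustrative base-R sketch; names and approach are assumptions, not package code.
train_x <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)
test_x  <- c(0, 4.5, 50)
n_bins  <- 3

# Equal-width: split the observed range into n_bins intervals of equal length.
width_cuts <- seq(min(train_x), max(train_x), length.out = n_bins + 1)

# Equal-height: cut at quantiles so each bin holds roughly the same number of rows.
height_cuts <- quantile(train_x,
                        probs = seq(0, 1, length.out = n_bins + 1),
                        names = FALSE)

# Hypothetical helper: open the outer bins so that values outside the
# training range (e.g. in a test set) still fall into a bin.
open_ends <- function(cuts) {
  cuts[1] <- -Inf
  cuts[length(cuts)] <- Inf
  cuts
}

train_width  <- cut(train_x, breaks = open_ends(width_cuts))
train_height <- cut(train_x, breaks = open_ends(height_cuts))

# Reuse the training cutpoints on the test vector.
test_height <- cut(test_x, breaks = open_ends(height_cuts))

table(train_width)   # the outlier at 100 leaves the middle bin empty
table(train_height)  # three observations per bin
```

With a skewed variable such as this one, equal-width bins can end up nearly empty, while equal-height bins stay balanced at the cost of uneven interval lengths.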

Value

If test is not NULL, a list containing two tbl_df objects, with the appropriate columns replaced by their binned values and all other columns unchanged. If test is NULL, returns only the training set portion of the list.
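
A call might look like the following sketch, which uses only the arguments listed under Usage. The data frames and column names (id, x, y) are made up for illustration, and the exact structure of the returned list has not been verified here, so the example only inspects it with str().

```r
# Hypothetical example data; the column names are assumptions.
train <- data.frame(id = 1:6,
                    x  = c(1.2, 3.4, NA, 5.6, 7.8, 9.0),
                    y  = c(0, 1, 0, 1, 1, 0))
test  <- data.frame(id = 7:8,
                    x  = c(2.5, 8.1),
                    y  = c(1, 0))

# Only run if the package is installed.
if (requireNamespace("modellingTools", quietly = TRUE)) {
  binned <- modellingTools::simple_bin(train, test,
                                       exclude_vars = c("id", "y"),
                                       bins = 3,
                                       type = "height",
                                       na_include = TRUE)
  # Inspect the result rather than assuming element names.
  str(binned)
}
```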

Details

This function was built as a convenience, to automate the process of binning continuous variables into discrete levels, and also to provide a simple, interpretable, unambiguous method of dealing with missing values in data science problems.
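
The idea of giving missing values their own bin can be sketched in base R with cut() and addNA(). This is an illustration of the concept under assumed data and cutpoints, not a description of how simple_bin implements na_include.

```r
# Illustrative sketch; the vector and the breaks are assumptions.
x <- c(0.2, NA, 1.5, 3.1, NA)

# cut() leaves missing inputs as NA in the resulting factor.
binned <- cut(x, breaks = c(-Inf, 1, 2, Inf))

# addNA() promotes NA to an explicit factor level, so missing values
# are counted and modelled as a category of their own.
binned_na <- addNA(binned)

levels(binned_na)
table(binned_na)
```

Treating NA as a level keeps every row in the data and lets a downstream model estimate a separate effect for missingness, instead of requiring imputation or row deletion.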

See Also

vector_bin, get_vector_cutpoints

Other discretization: binned_data_cutpoints, get_vector_cutpoints, vector_bin