
dlookr (version 0.5.0)

binning_by: Optimal Binning for Scoring Modeling

Description

binning_by() finds intervals for a numerical variable using optimal binning. Optimal binning categorizes a numeric characteristic into bins for later use in scoring modeling.

Usage

binning_by(.data, y, x, p = 0.05, ordered = TRUE, labels = NULL)

Arguments

.data

a data frame.

y

character. name of the binary response variable (0, 1). The variable must contain only the integers 0 and 1. A factor with two levels is also accepted; it is converted to 0/1 during the calculation.
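The conversion of a two-level factor response can be pictured with a minimal base R sketch. The helper to_binary() is hypothetical and not part of dlookr, and it assumes the second factor level is coded as 1; the mapping dlookr actually uses may differ.

```r
# Hypothetical helper (not part of dlookr): coerce a binary response to 0/1.
# Assumption: for a two-level factor, the second level becomes 1.
to_binary <- function(y) {
  if (is.factor(y)) {
    if (nlevels(y) != 2) stop("a factor response must have exactly two levels")
    # as.integer() returns level codes 1 and 2; subtracting 1 gives 0 and 1
    return(as.integer(y) - 1L)
  }
  vals <- unique(y[!is.na(y)])
  if (!all(vals %in% c(0, 1))) stop("the response must contain only 0 and 1")
  as.integer(y)
}

to_binary(factor(c("no", "yes", "no"), levels = c("no", "yes")))
# [1] 0 1 0
```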

x

character. name of the continuous characteristic variable. It must have at least 5 distinct values, and Inf is not allowed.
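The two requirements on x can be checked up front. This is a sketch with a hypothetical helper, check_x(), that is not part of dlookr; counting distinct values after dropping missing values is an assumption.

```r
# Hypothetical helper (not part of dlookr): check the x requirements stated above.
check_x <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  # is.infinite() is FALSE for NA, so missing values do not trigger this check
  if (any(is.infinite(x))) stop("Inf values are not allowed")
  # assumption: distinct values are counted after dropping NA
  if (length(unique(x[!is.na(x)])) < 5) stop("x needs at least 5 distinct values")
  invisible(TRUE)
}

check_x(c(1.2, 3.4, 5.6, 7.8, 9.0, NA))  # passes silently
try(check_x(c(1, 2, 3, Inf, 5, 6)))      # error: Inf values are not allowed
```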

p

numeric. percentage of records per bin. Default 5% (0.05). Only values greater than 0.00 (0%) and less than 0.50 (50%) are accepted.
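To make p concrete: assuming p acts as the minimum share of records in each bin, the smallest allowed bin size follows from simple arithmetic. The helper min_bin_size() is hypothetical, and rounding the count up is an assumption; smbinning's internal rule may differ.

```r
# Hypothetical helper (not part of dlookr): smallest bin size implied by p.
# Assumption: p is a minimum share of records per bin, rounded up to a whole count.
min_bin_size <- function(n, p = 0.05) {
  if (!(p > 0 && p < 0.5)) stop("p must be greater than 0 and less than 0.5")
  # round() first so floating-point noise (0.07 * 100 is 7.000000000000001)
  # does not push ceiling() up by one
  ceiling(round(n * p, 8))
}

min_bin_size(300)            # 15 records per bin at the default p = 0.05
min_bin_size(299)            # 14.95 rounded up to 15
min_bin_size(100, p = 0.07)  # 7
```

Because p must stay below 0.5, at least two bins can always satisfy the minimum.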

ordered

logical. whether to build an ordered factor or not.

labels

character. the label names to use for each of the bins.
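Once break points are known, turning x into a factor is the same kind of operation base R's cut() performs, and the ordered and labels arguments play roughly the roles of cut()'s ordered_result and labels. The break points below are made up for illustration; this sketch shows the two arguments and is not how binning_by() computes its breaks.

```r
x <- c(0.5, 0.9, 1.1, 1.4, 2.3, 3.0, 9.4)
# Illustrative break points (made up here, not produced by binning_by())
breaks <- c(-Inf, 1.0, 1.5, Inf)

# Intervals are (-Inf, 1], (1, 1.5], (1.5, Inf] because cut() uses right = TRUE.
# ordered_result = TRUE builds an ordered factor; labels names each bin.
binned <- cut(x, breaks = breaks, labels = c("low", "mid", "high"),
              ordered_result = TRUE)
binned
# bins: low, low, mid, mid, high, high, high (levels ordered low < mid < high)
```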

Value

An object of class "optimal_bins", with the following attributes.

  • class : "optimal_bins".

  • type : character. binning type, "optimal".

  • breaks : numeric. the break points used to cut x into intervals.

  • levels : character. levels of the binned factor (ordered or unordered, depending on ordered).

  • raw : numeric. the raw values of x before binning.

  • ivtable : data.frame. information value table.

  • iv : numeric. information value.

  • target : integer. binary response variable.
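The ivtable and iv attributes rest on weight of evidence (WoE) and information value (IV). The base R sketch below uses one common definition: for bin i, WoE_i = ln(event share_i / non-event share_i) and IV = sum over bins of (event share_i minus non-event share_i) × WoE_i, with events coded y == 1. The helper iv_table(), its column names, and the sign convention are assumptions and need not match the ivtable dlookr returns; the IV total is the same under either sign convention.

```r
# Hypothetical helper (not part of dlookr): WoE/IV table for a binned factor.
# Convention (an assumption): events are y == 1, non-events are y == 0,
# and WoE = ln(event share / non-event share).
iv_table <- function(bins, y) {
  keep <- !is.na(bins) & !is.na(y)
  tab <- table(bins[keep], factor(y[keep], levels = c(0, 1)))
  events    <- as.numeric(tab[, "1"])
  nonevents <- as.numeric(tab[, "0"])
  ev_share  <- events / sum(events)
  ne_share  <- nonevents / sum(nonevents)
  woe <- log(ev_share / ne_share)  # +/-Inf if a bin has no events or no non-events
  data.frame(
    bin       = rownames(tab),
    events    = events,
    nonevents = nonevents,
    woe       = woe,
    iv        = (ev_share - ne_share) * woe
  )
}

bins <- factor(c("a", "a", "a", "a", "b", "b", "b", "b"))
y    <- c(1, 0, 0, 0, 1, 1, 1, 0)
tbl  <- iv_table(bins, y)
tbl
sum(tbl$iv)  # total information value, here log(3)
```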


See vignette("transformation") for an introduction to these concepts.

Details

This function is useful together with the mutate()/transmute() functions of the dplyr package. It is implemented using the smbinning() function of the smbinning package.

See Also

binning, plot.optimal_bins.

Examples

library(dlookr)
library(dplyr)

# Generate data for the example: copy heartfailure and set 5 random creatinine values to NA
set.seed(123)
heartfailure2 <- heartfailure
heartfailure2[sample(seq(NROW(heartfailure2)), 5), "creatinine"] <- NA

# optimal binning, passing variable names as character strings
bin <- binning_by(heartfailure2, "death_event", "creatinine")

# optimal binning, passing bare variable names
bin <- binning_by(heartfailure2, death_event, creatinine)
bin

# performance table
attr(bin, "performance")

# summarize the optimal_bins object
summary(bin)

# visualize all information for the optimal_bins object
# plot(bin)

# visualize WoE information for the optimal_bins object
# plot(bin, type = "WoE")

# visualize all information with typographic = FALSE
# plot(bin, typographic = FALSE)

# extract the binned results
# extract(bin) %>%
#   head(20)
