
SmartEDA (version 0.3.10)

ExpCatStat: Function provides summary statistics for all character or categorical columns in the dataframe

Description

This function combines results from weight of evidence, information value and summary statistics.

Usage

ExpCatStat(
  data,
  Target = NULL,
  result = "Stat",
  clim = 10,
  nlim = 10,
  bins = 10,
  Pclass = NULL,
  plot = FALSE,
  top = 20,
  Round = 2
)

Value

This function provides summary statistics for categorical variables:

  • Stat - Summary statistics, including chi-square test scores, p-value, information values, Cramér's V and degree of association

  • IV - Weight of evidence and Information values
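As a rough illustration of the association measures listed under Stat, the sketch below computes a chi-square statistic, its p-value and Cramér's V in base R for one predictor (vs) against the target am. It is not SmartEDA's internal code: switching off the continuity correction and all object names are assumptions made for the example.

```r
# Contingency table of one predictor against the binary target
tab <- table(mtcars$vs, mtcars$am)

# Pearson chi-square test (continuity correction disabled: an assumption)
chi <- chisq.test(tab, correct = FALSE)
chi_sq  <- unname(chi$statistic)
p_value <- chi$p.value

# Cramer's V = sqrt(chi^2 / (n * (min(rows, cols) - 1)))
n <- sum(tab)
cramers_v <- sqrt(chi_sq / (n * (min(dim(tab)) - 1)))

round(c(chi_sq = chi_sq, p_value = p_value, cramers_v = cramers_v), 2)
```

Values reported by ExpCatStat may differ from this sketch if the package applies a different correction or rounding.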

Columns description:

  • Variable - variable name

  • Target - Target variable

  • class - name of bin (variable value otherwise)

  • out0 - number of good observations

  • out1 - number of bad observations

  • Total - Total values for each category

  • pct1 - good observations / total good observations

  • pct0 - bad observations / total bad observations

  • odds - Odds ratio [(a/b)/(c/d)]

  • woe - Weight of Evidence, calculated as ln(odds)

  • iv - Information Value, calculated as ln(odds) * (pct0 - pct1)
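The column definitions above can be traced by hand for a single predictor. The sketch below uses base R only and follows the formulas as written here, assuming that pct0 and pct1 are each class's share of the target-0 and target-1 observations and that odds = pct0 / pct1. The exact definitions inside SmartEDA may differ, and the object names are illustrative.

```r
# Counts of each class of the predictor (vs) by target value (am)
tab  <- table(class = mtcars$vs, target = mtcars$am)
out0 <- as.vector(tab[, "0"])   # observations with target == 0
out1 <- as.vector(tab[, "1"])   # observations with target == 1

# Share of each target group that falls in each class
pct0 <- out0 / sum(out0)
pct1 <- out1 / sum(out1)

# Assumed odds definition: ratio of the two shares
odds <- pct0 / pct1
woe  <- log(odds)               # weight of evidence
iv   <- woe * (pct0 - pct1)     # per-class information value

woe_table <- data.frame(class = rownames(tab), out0, out1,
                        Total = out0 + out1, pct1, pct0, odds, woe, iv)
total_iv <- sum(iv)             # information value of the variable
woe_table
```

A class with a zero count in either target group would give an infinite or undefined WoE; vs was picked here because every cell of its table is non-empty.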

Arguments

data

dataframe or matrix

Target

target variable

result

"Stat" - summary statistics, "IV" - information value

clim

maximum number of unique levels for a categorical variable. Factor or character variables with more unique levels than clim are dropped

nlim

maximum unique values for numeric variable.

bins

number of bins (default is 10)

Pclass

reference category of target variable

plot

Information value barplot (default FALSE)

top

number of top information values to plot (default is 20)

Round

number of decimal places used to round values
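Before choosing clim and nlim it can help to see how many distinct values each column has. The sketch below only counts unique values per column and compares them with a limit. How ExpCatStat applies the limits internally is not shown here, and the names limit, n_unique and within_limit are hypothetical.

```r
# Hypothetical threshold, set to the default shown for clim and nlim
limit <- 10

# Number of distinct values in every column of the data
n_unique <- vapply(mtcars, function(x) length(unique(x)), integer(1))

# Flag the columns whose distinct-value count does not exceed the limit
data.frame(variable     = names(n_unique),
           n_unique     = n_unique,
           within_limit = n_unique <= limit,
           row.names    = NULL)
```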

Author

dubrangala

Details

Criteria used for categorical variable predictive power classification are

  • If information value is < 0.03 then predictive power = "Not Predictive"

  • If information value is 0.03 to 0.1 then predictive power = "Somewhat Predictive"

  • If information value is 0.1 to 0.3 then predictive power = "Medium Predictive"

  • If information value is >0.3 then predictive power = "Highly Predictive"
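The thresholds above can be expressed as a small lookup. This is a minimal sketch using base R's cut(), not the package's own code: the helper name iv_power is hypothetical, and assigning boundary values (exactly 0.03, 0.1 or 0.3) to the higher band is an assumption, since the criteria do not say which side they fall on.

```r
# Map information values to the predictive power labels listed above.
# Intervals are closed on the left: [0.03, 0.1), [0.1, 0.3), [0.3, Inf)
iv_power <- function(iv) {
  cut(iv,
      breaks = c(-Inf, 0.03, 0.1, 0.3, Inf),
      labels = c("Not Predictive", "Somewhat Predictive",
                 "Medium Predictive", "Highly Predictive"),
      right  = FALSE)
}

iv_power(c(0.01, 0.05, 0.2, 0.45))
```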

Examples

# Example 1
library(SmartEDA)
## mtcars is a built-in dataset, so nothing needs to be read from disk
# Target variable "am" - Transmission (0 = automatic, 1 = manual)
# Summary statistics
ExpCatStat(mtcars, Target = "am", result = "Stat", clim = 10, nlim = 10, bins = 10,
           Pclass = 1, plot = FALSE, top = 20, Round = 2)
# Information value plot
ExpCatStat(mtcars, Target = "am", result = "Stat", clim = 10, nlim = 10, bins = 10,
           Pclass = 1, plot = TRUE, top = 20, Round = 2)
# Information value for categorical independent variables
ExpCatStat(mtcars, Target = "am", result = "IV", clim = 10, nlim = 10, bins = 10,
           Pclass = 1, plot = FALSE, top = 20, Round = 2)
