
creditmodel (version 1.0)

psi_iv_filter: Variable reduction based on Information Value & Population Stability Index filter

Description

psi_iv_filter is for selecting important and stable features using IV & PSI.

Usage

psi_iv_filter(dat, dat_test = NULL, target, x_list = NULL,
  breaks_list = NULL, pos_flag = NULL, ex_cols = NULL,
  occur_time = NULL, oot_pct = 0.7, psi_i = 0.1, iv_i = 0.01,
  vars_name = FALSE, note = FALSE, parallel = FALSE,
  save_data = FALSE, file_name = NULL, dir_path = tempdir(), ...)

Arguments

dat

A data.frame with independent variables and target variable.

dat_test

A data.frame of test data. Default is NULL.

target

The name of target variable.

x_list

Names of independent variables.

breaks_list

A table containing a list of splitting points for each independent variable. Default is NULL.

pos_flag

The value of the positive class of the target variable, for example "1". Default is NULL.

ex_cols

A list of excluded variables. Regular expressions can also be used to match variable names. Default is NULL.

occur_time

The name of the variable that represents the time at which each observation takes place.

oot_pct

Percentage of observations retained for the out-of-time (OOT) test, used especially to calculate PSI. Default is 0.7.

psi_i

The maximum PSI threshold, with 0 <= psi_i <= 1; values from 0.05 to 0.2 usually work. Default is 0.1.

iv_i

The minimum IV threshold, with iv_i > 0; values from 0.01 to 0.1 usually work. Default is 0.01.

vars_name

Logical. If TRUE, output only a list of the filtered variable names; if FALSE, output a table with the detailed IV and PSI values of each variable. Default is FALSE.

note

Logical, whether to print information messages. Default is FALSE.

parallel

Logical, parallel computing. Default is FALSE.

save_data

Logical, save results in locally specified folder. Default is FALSE.

file_name

The name for periodically saved results files. Default is "Featrue_importance_IV_PSI".

dir_path

The path for periodically saved results files. Default is tempdir().

...

Other parameters.
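
To make the psi_i and iv_i thresholds concrete, the following base R sketch computes IV and PSI for a single binned variable on simulated data, using the standard textbook definitions. The helpers iv_sketch and psi_sketch, the eps smoothing constant, and the exact comparison operators are assumptions made for illustration; they are not part of creditmodel and may differ from what psi_iv_filter does internally.

```r
# Hypothetical helpers for illustration only; not part of creditmodel.

# IV of one binned variable: sum over bins of
# (negative share - positive share) * log(negative share / positive share).
iv_sketch <- function(bins, target, pos_flag = "1", eps = 1e-6) {
  is_pos <- factor(as.character(target) == pos_flag, levels = c(FALSE, TRUE))
  tab <- table(bins, is_pos)
  # eps avoids log(0) and division by zero for empty cells (an assumption).
  neg_share <- pmax(tab[, "FALSE"] / sum(tab[, "FALSE"]), eps)
  pos_share <- pmax(tab[, "TRUE"] / sum(tab[, "TRUE"]), eps)
  sum((neg_share - pos_share) * log(neg_share / pos_share))
}

# PSI between a base sample and a later (out-of-time) sample:
# sum over bins of (later share - base share) * log(later share / base share).
psi_sketch <- function(bins_base, bins_oot, eps = 1e-6) {
  lv <- union(unique(as.character(bins_base)), unique(as.character(bins_oot)))
  share <- function(b) {
    pmax(as.numeric(prop.table(table(factor(b, levels = lv)))), eps)
  }
  base_share <- share(bins_base)
  oot_share <- share(bins_oot)
  sum((oot_share - base_share) * log(oot_share / base_share))
}

# Simulated data: x is informative about y and has a stable distribution.
set.seed(1)
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + x))
bins <- cut(x, breaks = c(-Inf, -1, 0, 1, Inf))

iv <- iv_sketch(bins, y)
psi <- psi_sketch(bins[1:700], bins[701:1000])

# Keep the variable only if it is predictive enough and stable enough
# (thresholds mirror the iv_i and psi_i defaults; operators are assumed).
keep <- iv > 0.01 && psi <= 0.1
```

A variable with high IV but high PSI is predictive on the base sample yet shifts over time, which is why both conditions are checked together.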

Value

A list with the following elements:

  • Feature: Selected variables.

  • IV: IV of variables.

  • PSI: PSI of variables.

See Also

xgb_filter, gbm_filter, feature_select_wrapper

Examples

# NOT RUN {
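library(creditmodel)  # provides psi_iv_filter() and the UCICreditCard data
# Use the first 1000 rows and a few columns, including the date and target columns.
psi_iv_filter(dat = UCICreditCard[1:1000, c(2, 4, 8:9, 26)],
              target = "default.payment.next.month",
              occur_time = "apply_date",
              parallel = FALSE)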
# }
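
The example above passes occur_time so that stability can be assessed over time. As a rough illustration of a time-ordered split, the base R sketch below sorts a toy data.frame by date and separates the earliest observations from the most recent ones. The helper time_split_sketch and the toy data are hypothetical, and treating oot_pct as the share of earliest rows kept as the base sample is an assumption; the package may define the split differently.

```r
# Hypothetical helper for illustration only; not part of creditmodel.
# Assumption: the earliest `oot_pct` share of rows forms the base sample
# and the remaining, most recent rows form the out-of-time sample.
time_split_sketch <- function(dat, occur_time, oot_pct = 0.7) {
  dat <- dat[order(dat[[occur_time]]), , drop = FALSE]
  n_base <- round(nrow(dat) * oot_pct)
  list(
    base = dat[seq_len(n_base), , drop = FALSE],
    oot = dat[n_base + seq_len(nrow(dat) - n_base), , drop = FALSE]
  )
}

# Toy data with shuffled application dates.
toy <- data.frame(
  apply_date = as.Date("2020-01-01") + c(9, 0, 5, 3, 7, 1, 8, 2, 6, 4),
  x = 1:10
)

parts <- time_split_sketch(toy, "apply_date")

# Every base observation precedes every out-of-time observation.
split_is_ordered <- max(parts$base$apply_date) < min(parts$oot$apply_date)
```

PSI would then compare the binned distribution of each variable in parts$base against parts$oot.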
