There's a newer version (1.3.1) of this package.

automodel

automodel is a free and open-source R package for automated modeling. It is designed to make model development more efficient and to let people with no data science background complete modeling work in a short time, so they can focus on the problem itself and spend more time on decision-making.

automodel covers data preprocessing, variable processing and derivation, variable screening and dimensionality reduction, modeling, data analysis, data visualization, model evaluation, strategy analysis, and more. It is a customized "core" toolkit for model developers.

automodel is suited to automated machine learning on classification targets, and especially to risk and marketing data from financial credit, e-commerce, and insurance, where noise is relatively high and information content is low.

Installation

# install.packages("devtools")
devtools::install_github("FanHansen/automodel")

Example

library(automodel)
if (!dir.exists("c:/test_model")) dir.create("c:/test_model")
setwd("c:/test_model")

# Set parameters for logistic regression / scorecard and for XGBoost
LR.params <- lr_params(
  bins_control = list(bins_num = 8, bins_pct = 0.05, b_chi = 0.02,
                      b_odds = 0.1, b_psi = 0.02, b_gb = 0.15,
                      mono = 0.3, gb_psi = 0.05, kc = 1),
  score_card = TRUE, cor_p = 0.7, iv_i = 0.02, psi_i = 0.1)
XGB.params <- xgb_params(
  nrounds = 10000,
  params = list(max.depth = 4, eta = 0.01, min_child_weight = 50,
                subsample = 0.5, colsample_bytree = 0.6, gamma = 0,
                max_delta_step = 1, eval_metric = "auc",
                objective = "binary:logistic"),
  early_stopping_rounds = 300)

# Train the model
Lending_model <- training_model(
  dat_train = lendingclub,
  model_name = "lendingclub", target = "loan_status", occur_time = "issue_d",
  ex_cols = c("last_credit_pull_d", "next_pymnt_d",
              "prncp|recoveries|rec_|funded_amnt|pymnt|fee$"),
  obs_id = "id", prop = 0.7,
  feature_filter = list(filter = c("IV", "PSI", "COR", "XGB"), cv_folds = 1,
                        iv_cp = 0.02, psi_cp = 0.1, cor_cp = 0.8, xgb_cp = 0,
                        hopper = TRUE),
  algorithm = list("LR", "XGB"),
  LR.params = LR.params, XGB.params = XGB.params,
  parallel = FALSE,
  save_pmml = FALSE,
  plot_show = FALSE,
  seed = 46)
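
The last entry of ex_cols in the example looks like a regular expression rather than a single column name. The base-R sketch below shows which names such a pattern would match, assuming ex_cols entries are matched against column names as regular expressions. The column names are hypothetical, loosely modelled on Lending Club fields, and the matching is done with grepl, not with the package's own code.

```r
# Hypothetical column names (illustrative only, not read from the lendingclub data set)
cols <- c("out_prncp", "recoveries", "total_rec_int", "funded_amnt",
          "total_pymnt", "collection_recovery_fee", "loan_amnt", "annual_inc")

# Pattern copied from the ex_cols argument in the example
pattern <- "prncp|recoveries|rec_|funded_amnt|pymnt|fee$"

# Columns whose names match any alternative in the pattern
excluded <- cols[grepl(pattern, cols)]

# Columns that would remain as candidate predictors
kept <- setdiff(cols, excluded)

print(excluded)
print(kept)
```

Note that "fee$" is anchored to the end of the name, while the other alternatives match anywhere in it, so a short alternative such as "rec_" can exclude more columns than intended.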

Install

install.packages('creditmodel')

Monthly Downloads: 543
Version: 1.0
License: AGPL-3
Maintainer: Dongping Fan
Last Published: April 28th, 2019

Functions in creditmodel (1.0)

checking_data: Checking Data
cut_equal: Generating Initial Equal Size Sample Bins
cv_split: Stratified Folds
derived_ts_vars: Derivation of Behavioral Variables
as_percent: Percent Format
char_cor_vars: Cramer's V matrix between categorical variables
digits_num: Number of digits
cleaning_data: Data Cleaning
address_varieble: address_varieble
cor_plot: Correlation Plot
analysis_nas: Missing Analysis
gbm_filter: Select Features using GBM
de_percent: Recovery Percent Format
get_breaks_all: Generates Best Breaks for Binning
get_correlation_group: get_correlation_group
derived_interval: derived_interval
get_plots: Plot Independent Variables
get_shadow_nas: get_shadow_nas
get_tree_breaks: Getting the breaks for terminal nodes from decision tree
analysis_outliers: Outliers Analysis
date_cut: Date Time Cut Point
get_psi_all: Calculate Population Stability Index (PSI). get_psi calculates the PSI of a single independent variable; get_psi_all loops over all specified independent variables.
knn_nas_imp: Impute NAs using KNN
ks_table: ks_table & plot
PCA_reduce: PCA Dimension Reduction
city_varieble: city_varieble
UCICreditCard: UCI Credit Card data
gbm_params: GBM Parameters
city_varieble_process: Processing of Address Variables
cos_sim: cos_sim
customer_segmentation: Customer Segmentation
ks_value: ks_value
de_one_hot_encoding: Recovery One-Hot Encoding
null_blank_na: Encode NAs
lasso_filter: Selected Variables by LASSO
loop_function: Loop Function. loop_function is an iterator to loop through
one_hot_encoding: One-Hot Encoding
euclid_dist: euclid_dist
entry_rate_max: Max Percent of Unique Values
love_color: love_color
get_logistic_coef: Get logistic coefficients
entry_rate_na: Max Percent of NAs
re_name: Rename
fast_high_cor_filter: high_cor_filter
feature_select_wrapper: Feature Selection Wrapper
get_ctree_rules: Parse party ctree rules
derived_partial_acf: derived_partial_acf
reduce_high_cor: Compare the two highly correlated variables
train_test_split: Train-Test-Split
get_iv_all: Calculate Information Value (IV). get_iv calculates the IV of a single independent variable; get_iv_all loops over all specified independent variables.
get_median: Get central value
fuzzy_cluster_means: Fuzzy Cluster Means
training_model: Training model
get_x_list: Get X List
outliers_detection: Outliers Detection. outliers_detection detects outliers using K-means and Local Outlier Factor (LOF).
%alike%: Fuzzy String matching
plot_theme: plot_theme
derived_pct: derived_pct
get_best_lambda: get_best_lambda. Gets the best lambda required in lasso_filter.
psi_iv_filter: Variable reduction based on Information Value & Population Stability Index filter
remove_duplicated: Remove Duplicated Observations
quick_as_df: List as data.frame quickly
require_packages: Required packages and installation
select_best_class: Generates Best Binning Breaks
get_psi_iv_all: Calculate IV & PSI
get_bins_table_all: Table of Binning
sim_str: sim_str
get_score_card: Score Card
process_nas: Missing Values Treatment
local_outlier_factor: local_outlier_factor. Calculates the LOF factor for a data set using KNN. This function is not intended to be used by end users.
lendingclub: Lending Club data
stop_parallel_computing: Stop parallel computing
get_names: Get Variable Names
get_nas_random: get_nas_random
low_variance_filter: Filtering Low Variance Variables
time_transfer: Time Format Transferring
%islike%: Fuzzy String matching
is_date: is_date
process_outliers: Outliers Treatment
time_varieble: time_varieble
lr_params: Logistic Regression & Scorecard Parameters
rf_params: Random Forest Parameters
time_vars_process: Processing of Time or Date Variables
xgb_params: XGBoost Parameters
rowAny: Functions for vector operation
merge_category: Merge Category
min_max_norm: Min Max Normalization
split_bins: split_bins
save_dt: Save data
start_parallel_computing: Parallel computing and export variables to global Env.
variable_process: variable_process
vintage_function: vintage_function. vintage_function is for vintage analysis.
xgb_filter: Select Features using XGB
woe_trans_all: Converting data to WOE
score_transfer: Scoring
char_to_num: Character to number
add_variable_process: add_variable_process
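
Several entries in the index (get_iv_all, get_psi_all, woe_trans_all, psi_iv_filter) revolve around Weight of Evidence (WOE), Information Value (IV) and the Population Stability Index (PSI). The base-R sketch below shows the textbook formulas on already-binned toy data. The helper names woe_iv_sketch and psi_sketch are hypothetical and are not part of creditmodel, and the package's own implementations may differ in sign convention, binning and handling of empty bins.

```r
# Textbook WOE / IV for one binned variable.
# bins:   vector of bin labels
# target: 0/1 vector, where 1 marks the "bad" class (an assumed convention)
woe_iv_sketch <- function(bins, target) {
  tab <- table(bins, factor(target, levels = c(0, 1)))
  good_pct <- tab[, "0"] / sum(tab[, "0"])  # share of all goods in each bin
  bad_pct  <- tab[, "1"] / sum(tab[, "1"])  # share of all bads in each bin
  woe <- log(good_pct / bad_pct)            # infinite if a bin has no goods or no bads
  iv  <- sum((good_pct - bad_pct) * woe)
  list(woe = woe, iv = iv)
}

# Textbook PSI between an expected (e.g. training) and an actual (e.g. recent)
# sample of the same binned variable.
psi_sketch <- function(expected_bins, actual_bins) {
  lv <- union(unique(expected_bins), unique(actual_bins))
  e <- as.numeric(table(factor(expected_bins, levels = lv))) / length(expected_bins)
  a <- as.numeric(table(factor(actual_bins, levels = lv))) / length(actual_bins)
  sum((a - e) * log(a / e))                 # undefined if a bin is empty in either sample
}

# Toy data: two bins with different bad rates
bins   <- c(rep("low", 100), rep("high", 100))
target <- c(rep(1, 10), rep(0, 90), rep(1, 40), rep(0, 60))
res <- woe_iv_sketch(bins, target)

# Toy data: the bin mix shifts from 50/50 to 60/40
expected <- c(rep("a", 50), rep("b", 50))
actual   <- c(rep("a", 60), rep("b", 40))
psi <- psi_sketch(expected, actual)

print(res$woe)
print(res$iv)
print(psi)
```

Filters based on these quantities typically drop variables whose IV falls below a cut point or whose PSI rises above one. Arguments such as iv_cp and psi_cp in the training_model example appear to be cut points of this kind, though that reading is an assumption here.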