Train a GBM model using gbm::gbm.fit
s.GBM(x, y = NULL, x.test = NULL, y.test = NULL, weights = NULL,
ipw = TRUE, ipw.type = 2, upsample = FALSE, upsample.seed = NULL,
distribution = NULL, interaction.depth = 2, shrinkage = 0.01,
bag.fraction = 0.9, n.minobsinnode = 5, n.trees = 2000,
max.trees = 5000, force.n.trees = NULL, n.tree.window = 0,
gbm.select.smooth = TRUE, n.new.trees = 500, min.trees = 50,
failsafe.trees = 1000, imetrics = FALSE, .gs = FALSE,
grid.resample.rtset = rtset.resample("kfold", 5),
grid.search.type = "exhaustive", metric = NULL, maximize = NULL,
plot.tune.error = FALSE, exclude.test.lt.train = FALSE,
exclude.lt.min.trees = FALSE, res.fail.thres = 0.99,
n.extra.trees = 0, n.cores = rtCores, relInf = TRUE,
varImp = FALSE, offset = NULL, misc = NULL, var.monotone = NULL,
keep.data = TRUE, var.names = NULL, response.name = "y",
group = NULL, plot.perf = FALSE,
plot.res = ifelse(!is.null(outdir), TRUE, FALSE), plot.fitted = NULL,
plot.predicted = NULL, plotRelInf = FALSE, plotVarImp = FALSE,
print.plot = TRUE, plot.theme = getOption("rt.fit.theme",
"lightgrid"), x.name = NULL, y.name = NULL, question = NULL,
verbose = TRUE, trace = 0, grid.verbose = TRUE,
gbm.fit.verbose = FALSE, outdir = NULL, save.gridrun = FALSE,
save.rds = TRUE, save.res = FALSE, save.res.mod = FALSE,
save.mod = ifelse(!is.null(outdir), TRUE, FALSE), ...)
Numeric vector or matrix / data frame of features, i.e. independent variables
Numeric vector of outcome, i.e. dependent variable
Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x
Numeric vector of testing set outcome
Numeric vector: Case weights. For Classification, weights takes precedence over ipw, so leave weights = NULL if setting ipw = TRUE. Default = NULL
Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE
Integer 0, 1, or 2:
1: class.weights as in 0, divided by max(class.weights)
2: class.weights as in 0, divided by min(class.weights)
Default = 2
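As a hedged sketch of the weighting options above (assumes rtemis is installed and loaded; the data are made up for illustration):

```r
library(rtemis)

# Hypothetical imbalanced two-class problem
set.seed(2020)
x <- rnorm(300)
y <- factor(ifelse(x + rnorm(300) > 1, "pos", "neg"))  # "pos" is the minority class

# weights takes precedence over ipw, so leave it NULL to let ipw
# compute class-based case weights
mod <- s.GBM(x, y,
             ipw = TRUE, ipw.type = 2,
             weights = NULL)
```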
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Caution: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
[gS] Integer: Interaction depth
[gS] Float: Shrinkage (learning rate)
[gS] Float (0, 1): Fraction of cases to use to train each tree. Helps avoid overfitting. Default = 0.9
[gS] Integer: Minimum number of observations allowed in a node
Integer: Initial number of trees to fit
Logical: If TRUE (Default), estimate variables' relative influence.
Logical: If TRUE, estimate variable importance by permutation (as in random forests; noted as experimental in gbm). Takes longer than (default) relative influence. The two measures are highly correlated.
Logical: if TRUE, plot True (y) vs Fitted
Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test
Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted
String: "zero", "dark", "box", "darkbox"
Character: Name for feature set
Character: Name for outcome
String: the question you are attempting to answer with this model, in plain language.
Logical: If TRUE, print summary to screen.
Integer: If higher than 0, will print more information to the console. Default = 0
String: If defined, save log, 'plot.all' plots (see above) and RDS file of complete output
Logical: If outdir is defined, should all data be saved in RDS file? s.SVDnetGBM will save mod.gbm, so no need to save again.
Logical: If TRUE, save gbm model for each grid run. For diagnostic purposes only: Object size adds up quickly
Logical: If TRUE, save all output as RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)
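A minimal sketch of the saving behavior described above (assumes rtemis is loaded; data and directory name are illustrative):

```r
library(rtemis)
x <- rnorm(100); y <- 2 * x + rnorm(100)

# Setting outdir turns on save.mod by default: log, plots, and an RDS
# of the complete output are written under outdir
mod <- s.GBM(x, y, outdir = "./s.GBM_example")
```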
Additional arguments
If resampling is stratified, stratify against this variable. Defaults to outcome
This is the older gbm package, available on CRAN. It may be preferable to use s.GBM3, which uses gbm-developers/gbm3 from GitHub.
Early stopping is implemented by fitting n.trees initially, checking the (smoothed) validation error curve, and adding n.new.trees as needed, until the error no longer decreases or max.trees is reached.
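The early stopping behavior above can be sketched with the relevant arguments (assumes rtemis is loaded; values are illustrative, not recommendations):

```r
library(rtemis)
x <- rnorm(200); y <- x^2 + rnorm(200)

mod <- s.GBM(x, y,
             n.trees = 500,             # trees fit initially
             n.new.trees = 100,         # trees added per extension step
             min.trees = 50,            # lower bound on the selected number of trees
             max.trees = 2000,          # stop extending once this total is reached
             gbm.select.smooth = TRUE)  # smooth the validation error curve
```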
[gS] in an argument description indicates that multiple values can be passed, in which case tuning will be performed using grid search. gS is supported for: interaction.depth, shrinkage, bag.fraction, and n.minobsinnode.
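A sketch of tuning via the [gS] mechanism (assumes rtemis is loaded; hyperparameter values are illustrative):

```r
library(rtemis)
x <- rnorm(200); y <- x + rnorm(200)

# Passing vectors to [gS] arguments triggers grid search over all combinations
mod <- s.GBM(x, y,
             interaction.depth = c(2, 3),   # [gS]
             shrinkage = c(0.01, 0.1),      # [gS]
             grid.resample.rtset = rtset.resample("kfold", 5),
             grid.search.type = "exhaustive")
```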
This function includes a workaround for when gbm.fit fails: if an error is detected, gbm.fit is rerun until successful and the procedure continues normally.
See elevate for external cross-validation.
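A hedged sketch of external cross-validation with elevate (the mod argument name follows rtemis conventions but is an assumption here; check ?elevate):

```r
library(rtemis)
x <- rnorm(200); y <- x + rnorm(200)

# Cross-validate an s.GBM model externally
cv <- elevate(x, y, mod = "gbm")
```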
Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM3, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.H2ORF, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.RF, s.SGD, s.SPLS, s.SVM, s.TFN, s.XGBLIN, s.XGB
Other Tree-based methods: s.ADABOOST, s.ADDTREE, s.BART, s.C50, s.CART, s.CTREE, s.ET, s.EVTREE, s.GBM3, s.H2OGBM, s.H2ORF, s.IRF, s.MLRF, s.PPTREE, s.RANGER, s.RFSRC, s.RF, s.XGB
Other Ensembles: s.ADABOOST, s.GBM3, s.RANGER, s.RF