rtemis (version 0.79)

s.GBM3: Gradient Boosting Machine [C, R, S]

Description

Train a GBM model using gbm-developers/gbm3

Usage

s.GBM3(x, y = NULL, x.test = NULL, y.test = NULL, weights = NULL,
  ipw = TRUE, ipw.type = 2, upsample = FALSE, upsample.seed = NULL,
  distribution = NULL, interaction.depth = 2, shrinkage = 0.01,
  bag.fraction = 0.9, mFeatures = NULL, n.minobsinnode = 5,
  n.trees = 2000, max.trees = 5000, force.n.trees = NULL,
  n.tree.window = 0, gbm.select.smooth = TRUE, smoother = c("loess",
  "supsmu"), n.new.trees = 500, min.trees = 50,
  failsafe.trees = 1000, imetrics = FALSE, .gs = FALSE,
  grid.resample.rtset = rtset.resample("kfold", 5),
  grid.search.type = c("exhaustive", "randomized"),
  grid.randomized.p = 0.1, metric = NULL, maximize = NULL,
  plot.tune.error = FALSE, exclude.test.lt.train = FALSE,
  exclude.lt.min.trees = FALSE, res.fail.thres = 0.99,
  n.extra.trees = 0, n.cores = rtCores, gbm.cores = 1,
  relInf = TRUE, varImp = FALSE, offset = NULL,
  var.monotone = NULL, keep.data = TRUE, var.names = NULL,
  response.name = "y", group = NULL, plot.perf = FALSE,
  plot.res = ifelse(!is.null(outdir), TRUE, FALSE), plot.fitted = NULL,
  plot.predicted = NULL, plotRelInf = FALSE, plotVarImp = FALSE,
  print.plot = TRUE, plot.theme = getOption("rt.fit.theme",
  "lightgrid"), x.name = NULL, y.name = NULL, question = NULL,
  verbose = TRUE, trace = 0, grid.verbose = TRUE,
  gbm.fit.verbose = FALSE, outdir = NULL, save.gridrun = FALSE,
  save.error.diagnostics = FALSE, save.rds = TRUE, save.res = FALSE,
  save.res.mod = FALSE, save.mod = ifelse(!is.null(outdir), TRUE,
  FALSE), ...)
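A minimal call, shown as a hedged sketch: the data here are illustrative (randomly generated), and only arguments that appear in the Usage section above are used.

```r
library(rtemis)

# Illustrative regression data (hypothetical; not from the package)
set.seed(2019)
x <- matrix(rnorm(500), nrow = 100, ncol = 5)
y <- x[, 1] + 0.5 * x[, 2] + rnorm(100)

# Simple train/test split
idx <- sample(100, 75)
mod <- s.GBM3(x[idx, ], y[idx],
              x.test = x[-idx, ], y.test = y[-idx])
```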

Arguments

x

Numeric vector or matrix / data frame of features, i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

weights

Numeric vector: Weights for cases. For Classification, weights takes precedence over ipw: if weights are provided, ipw is not used. Leave NULL if setting ipw = TRUE. Default = NULL

ipw

Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE

ipw.type

Integer {0, 1, 2}:
1: class.weights as in 0, divided by max(class.weights)
2: class.weights as in 0, divided by min(class.weights)
Default = 2

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Caution: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness

upsample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)

interaction.depth

[gS] Integer: Interaction depth

shrinkage

[gS] Float: Shrinkage (learning rate)

bag.fraction

[gS] Float (0, 1): Fraction of cases to use to train each tree. Helps avoid overfitting. Default = 0.9

mFeatures

[gS] Integer: Number of features to randomly choose from all available features to train at each step. Default = NULL which results in using all features.

n.minobsinnode

[gS] Integer: Minimum number of observations allowed in a node
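Passing a vector to any [gS] argument triggers tuning via grid search. A hedged sketch (assuming x and y already hold training data, as in the Arguments above):

```r
# Vectors passed to [gS] arguments are tuned by grid search,
# resampled as specified by grid.resample.rtset
mod <- s.GBM3(x, y,
              interaction.depth = c(2, 3),
              shrinkage = c(0.01, 0.1),
              grid.resample.rtset = rtset.resample("kfold", 5))
```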

n.trees

Integer: Initial number of trees to fit

grid.resample.rtset

List: Output of rtset.resample defining gridSearchLearn parameters. Default = rtset.resample("kfold", 5)

grid.search.type

String: Type of grid search to perform: "exhaustive" or "randomized". Default = "exhaustive"

grid.randomized.p

Float (0, 1): If grid.search.type = "randomized", randomly run this proportion of combinations. Default = .1

metric

String: Metric to minimize, or maximize if maximize = TRUE during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Concordance" for Survival Analysis.

maximize

Logical: If TRUE, metric will be maximized if grid search is run. Default = NULL

n.cores

Integer: Number of cores to use. Defaults to available cores reported by future::availableCores(), unless option rt.cores is set at the time the library is loaded

relInf

Logical: If TRUE (Default), estimate variables' relative influence.

varImp

Logical: If TRUE, estimate variable importance by permutation (as in random forests; noted as experimental in gbm). Takes longer than (default) relative influence. The two measures are highly correlated.

plot.fitted

Logical: if TRUE, plot True (y) vs Fitted

plot.predicted

Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

print.plot

Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted

plot.theme

String: "zero", "dark", "box", "darkbox"

x.name

Character: Name for feature set

y.name

Character: Name for outcome

question

String: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

trace

Integer: If higher than 0, will print more information to the console. Default = 0

grid.verbose

Logical: Passed to gridSearchLearn

outdir

String: If defined, save log, 'plot.all' plots (see above), and RDS file of complete output

save.rds

Logical: If outdir is defined, should all data be saved in RDS file? s.SVDnetGBM will save mod.gbm, so no need to save again.

save.res.mod

Logical: If TRUE, save gbm model for each grid run. For diagnostic purposes only: Object size adds up quickly

save.mod

Logical: If TRUE, save all output as an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

...

Additional arguments

stratify.var

If resampling is stratified, stratify against this variable. Defaults to outcome

Details

Early stopping is implemented by fitting n.trees initially, checking the (smoothed) validation error curve, and adding n.new.trees as needed, until the error stops decreasing or max.trees is reached.

[gS] in an argument description indicates that multiple values can be passed, in which case tuning will be performed using grid search. [gS] is supported for: interaction.depth, shrinkage, bag.fraction, mFeatures, and n.minobsinnode.

This function includes a workaround for cases where gbm.fit fails: if an error is detected, gbm.fit is rerun until successful and the procedure continues normally.
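The early-stopping controls described above can be set explicitly; a hedged sketch using only arguments from the Usage section (x and y assumed to hold training data):

```r
# Early stopping: start with n.trees, extend in steps of n.new.trees,
# never exceeding max.trees; set force.n.trees to skip the search entirely
mod <- s.GBM3(x, y,
              n.trees = 2000,     # initial number of trees
              n.new.trees = 500,  # trees added per extension step
              max.trees = 5000)   # hard upper limit
```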

See Also

elevate for external cross-validation

Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.H2ORF, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.RF, s.SGD, s.SPLS, s.SVM, s.TFN, s.XGBLIN, s.XGB

Other Tree-based methods: s.ADABOOST, s.ADDTREE, s.BART, s.C50, s.CART, s.CTREE, s.ET, s.EVTREE, s.GBM, s.H2OGBM, s.H2ORF, s.IRF, s.MLRF, s.PPTREE, s.RANGER, s.RFSRC, s.RF, s.XGB

Other Ensembles: s.ADABOOST, s.GBM, s.RANGER, s.RF