rtemis (version 0.79)

s.H2OGBM: Gradient Boosting Machine on H2O [C, R]

Description

Trains a Gradient Boosting Machine using H2O (http://www.h2o.ai)

Usage

s.H2OGBM(x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL,
  y.name = NULL, ip = "localhost", port = 54321, h2o.init = TRUE,
  gs.h2o.init = FALSE, h2o.shutdown.at.end = TRUE,
  grid.resample.rtset = rtset.resample("kfold", 5), metric = NULL,
  maximize = NULL, n.trees = 10000, force.n.trees = NULL,
  max.depth = 5, n.stopping.rounds = 50, stopping.metric = "AUTO",
  p.col.sample = 1, p.row.sample = 0.9, minobsinnode = 5,
  min.split.improvement = 1e-05, quantile.alpha = 0.5,
  learning.rate = 0.01, learning.rate.annealing = 1, weights = NULL,
  ipw = TRUE, ipw.type = 2, upsample = FALSE, upsample.seed = NULL,
  na.action = na.fail, grid.n.cores = 1, n.cores = rtCores,
  imetrics = FALSE, .gs = FALSE, print.plot = TRUE,
  plot.fitted = NULL, plot.predicted = NULL,
  plot.theme = getOption("rt.fit.theme", "lightgrid"), question = NULL,
  verbose = TRUE, trace = 0, grid.verbose = TRUE, save.mod = FALSE,
  outdir = NULL, ...)

Arguments

x

Numeric vector or matrix / data frame of features, i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

x.name

Character: Name for feature set

y.name

Character: Name for outcome

ip

String: IP address of H2O server. Default = "localhost"

port

Integer: Port number for server. Default = 54321

n.trees

Integer: Number of trees to grow. Maximum number of trees if n.stopping.rounds > 0

max.depth

[gS] Integer: Depth of trees to grow

n.stopping.rounds

Integer: If > 0, stop training if stopping.metric does not improve for this many rounds
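A minimal sketch of the idea behind n.stopping.rounds, assuming a simple "patience" rule: stop once the best value over the last n.stopping.rounds scoring rounds is no better than the best value seen before them. H2O's actual criterion also involves moving averages and a relative tolerance, so this is an illustration only; should_stop() is a hypothetical helper, not part of rtemis or h2o.

```r
# Hypothetical helper illustrating early stopping with a patience window.
# Simplified: H2O's real rule also uses moving averages and a stopping tolerance.
should_stop <- function(metric_history, n.stopping.rounds, maximize = FALSE) {
  # n.stopping.rounds <= 0 disables early stopping
  if (n.stopping.rounds <= 0) return(FALSE)
  n <- length(metric_history)
  # Need at least one round before the patience window to compare against
  if (n <= n.stopping.rounds) return(FALSE)
  earlier <- metric_history[seq_len(n - n.stopping.rounds)]
  recent <- metric_history[(n - n.stopping.rounds + 1):n]
  if (maximize) {
    improved <- max(recent) > max(earlier)
  } else {
    improved <- min(recent) < min(earlier)
  }
  !improved
}

# Validation loss stops improving after round 3
loss <- c(1.00, 0.80, 0.70, 0.71, 0.72, 0.73)
should_stop(loss, n.stopping.rounds = 3)  # TRUE: no improvement in the last 3 rounds
should_stop(loss, n.stopping.rounds = 4)  # FALSE: 0.70 beats the earlier best, 0.80
```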

stopping.metric

String: "AUTO" (Default), "deviance", "logloss", "MSE", "RMSE", "MAE", "RMSLE", "AUC", "lift_top_group", "misclassification", "mean_per_class_error"

p.col.sample

[gS] Numeric (0, 1]: Proportion of features (columns) to sample. Default = 1

p.row.sample

[gS] Numeric (0, 1]: Proportion of cases (rows) to sample. Default = 0.9
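To illustrate what the two sampling proportions control, the sketch below draws a row subset and a column subset without replacement, as a GBM would when growing one tree. This is a generic sketch under assumptions: subsample_for_tree() is hypothetical, and whether columns are sampled per tree or per split depends on which H2O parameter rtemis maps p.col.sample to, which this page does not state.

```r
# Hypothetical helper: draw the cases and features used to grow one tree.
# Sampling is without replacement; proportions are rounded to whole counts.
subsample_for_tree <- function(n_rows, n_cols, p.row.sample = 0.9, p.col.sample = 1) {
  n_keep_rows <- max(1, round(p.row.sample * n_rows))
  n_keep_cols <- max(1, round(p.col.sample * n_cols))
  list(rows = sort(sample.int(n_rows, n_keep_rows)),
       cols = sort(sample.int(n_cols, n_keep_cols)))
}

set.seed(2019)
# Defaults: 90% of rows, all columns
s <- subsample_for_tree(n_rows = 200, n_cols = 10)
length(s$rows)  # 180
length(s$cols)  # 10
```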

minobsinnode

[gS] Integer: Minimum number of observations in a terminal node. Default = 5

learning.rate

[gS] Numeric: Learning rate (shrinkage) applied to each tree. Default = 0.01

learning.rate.annealing

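[gS] Numeric: Factor by which the learning rate is multiplied after each tree; 1 means no annealing. Default = 1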
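Assuming learning.rate and learning.rate.annealing are passed to H2O's learn_rate and learn_rate_annealing, where the learning rate is multiplied by the annealing factor after every tree, the rate applied to tree t is learning.rate * learning.rate.annealing^(t - 1). The hypothetical helper below computes that schedule and shows why an annealing factor of 1 (the default) keeps the rate constant.

```r
# Hypothetical helper: per-tree learning rate under multiplicative annealing
annealed_learning_rate <- function(learning.rate, learning.rate.annealing, n.trees) {
  learning.rate * learning.rate.annealing^(seq_len(n.trees) - 1)
}

annealed_learning_rate(0.01, 1, 3)    # rates 0.01, 0.01, 0.01 (no annealing)
annealed_learning_rate(0.1, 0.99, 3)  # rates 0.1, 0.099, 0.09801
```

A factor slightly below 1 lets early trees take larger steps while later trees make progressively smaller corrections.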

weights

Numeric vector: Weights for cases. For classification, weights takes precedence over ipw: if weights are provided, ipw is not used. Leave weights = NULL if using ipw = TRUE. Default = NULL

ipw

Logical: If TRUE, apply inverse probability weighting (for Classification only). Note: If weights are provided, ipw is not used. Default = TRUE

ipw.type

Integer: 0, 1, or 2. 1: class.weights as in 0, divided by max(class.weights); 2: class.weights as in 0, divided by min(class.weights). Default = 2
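This page does not spell out the type-0 class weights, so the sketch below ASSUMES they are inverse class frequencies; only the type 1 and type 2 rescaling (divide by the maximum or the minimum class weight) follows the ipw.type description. ipw_class_weights() is a hypothetical helper, not the rtemis implementation.

```r
# Hypothetical helper: per-class weights for inverse probability weighting.
# ASSUMPTION: type 0 = inverse class frequency (not specified on this page).
ipw_class_weights <- function(y, ipw.type = 2) {
  y <- as.factor(y)
  freq <- as.numeric(table(y)) / length(y)   # class proportions
  w0 <- 1 / freq                             # assumed type-0 weights
  w <- switch(as.character(ipw.type),
              "0" = w0,
              "1" = w0 / max(w0),            # type 1: divide by max
              "2" = w0 / min(w0),            # type 2: divide by min
              stop("ipw.type must be 0, 1, or 2"))
  setNames(w, levels(y))
}

y <- factor(c("a", "a", "a", "b"))
cw <- ipw_class_weights(y, ipw.type = 2)  # a = 1, b = 3
case_weights <- cw[as.character(y)]      # one weight per case: 1, 1, 1, 3
```

With type 2 the majority class gets weight 1 and rarer classes get proportionally larger weights; type 1 instead caps the rarest class at 1.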

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Caution: upsample will randomly sample with replacement if the majority class is more than twice the size of the class being upsampled, thereby introducing randomness
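A minimal sketch of upsampling, assuming every original case is kept and each smaller class is topped up with extra draws until it matches the majority class. Extra cases are drawn without replacement when the class has enough of them, and with replacement once the majority class is more than twice the size of the class, which mirrors the caution above. upsample_indices() is hypothetical; its seed argument plays the role that upsample.seed is documented to play.

```r
# Hypothetical helper: case indices that balance classes by upsampling.
upsample_indices <- function(y, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  y <- as.factor(y)
  idx_by_class <- split(seq_along(y), y)
  n_max <- max(lengths(idx_by_class))
  unlist(lapply(idx_by_class, function(idx) {
    n_extra <- n_max - length(idx)
    if (n_extra == 0) return(idx)
    # Replacement is needed only if more extra cases are required than exist,
    # i.e. when the majority class is more than double this class
    replace <- n_extra > length(idx)
    c(idx, idx[sample.int(length(idx), n_extra, replace = replace)])
  }), use.names = FALSE)
}

y <- factor(c(rep("a", 5), rep("b", 2)))
i <- upsample_indices(y, seed = 1)
table(y[i])  # a: 5, b: 5
```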

upsample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)

na.action

How to handle missing values. See ?na.fail
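The default, na.fail, comes from the stats package: it returns its input unchanged when there are no missing values and otherwise stops with an error. The base R sketch below shows that behaviour, alongside na.omit from the same help page, which drops incomplete rows instead. Whether s.H2OGBM accepts na.action functions other than the default is not stated here.

```r
# Toy data with one missing value (row 3)
df <- data.frame(x1 = c(1, 2, NA, 4), x2 = c(5, 6, 7, 8))

# na.fail() errors if any value is missing
msg <- tryCatch(na.fail(df), error = function(e) conditionMessage(e))
msg                # "missing values in object"

# na.omit() drops incomplete rows instead
nrow(na.omit(df))  # 3
```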

n.cores

Integer: Number of cores to use

.gs

Internal use only

print.plot

Logical: If TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted

plot.fitted

Logical: If TRUE, plot True (y) vs Fitted

plot.predicted

Logical: If TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

String: Plot theme, e.g. "zero", "dark", "box", "darkbox". Default = getOption("rt.fit.theme", "lightgrid")

question

String: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

trace

Integer: If higher than 0, will print more information to the console. Default = 0

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)

outdir

Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE

...

Additional arguments

Value

rtMod object

Details

[gS] denotes tunable hyperparameters.

Warning: If you get an HTTP 500 error at random, use h2o.shutdown() to shut down the server. It will be restarted the next time s.H2OGBM is called.
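Assuming, as the [gS] tag suggests, that several values can be supplied for each tunable argument and searched as a full grid, the search space size is the product of the number of values per argument. The base R sketch below counts the combinations with expand.grid for some example values; with the default grid.resample.rtset = rtset.resample("kfold", 5), each combination would be evaluated on 5 resamples.

```r
# Example candidate values for three [gS] hyperparameters
grid <- expand.grid(max.depth = c(3, 5, 7),
                    learning.rate = c(0.01, 0.05),
                    p.row.sample = c(0.8, 0.9))
nrow(grid)                # 12 combinations (3 x 2 x 2)

# Default resampling for tuning: 5-fold cross-validation
n_resamples <- 5
nrow(grid) * n_resamples  # 60 fits during tuning
```

Because the count grows multiplicatively, adding one more value to any argument can increase tuning time substantially, especially with large n.trees.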

See Also

elevate for external cross-validation

Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM3, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2ORF, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.RF, s.SGD, s.SPLS, s.SVM, s.TFN, s.XGBLIN, s.XGB

Other Tree-based methods: s.ADABOOST, s.ADDTREE, s.BART, s.C50, s.CART, s.CTREE, s.ET, s.EVTREE, s.GBM3, s.GBM, s.H2ORF, s.IRF, s.MLRF, s.PPTREE, s.RANGER, s.RFSRC, s.RF, s.XGB