Trains a Random Forest model using H2O (http://www.h2o.ai)
s.H2ORF(x, y = NULL, x.test = NULL, y.test = NULL, x.valid = NULL,
y.valid = NULL, x.name = NULL, y.name = NULL, ip = "localhost",
port = 54321, n.trees = 500, max.depth = 20,
n.stopping.rounds = 50, mtry = -1, nfolds = 0, weights = NULL,
weights.test = NULL, balance.classes = TRUE, upsample = FALSE,
na.action = na.fail, n.cores = rtCores, print.plot = TRUE,
plot.fitted = NULL, plot.predicted = NULL,
plot.theme = getOption("rt.fit.theme", "lightgrid"), question = NULL,
verbose = TRUE, trace = 0, save.mod = FALSE, outdir = NULL, ...)
x: Training set features
y: Training set outcome
x.test: Testing set features (used to evaluate model performance)
y.test: Testing set outcome
x.valid: Validation set features (used to build model / tune hyperparameters)
y.valid: Validation set outcome
x.name: Character: Name for feature set
y.name: Character: Name for outcome
ip: String: IP address of H2O server. Default = "localhost"
port: Integer: Port to connect to at ip
n.trees: Integer: Number of trees to grow
weights: Numeric vector: Weights for cases. For classification, weights takes precedence over ipw: if weights are provided, ipw is not used. Leave weights = NULL if setting ipw = TRUE. Default = NULL
upsample: Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Caution: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness
na.action: How to handle missing values. See ?na.fail
n.cores: Integer: Number of cores to use
print.plot: Logical: If TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted
plot.fitted: Logical: If TRUE, plot True (y) vs Fitted
plot.predicted: Logical: If TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test
plot.theme: String: "zero", "dark", "box", "darkbox"
question: String: The question you are attempting to answer with this model, in plain language.
verbose: Logical: If TRUE, print summary to screen.
trace: Integer: If higher than 0, will print more information to the console. Default = 0
save.mod: Logical: If TRUE, save all output as RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)
outdir: Path to output directory. If defined, will save Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE
...: Additional parameters to pass to h2o::h2o.randomForest
Numeric: How many times to iterate through the dataset. Default = 10
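The caution on upsample can be made concrete with a small base R sketch. This is only an illustration of upsampling by random sampling with replacement, not the code s.H2ORF runs internally: the helper name upsample_sketch and its seed argument are hypothetical, and the sketch assumes y is a factor with one entry per row of x.

```r
# Hypothetical helper for illustration only; not part of rtemis or h2o.
# Balances classes by adding cases drawn at random, with replacement,
# from each smaller class until it matches the majority class size.
upsample_sketch <- function(x, y, seed = NULL) {
  stopifnot(is.factor(y), nrow(x) == length(y))
  if (!is.null(seed)) set.seed(seed)  # fix the seed for reproducible draws
  n_major <- max(table(y))            # size of the largest class
  # Row indices per class, ignoring factor levels with no cases
  idx_by_class <- split(seq_along(y), droplevels(y))
  idx <- unlist(lapply(idx_by_class, function(i) {
    n_extra <- n_major - length(i)
    # Keep every original case and draw the shortfall with replacement
    c(i, i[sample.int(length(i), n_extra, replace = TRUE)])
  }), use.names = FALSE)
  list(x = x[idx, , drop = FALSE], y = y[idx])
}

# Imbalanced two-class example built from iris: 50 setosa, 10 versicolor
keep <- c(1:50, 51:60)
x <- iris[keep, 1:4]
y <- droplevels(iris$Species[keep])
up <- upsample_sketch(x, y, seed = 2024)
table(up$y)  # both classes now have 50 cases
```

Because the added cases are random draws, two runs without a fixed seed can yield different training sets and therefore different forests, which is the randomness the caution refers to.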
rtMod object
elevate for external cross-validation
Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM3, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.RF, s.SGD, s.SPLS, s.SVM, s.TFN, s.XGBLIN, s.XGB
Other Tree-based methods: s.ADABOOST, s.ADDTREE, s.BART, s.C50, s.CART, s.CTREE, s.ET, s.EVTREE, s.GBM3, s.GBM, s.H2OGBM, s.IRF, s.MLRF, s.PPTREE, s.RANGER, s.RFSRC, s.RF, s.XGB