Tune hyperparameters using grid search and resampling, train a final model, and validate it
s.XGB(x, y = NULL, x.test = NULL, y.test = NULL, x.name = NULL,
y.name = NULL, booster = c("gbtree", "gblinear", "dart"),
silent = 1, missing = NA, nrounds = 500L, force.nrounds = NULL,
weights = NULL, ipw = TRUE, ipw.type = 2, upsample = FALSE,
upsample.seed = NULL, obj = NULL, feval = NULL, maximize = NULL,
xgb.verbose = NULL, print_every_n = 100L,
early.stopping.rounds = 50L, eta = 0.1, gamma = 0, max.depth = 3,
min.child.weight = 5, max.delta.step = 0, subsample = 0.75,
colsample.bytree = NULL, colsample.bylevel = 1, lambda = NULL,
lambda.bias = 0, alpha = 0, tree.method = "auto",
sketch.eps = 0.03, num.parallel.tree = 1, base.score = NULL,
objective = NULL, sample.type = "uniform",
normalize.type = "forest", rate.drop = 0.1, skip.drop = 0.5,
resampler = "strat.sub", n.resamples = 10, train.p = 0.75,
strat.n.bins = 4, stratify.var = NULL, target.length = NULL,
seed = NULL, error.curve = FALSE, plot.res = TRUE,
save.res = FALSE, save.res.mod = FALSE, importance = FALSE,
print.plot = TRUE, plot.fitted = NULL, plot.predicted = NULL,
plot.theme = getOption("rt.fit.theme", "lightgrid"), question = NULL,
rtclass = NULL, save.dump = FALSE, verbose = TRUE, n.cores = 1,
nthread = NULL, parallel.type = c("psock", "fork"), outdir = NULL,
save.mod = ifelse(!is.null(outdir), TRUE, FALSE))
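A minimal usage sketch, assuming rtemis is installed; the data here are synthetic and nrounds is lowered for speed:

library(rtemis)

# Synthetic regression data
set.seed(2018)
x <- as.data.frame(matrix(rnorm(500 * 10), 500, 10))
y <- x[, 3] + x[, 5]^2 + rnorm(500)

# Hold out a test set so the final model is validated on unseen data
idx <- sample(500, 400)
mod <- s.XGB(x[idx, ], y[idx],
             x.test = x[-idx, ], y.test = y[-idx],
             nrounds = 200)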
x: Numeric vector or matrix / data frame of features, i.e. independent variables
y: Numeric vector of outcome, i.e. dependent variable
x.test: Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x
y.test: Numeric vector of testing set outcome
x.name: Character: Name for feature set
y.name: Character: Name for outcome
booster: String: Booster to use. Options: "gbtree", "gblinear", "dart"
silent: 0: print XGBoost messages; 1: print no XGBoost messages
missing: String or Numeric: Which values to consider as missing. Default = NA
nrounds: Integer: Maximum number of rounds to run. Can be set to a high number, as early stopping will limit nrounds by monitoring inner CV error
force.nrounds: Integer: Number of rounds to run if not estimating optimal number by CV
weights: Numeric vector: Weights for cases. For classification, weights take precedence over ipw: if weights are provided, ipw is not used, so leave weights = NULL if setting ipw = TRUE. Default = NULL
ipw: Logical: If TRUE, apply inverse probability weighting (Classification only). Note: If weights are provided, ipw is not used. Default = TRUE
ipw.type: Integer {0, 1, 2}: 1: class.weights as in 0, divided by max(class.weights); 2: class.weights as in 0, divided by min(class.weights). Default = 2
upsample: Logical: If TRUE, upsample cases to balance outcome classes (Classification only). Caution: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness
upsample.seed: Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)
obj: Function: Custom objective function. See ?xgboost::xgboost
feval: Function: Custom evaluation function. See ?xgboost::xgboost
xgb.verbose: Integer: Verbose level for XGB learners used for tuning
print_every_n: Integer: Print evaluation metrics every n iterations
early.stopping.rounds: Integer: Training on resamples of x.train (tuning) will stop if performance does not improve for this many rounds
eta: [gS] Float (0, 1): Learning rate. Default = .1
gamma: [gS] Float: Minimum loss reduction required to make a further partition
max.depth: [gS] Integer: Maximum tree depth. Default = 3
min.child.weight: [gS] Float: Minimum sum of instance weights (hessian) needed in a child node
max.delta.step: [gS] Float: Maximum delta step allowed for each tree's weight estimation; 0 means no constraint
subsample: [gS] Float (0, 1]: Subsample ratio of training cases used per boosting round
lambda: [gS] L2 regularization on weights
alpha: [gS] L1 regularization on weights
tree.method: [gS] XGBoost tree construction algorithm. Default = "auto"
sketch.eps: [gS] Float (0, 1): Used only by the approximate algorithm (tree.method = "approx"). Default = .03
num.parallel.tree: Integer: Number of trees to grow in parallel; results in a Random Forest-like algorithm. Default = 1, i.e. regular boosting
base.score: Float: The mean outcome response (no need to set)
objective: String: XGBoost learning objective. Default = NULL
sample.type: String: For booster = "dart": sampling algorithm. Default = "uniform"
normalize.type: String: For booster = "dart": normalization algorithm. Default = "forest"
print.plot: Logical: if TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted
plot.fitted: Logical: if TRUE, plot True (y) vs Fitted
plot.predicted: Logical: if TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test
plot.theme: String: "zero", "dark", "box", "darkbox"
question: String: the question you are attempting to answer with this model, in plain language
rtclass: String: Class type to use. "S3", "S4", "RC", "R6"
verbose: Logical: If TRUE, print summary to screen
nthread: Integer: Number of threads for xgboost using OpenMP. Parallelize either the resamples (using n.cores) or the xgboost execution (using this setting), not both. At the time of writing, parallelization via this parameter causes the linear booster to fail most of the time; therefore, the default is rtCores for "gbtree" and 1 for "gblinear"
outdir: Path to output directory. If defined, will save the Predicted vs. True plot, if available, as well as full model output, if save.mod is TRUE
save.mod: Logical: If TRUE, save all output as RDS file in outdir. save.mod is TRUE by default if an outdir is defined. If set to TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name)
lambda.bias: [gS] For the linear booster: L2 regularization on bias
Value: rtMod object
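The returned object can be inspected and used for prediction. A sketch, continuing the example above; the accessor names are assumptions about the rtMod class, not guaranteed:

mod$fitted                            # assumed: training-set fitted values
mod$error.test                        # assumed: test-set error, if x.test/y.test were given
predicted <- predict(mod, x[-idx, ])  # assumed: predict method for rtMod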
[gS]: indicates the parameter will be autotuned by grid search if multiple values are passed.
(s.XGB does its own grid search, similar to gridSearchLearn; it may switch to gridSearchLearn, as in s.GBM.)
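For instance, passing vectors to [gS] parameters defines the tuning grid. A sketch, continuing the synthetic x, y from the example above:

# Each parameter combination is evaluated by resampling the training set
mod.tuned <- s.XGB(x, y,
                   eta = c(.01, .1),
                   max.depth = 2:4,
                   resampler = "strat.sub",
                   n.resamples = 10)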
Learn more about XGBoost's parameters here: http://xgboost.readthedocs.io/en/latest/parameter.html
Case weights, and therefore IPW, do not seem to work, despite following the documentation: note how ipw = TRUE fails while upsample = TRUE works on an imbalanced dataset.
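A sketch of the upsample workaround on synthetic imbalanced data (class labels here are arbitrary):

set.seed(2018)
x.im <- as.data.frame(matrix(rnorm(300 * 5), 300, 5))
y.im <- factor(c(rep("a", 270), rep("b", 30)))  # 9:1 class imbalance
mod.up <- s.XGB(x.im, y.im, upsample = TRUE, upsample.seed = 2018)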
11.24.16: Updated to work with the latest development version of XGBoost from GitHub, which changed some of xgboost's return values and is therefore not compatible with older versions.
s.XGBLIN is a wrapper for s.XGB with booster = "gblinear".
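So the following two calls should be equivalent (a sketch, reusing x, y from above):

mod.lin1 <- s.XGBLIN(x, y)
mod.lin2 <- s.XGB(x, y, booster = "gblinear")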
See Also: elevate for external cross-validation.
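A hypothetical call, assuming elevate accepts the learner's short name via its mod argument:

cv <- elevate(x, y, mod = "xgb")  # assumed: external cross-validation of s.XGB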
Other Supervised Learning: s.ADABOOST, s.ADDTREE, s.BART, s.BAYESGLM, s.BRUTO, s.C50, s.CART, s.CTREE, s.DA, s.ET, s.EVTREE, s.GAM.default, s.GAM.formula, s.GAMSEL, s.GAM, s.GBM3, s.GBM, s.GLMNET, s.GLM, s.GLS, s.H2ODL, s.H2OGBM, s.H2ORF, s.IRF, s.KNN, s.LDA, s.LM, s.MARS, s.MLRF, s.MXN, s.NBAYES, s.NLA, s.NLS, s.NW, s.POLYMARS, s.PPR, s.PPTREE, s.QDA, s.QRNN, s.RANGER, s.RFSRC, s.RF, s.SGD, s.SPLS, s.SVM, s.TFN, s.XGBLIN
Other Tree-based methods: s.ADABOOST, s.ADDTREE, s.BART, s.C50, s.CART, s.CTREE, s.ET, s.EVTREE, s.GBM3, s.GBM, s.H2OGBM, s.H2ORF, s.IRF, s.MLRF, s.PPTREE, s.RANGER, s.RFSRC, s.RF