Supports the Extreme Gradient Boosting (XGBoost) package for use with SuperLearner. XGBoost is a variant of gradient boosted machines (GBM).
SL.xgboost(Y, X, newX, family, obsWeights, id, ntrees = 1000, max_depth = 4,
           shrinkage = 0.1, minobspernode = 10, params = list(), nthread = 1,
           verbose = 0, save_period = NULL, ...)
Y: Outcome variable.
X: Covariate dataframe.
newX: Optional dataframe on which to predict the outcome.
family: "gaussian" for regression, "binomial" for binary classification, "multinomial" for multiclass classification (not yet supported).
obsWeights: Optional observation-level weights (supported but not tested).
id: Optional id to group observations from the same unit (not currently used).
ntrees: How many trees to fit. Low values may underfit and high values may overfit, depending also on the shrinkage.
max_depth: How deep each tree can be. 1 means no interactions, aka tree stumps.
shrinkage: How much to shrink the predictions, in order to reduce overfitting.
minobspernode: Minimum observations allowed per tree node, after which no more splitting will occur.
params: Many other parameters can be customized. See http://xgboost.readthedocs.io/en/latest/parameter.html
nthread: How many threads (cores) xgboost should use. Generally this should be kept at 1 so that XGBoost does not compete with SuperLearner parallelization.
verbose: Verbosity of XGBoost fitting.
save_period: How often (in tree iterations) to save the current model to disk during processing. If NULL the model is not saved, and if 0 the model is saved at the end.
...: Any remaining arguments (not currently supported).
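
A minimal usage sketch follows; the simulated data and the SL.mean benchmark learner are illustrative, not part of this function's API.

# Sketch: fit SL.xgboost inside SuperLearner on simulated binary data.
library(SuperLearner)

set.seed(1)
n = 200
X = data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y = rbinom(n, 1, plogis(X$x1 - X$x2))

# SL.mean is included only as a simple benchmark learner.
sl = SuperLearner(Y = Y, X = X, family = binomial(),
                  SL.library = c("SL.mean", "SL.xgboost"))
sl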
The performance of XGBoost, like that of GBM, is sensitive to its configuration settings. It is therefore best to create multiple configurations using create.SL.xgboost and let SuperLearner choose the best weights based on cross-validated performance, as in the sketch below.
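
A sketch of a small tuning grid. The grid values here are illustrative, and the assumption that create.SL.xgboost() returns the generated learner names in $names may vary by SuperLearner version.

# Sketch: generate several SL.xgboost configurations to tune over.
# Grid values are illustrative: 2 depths x 2 shrinkage values = 4 learners.
tune = list(ntrees = 1000,
            max_depth = c(2, 4),
            shrinkage = c(0.01, 0.1),
            minobspernode = 10)

# create.SL.xgboost() defines one wrapper per grid row (e.g. SL.xgboost.1,
# SL.xgboost.2, ...) in the calling environment; we assume it also returns
# the generated names in $names.
xgb_learners = create.SL.xgboost(tune = tune, detailed_names = TRUE)

# Y and X as in the previous sketch; SuperLearner weights the candidates
# by cross-validated performance.
sl2 = SuperLearner(Y = Y, X = X, family = binomial(),
                   SL.library = c(xgb_learners$names, "SL.mean"))
sl2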
If you run into errors, first try installing the latest version of XGBoost from drat as described here: http://xgboost.readthedocs.io/en/latest/build.html
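
A sketch of the drat-based install; the repository URL follows the historical XGBoost build instructions, so check the link above for the current procedure.

# Sketch of the drat-based source install described at the link above.
install.packages("drat", repos = "https://cran.rstudio.com")
drat:::addRepo("dmlc")
install.packages("xgboost", repos = "http://dmlc.ml/drat/", type = "source")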