XGBModel: Extreme Gradient Boosting Models

Description

Fits models with an efficient implementation of the gradient boosting framework from Chen & Guestrin.

Usage

XGBModel(
  nrounds = 100,
  ...,
  objective = character(),
  aft_loss_distribution = "normal",
  aft_loss_distribution_scale = 1,
  base_score = 0.5,
  verbose = 0,
  print_every_n = 1
)
XGBDARTModel(
  eta = 0.3,
  gamma = 0,
  max_depth = 6,
  min_child_weight = 1,
  max_delta_step = .(0.7 * is(y, "PoissonVariate")),
  subsample = 1,
  colsample_bytree = 1,
  colsample_bylevel = 1,
  colsample_bynode = 1,
  alpha = 0,
  lambda = 1,
  tree_method = "auto",
  sketch_eps = 0.03,
  scale_pos_weight = 1,
  refresh_leaf = 1,
  process_type = "default",
  grow_policy = "depthwise",
  max_leaves = 0,
  max_bin = 256,
  num_parallel_tree = 1,
  sample_type = "uniform",
  normalize_type = "tree",
  rate_drop = 0,
  one_drop = 0,
  skip_drop = 0,
  ...
)
XGBLinearModel(
  alpha = 0,
  lambda = 0,
  updater = "shotgun",
  feature_selector = "cyclic",
  top_k = 0,
  ...
)
XGBTreeModel(
  eta = 0.3,
  gamma = 0,
  max_depth = 6,
  min_child_weight = 1,
  max_delta_step = .(0.7 * is(y, "PoissonVariate")),
  subsample = 1,
  colsample_bytree = 1,
  colsample_bylevel = 1,
  colsample_bynode = 1,
  alpha = 0,
  lambda = 1,
  tree_method = "auto",
  sketch_eps = 0.03,
  scale_pos_weight = 1,
  refresh_leaf = 1,
  process_type = "default",
  grow_policy = "depthwise",
  max_leaves = 0,
  max_bin = 256,
  num_parallel_tree = 1,
  ...
)

Value

MLModel class object.

Arguments

nrounds

number of boosting iterations.

...

model parameters as described below and in the XGBoost documentation and arguments passed to XGBModel from the other constructors.

objective

optional character string defining the learning task and objective. Set automatically if not specified according to the following values available for supported response variable types.

factor:: "multi:softprob", "binary:logistic" (2 levels only)

numeric:

"reg:squarederror", "reg:logistic", "reg:gamma", "reg:tweedie", "rank:pairwise", "rank:ndcg", "rank:map"

PoissonVariate:

"count:poisson"

Surv:

"survival:aft", "survival:cox"

The first values listed are the defaults for the corresponding response types.

aft_loss_distribution

character string specifying a distribution for the accelerated failure time objective ("survival:aft") as "extreme", "logistic", or "normal".

aft_loss_distribution_scale

numeric scaling parameter for the accelerated failure time distribution.

base_score

initial prediction score of all observations, global bias.

verbose

numeric value controlling the amount of output printed during model fitting, such that 0 = none, 1 = performance information, and 2 = additional information.

print_every_n

numeric value designating the fitting iterations at at which to print output when verbose > 0.

eta

shrinkage of variable weights at each iteration to prevent overfitting.

gamma

minimum loss reduction required to split a tree node.

max_depth

maximum tree depth.

min_child_weight

minimum sum of observation weights required of nodes.

max_delta_step, tree_method, sketch_eps, scale_pos_weight, updater, refresh_leaf, process_type, grow_policy, max_leaves, max_bin, num_parallel_tree

other tree booster parameters.

subsample

subsample ratio of the training observations.

colsample_bytree, colsample_bylevel, colsample_bynode

subsample ratio of variables for each tree, level, or split.

alpha, lambda

L1 and L2 regularization terms for variable weights.

sample_type, normalize_type

type of sampling and normalization algorithms.

rate_drop

rate at which to drop trees during the dropout procedure.

one_drop

integer indicating whether to drop at least one tree during the dropout procedure.

skip_drop

probability of skipping the dropout procedure during a boosting iteration.

feature_selector, top_k

character string specifying the feature selection and ordering method, and number of top variables to select in the "greedy" and "thrifty" feature selectors.

Details

Response types:

factor, numeric, PoissonVariate, Surv

Automatic tuning of grid parameters:

XGBModel: NULL
XGBDARTModel: nrounds, eta*, gamma*, max_depth, min_child_weight*, subsample*, colsample_bytree*, rate_drop*, skip_drop*
XGBLinearModel: nrounds, alpha, lambda
XGBTreeModel: nrounds, eta*, gamma*, max_depth, min_child_weight*, subsample*, colsample_bytree*

* excluded from grids by default

The booster-specific constructor functions XGBDARTModel, XGBLinearModel, and XGBTreeModel are special cases of XGBModel which automatically set the XGBoost booster parameter. These are called directly in typical usage unless XGBModel is needed to specify a more general model.

Default argument values and further model details can be found in the source See Also link below.

In calls to varimp for XGBTreeModel, argument type may be specified as "Gain" (default) for the fractional contribution of each predictor to the total gain of its splits, as "Cover" for the number of observations related to each predictor, or as "Frequency" for the percentage of times each predictor is used in the trees. Variable importance is automatically scaled to range from 0 to 100. To obtain unscaled importance values, set scale = FALSE. See example below.

Examples

Run this code

# \donttest{
## Requires prior installation of suggested package xgboost to run

model_fit <- fit(Species ~ ., data = iris, model = XGBTreeModel)
varimp(model_fit, method = "model", type = "Frequency", scale = FALSE)
# }

Run the code above in your browser using DataLab