
MachineShop (version 2.8.0)

XGBModel: Extreme Gradient Boosting Models

Description

Fits models within an efficient implementation of the gradient boosting framework from Chen & Guestrin.

Usage

XGBModel(params = list(), nrounds = 1, verbose = 0, print_every_n = 1)

XGBDARTModel(
  objective = NULL, aft_loss_distribution = "normal",
  aft_loss_distribution_scale = 1, base_score = 0.5, eta = 0.3, gamma = 0,
  max_depth = 6, min_child_weight = 1,
  max_delta_step = .(0.7 * is(y, "PoissonVariate")), subsample = 1,
  colsample_bytree = 1, colsample_bylevel = 1, colsample_bynode = 1,
  lambda = 1, alpha = 0, tree_method = "auto", sketch_eps = 0.03,
  scale_pos_weight = 1, refresh_leaf = 1, process_type = "default",
  grow_policy = "depthwise", max_leaves = 0, max_bin = 256,
  num_parallel_tree = 1, sample_type = "uniform", normalize_type = "tree",
  rate_drop = 0, one_drop = 0, skip_drop = 0, ...
)

XGBLinearModel(
  objective = NULL, aft_loss_distribution = "normal",
  aft_loss_distribution_scale = 1, base_score = 0.5, lambda = 0, alpha = 0,
  updater = "shotgun", feature_selector = "cyclic", top_k = 0, ...
)

XGBTreeModel(
  objective = NULL, aft_loss_distribution = "normal",
  aft_loss_distribution_scale = 1, base_score = 0.5, eta = 0.3, gamma = 0,
  max_depth = 6, min_child_weight = 1,
  max_delta_step = .(0.7 * is(y, "PoissonVariate")), subsample = 1,
  colsample_bytree = 1, colsample_bylevel = 1, colsample_bynode = 1,
  lambda = 1, alpha = 0, tree_method = "auto", sketch_eps = 0.03,
  scale_pos_weight = 1, refresh_leaf = 1, process_type = "default",
  grow_policy = "depthwise", max_leaves = 0, max_bin = 256,
  num_parallel_tree = 1, ...
)
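XGBModel is the generic interface, with booster parameters supplied through its params list; the wrapper constructors above expose those parameters as named arguments instead. A minimal sketch of the two styles, with illustrative (not recommended) hyperparameter values:

library(MachineShop)

# Generic interface: booster parameters given in the params list,
# which is passed on to the xgboost fitting routine
model <- XGBModel(
  params = list(eta = 0.1, max_depth = 4),
  nrounds = 50, verbose = 1, print_every_n = 10
)

# Wrapper interface: the same parameters as named arguments, with
# nrounds passed through ... to XGBModel
model <- XGBTreeModel(eta = 0.1, max_depth = 4, nrounds = 50)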

Arguments

params

list of model parameters as described in the XGBoost documentation.

nrounds

maximum number of boosting iterations.

verbose

numeric value controlling the amount of output printed during model fitting, such that 0 = none, 1 = performance information, and 2 = additional information.

print_every_n

numeric value designating the fitting iterations at which to print output when verbose > 0.

objective

character string specifying the learning task and objective. Possible values for the supported response variable types are as follows (see the sketch after this arguments list for example usage).

factor:

"multi:softprob", "binary:logistic" (2 levels only)

numeric:

"reg:squarederror", "reg:logistic", "reg:gamma", "reg:tweedie", "rank:pairwise", "rank:ndcg", "rank:map"

PoissonVariate:

"count:poisson"

Surv:

"survival:cox", "survival:aft"

The first values listed are the defaults for the corresponding response types.

aft_loss_distribution

character string specifying the distribution for the accelerated failure time objective ("survival:aft") as "normal", "logistic", or "extreme".

aft_loss_distribution_scale

numeric scaling parameter for the accelerated failure time distribution.

base_score

initial numeric prediction score for all instances (global bias).

eta, gamma, max_depth, min_child_weight, max_delta_step, subsample, colsample_bytree, colsample_bylevel, colsample_bynode, lambda, alpha, tree_method, sketch_eps, scale_pos_weight, refresh_leaf, process_type, grow_policy, max_leaves, max_bin, num_parallel_tree, sample_type, normalize_type, rate_drop, one_drop, skip_drop, updater, feature_selector, top_k

see the params reference in the XGBoost documentation.

...

arguments passed to XGBModel.
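To make the pairing between objective values and response types concrete, here is a hedged sketch (dataset choices are purely for illustration; xgboost and, for the survival case, the survival package must be installed):

library(MachineShop)

# Factor response with more than two levels: "multi:softprob" is the
# default objective, written out here for clarity
fit(Species ~ ., data = iris,
    model = XGBTreeModel(objective = "multi:softprob"))

# Surv response: accelerated failure time objective with a logistic
# error distribution
library(survival)
fit(Surv(time, status) ~ age + karno, data = veteran,
    model = XGBTreeModel(objective = "survival:aft",
                         aft_loss_distribution = "logistic"))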

Value

MLModel class object.

Details

Response Types:

factor, numeric, PoissonVariate, Surv

Automatic Tuning of Grid Parameters

  • XGBDARTModel: nrounds, max_depth, eta, gamma*, min_child_weight*, subsample, colsample_bytree, rate_drop, skip_drop

  • XGBLinearModel: nrounds, lambda, alpha

  • XGBTreeModel: nrounds, max_depth, eta, gamma*, min_child_weight*, subsample, colsample_bytree

* included only in randomly sampled grid points
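The listed grid parameters can be searched automatically by wrapping a model constructor in TunedModel from this package; a brief sketch with default tuning controls:

library(MachineShop)

# Resamples candidate grid points for XGBTreeModel (nrounds, max_depth,
# eta, ...) and refits with the best-performing combination
tuned_fit <- fit(Species ~ ., data = iris, model = TunedModel(XGBTreeModel))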

Default values for the NULL arguments and further model details can be found in the source link below.

In calls to varimp for XGBTreeModel, argument metric may be specified as "Gain" (default) for the fractional contribution of each predictor to the total gain of its splits, as "Cover" for the relative number of observations related to each predictor, or as "Frequency" for the percentage of times each predictor is used in the trees. Variable importance is automatically scaled to range from 0 to 100; to obtain unscaled importance values, set scale = FALSE. See the example below.
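For instance, continuing from a fitted model such as model_fit in the example below:

# Default: scaled "Gain" importance, ranging from 0 to 100
varimp(model_fit)

# Unscaled "Cover" importance
varimp(model_fit, metric = "Cover", scale = FALSE)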

See Also

xgboost, fit, resample

Examples

## Requires prior installation of suggested package xgboost to run

library(MachineShop)

model_fit <- fit(Species ~ ., data = iris, model = XGBTreeModel)
varimp(model_fit, metric = "Frequency", scale = FALSE)
