lgb.train: Main training logic for LightGBM

Description

Logic to train with LightGBM

Usage

lgb.train(
  params = list(),
  data,
  nrounds = 100L,
  valids = list(),
  obj = NULL,
  eval = NULL,
  verbose = 1L,
  record = TRUE,
  eval_freq = 1L,
  init_model = NULL,
  colnames = NULL,
  categorical_feature = NULL,
  early_stopping_rounds = NULL,
  callbacks = list(),
  reset_data = FALSE,
  ...
)

Value

a trained booster model lgb.Booster.

Arguments

params

a list of parameters. See the "Parameters" section of the documentation for a list of parameters and valid values.

data

a lgb.Dataset object, used for training. Some functions, such as lgb.cv, may allow you to pass other types of data like matrix and then separately supply label as a keyword argument.

nrounds

number of training rounds

valids

a list of lgb.Dataset objects, used for validation

obj

objective function, can be character or custom objective function. Examples include regression, regression_l1, huber, binary, lambdarank, multiclass, multiclass

eval

evaluation function(s). This can be a character vector, function, or list with a mixture of strings and functions.

a. character vector: If you provide a character vector to this argument, it should contain strings with valid evaluation metrics. See The "metric" section of the documentation for a list of valid metrics.
b. function: You can provide a custom evaluation function. This should accept the keyword arguments preds and dtrain and should return a named list with three elements:
- name: A string with the name of the metric, used for printing and storing results.
- value: A single number indicating the value of the metric for the given predictions and true values
- higher_better: A boolean indicating whether higher values indicate a better fit. For example, this would be FALSE for metrics like MAE or RMSE.
c. list: If a list is given, it should only contain character vectors and functions. These should follow the requirements from the descriptions above.

verbose

verbosity for output, if <= 0, also will disable the print of evaluation during training

record

Boolean, TRUE will record iteration message to booster$record_evals

eval_freq

evaluation output frequency, only effect when verbose > 0

init_model

path of model file of lgb.Booster object, will continue training from this model

colnames

feature names, if not null, will use this to overwrite the names in dataset

categorical_feature

categorical features. This can either be a character vector of feature names or an integer vector with the indices of the features (e.g. c(1L, 10L) to say "the first and tenth columns").

early_stopping_rounds

int. Activates early stopping. When this parameter is non-null, training will stop if the evaluation of any metric on any validation set fails to improve for early_stopping_rounds consecutive boosting rounds. If training stops early, the returned model will have attribute best_iter set to the iteration number of the best iteration.

callbacks

List of callback functions that are applied at each iteration.

reset_data

Boolean, setting it to TRUE (not the default value) will transform the booster model into a predictor model which frees up memory and the original datasets

...

other parameters, see the "Parameters" section of the documentation for more information. A few key parameters:

boosting: Boosting type. "gbdt", "rf", "dart" or "goss".
num_leaves: Maximum number of leaves in one tree.
max_depth: Limit the max depth for tree model. This is used to deal with overfitting. Tree still grow by leaf-wise.
num_threads: Number of threads for LightGBM. For the best speed, set this to the number of real CPU cores(parallel::detectCores(logical = FALSE)), not the number of threads (most CPU using hyper-threading to generate 2 threads per CPU core).

NOTE: As of v3.3.0, use of ... is deprecated. Add parameters to params directly.

Early Stopping

"early stopping" refers to stopping the training process if the model's performance on a given validation set does not improve for several consecutive iterations.

If multiple arguments are given to eval, their order will be preserved. If you enable early stopping by setting early_stopping_rounds in params, by default all metrics will be considered for early stopping.

If you want to only consider the first metric for early stopping, pass first_metric_only = TRUE in params. Note that if you also specify metric in params, that metric will be considered the "first" one. If you omit metric, a default metric will be used based on your choice for the parameter obj (keyword argument) or objective (passed into params).

Examples

Run this code

# \donttest{
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
params <- list(
  objective = "regression"
  , metric = "l2"
  , min_data = 1L
  , learning_rate = 1.0
)
valids <- list(test = dtest)
model <- lgb.train(
  params = params
  , data = dtrain
  , nrounds = 5L
  , valids = valids
  , early_stopping_rounds = 3L
)
# }

Run the code above in your browser using DataLab