
Laurae (version

xgb.opt.depth: xgboost depth automated optimizer


This function optimizes the depth of xgboost with the gbtree/dart boosters while holding the other parameters constant. Output is intentionally pushed to the global environment, specifically to Laurae.xgb.opt.depth.df, Laurae.xgb.opt.depth.iter, and Laurae.xgb.opt.depth.best, so that you can interrupt the run manually without losing data. Verbosity is automatic and cannot be turned off; if you need this function without verbosity, recompile the package after removing the verbose messages. In addition, a sink is forced. If you interrupt the function prematurely (or if xgboost interrupts it), make sure to run sink(); otherwise no more messages are printed to your R console.
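As a minimal sketch of the recovery step described above, assuming the only thing an interruption leaves behind is one or more open output sinks, the following base R loop closes every active sink so that messages reach the console again. sink.number() and sink() are base R functions; nothing here is specific to the Laurae package.

```r
# Close every output sink left open by an interrupted run.
# sink.number() reports how many output diversions are currently active.
while (sink.number() > 0) {
  sink()
}
```

Looping on sink.number() rather than calling sink() once avoids the warning R raises when sink() is called with no sink left to remove.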


xgb.opt.depth(initial = 8, min_depth = 1, max_depth = 25, patience = 2,
  sd_effect = 0.001, worst_score = 0, learner = NA, better = max_better)


initial: The initial search depth. This is the starting point, along with the initial - 2 and initial + 2 depths. Defaults to 8.
min_depth: The minimum accepted depth. If it is reached, the computation stops. Defaults to 1.
max_depth: The maximum accepted depth. If it is reached, the computation stops. Defaults to 25.
patience: How many iterations are allowed without improvement, excluding the initialization (the first three computations). A larger value means more patience before stopping for lack of improvement in the scored metric. Defaults to 2.
sd_effect: How much weight the standard deviation carries in the score used to determine the best depth. Defaults to 0.001.
worst_score: The worst possible score of the metric used, as a numeric (not NA or infinite) value. Defaults to 0.
learner: The learner function. It fetches everything it needs from the global environment. The usage above lists NA as the default; my_learner, shown in the example, illustrates how to write one.
better: Whether to optimize for the minimum or the maximum value of the performance metric. Defaults to max_better, which maximizes the scored metric. Use min_better to minimize it.
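The example callback for this function combines the cross-validated mean and standard deviation into a single score as mean + sd * sd_effect, then passes the vector of scores to the better function. The sketch below illustrates that arithmetic with made-up numbers. The min_better helper is written by analogy with the max_better helper from the example and is an assumption; the package's own definition may differ.

```r
# Illustrative values only, not real results
cv_mean <- c(0.80, 0.84, 0.83)
cv_sd <- c(0.010, 0.030, 0.005)
sd_effect <- 0.001

# Score as computed in the example callback: mean + sd * sd_effect
score <- cv_mean + cv_sd * sd_effect

# max_better mirrors the example; min_better is an assumed counterpart
max_better <- function(cp) max(cp, na.rm = TRUE)
min_better <- function(cp) min(cp, na.rm = TRUE)

# Position of the best score under each optimization direction
best_if_maximizing <- which(score == max_better(score))[1]
best_if_minimizing <- which(score == min_better(score))[1]
```

With sd_effect = 0 the score reduces to the mean alone, which is what the example call at the end of this page uses.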


Three elements are forced into the global environment: "Laurae.xgb.opt.depth.df" for the dataframe with the depth log (data.frame), "Laurae.xgb.opt.depth.iter" for the dataframe with the iteration log (list), and "Laurae.xgb.opt.depth.best" for a length 1 vector with the best depth found (numeric).
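Once a run finishes or is interrupted, the results can be read back from the global environment. The sketch below uses a small stand-in data frame with the mean, sd, nrounds, and score columns that the example callback writes, with the row index standing for the depth. The values are invented for illustration, and marking unexplored depths as NA is an assumption; in a real session these objects would already exist.

```r
# Stand-in for the depth log; row index is the depth (invented values)
Laurae.xgb.opt.depth.df <- data.frame(
  mean = c(NA, NA, 0.81, NA, 0.84),
  sd = c(NA, NA, 0.02, NA, 0.01),
  nrounds = c(NA, NA, 420, NA, 310),
  score = c(NA, NA, 0.81, NA, 0.84)
)
Laurae.xgb.opt.depth.best <- 5

# Row of the log for the best depth found
best_row <- Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.best, ]

# Depths that were actually explored (non-missing score, by assumption)
explored <- which(!is.na(Laurae.xgb.opt.depth.df$score))
```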


#Please check xgb.opt.utils.R file in GitHub.
## Not run: ------------------------------------
# max_better <- function(cp) {
#   return(max(cp, na.rm = TRUE))
# }
# my_learner <- function(depth) {
#   sink(file = "Laurae/log.txt", append = TRUE, split = FALSE)
#   cat("\n\n\nDepth ", depth, "\n\n", sep = "")
#   global_depth <<- depth
#   gc()
#   set.seed(11111)
#   temp_model <- xgb.cv(data = dtrain,
#                        nthread = 12,
#                        folds = folded,
#                        nrounds = 100000,
#                        max_depth = depth,
#                        eta = 0.05,
#                        #gamma = 0.1,
#                        subsample = 1.0,
#                        colsample_bytree = 1.0,
#                        booster = "gbtree",
#                        #eval_metric = "auc",
#                        eval_metric = mcc_eval_nofail_cv,
#                        maximize = TRUE,
#                        early_stopping_rounds = 25,
#                        objective = "binary:logistic",
#                        verbose = TRUE
#                        #base_score = 0.005811208
#   )
#   sink()
#   i <<- 0
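#   # Return the mean and sd of the metric (columns 4 and 5 of the evaluation log)
#   # at the best iteration, followed by the best iteration itself
#   return(c(temp_model$evaluation_log[[4]][temp_model$best_iteration],
#            temp_model$evaluation_log[[5]][temp_model$best_iteration],
#            temp_model$best_iteration))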
# }
# xgb.opt.depth.callback <- function(i, learner, better, sd_effect) {
#   cat("\nExploring depth ", sprintf("%02d", Laurae.xgb.opt.depth.iter[i, "Depth"]), ": ")
#   Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"],
#   c("mean", "sd", "nrounds")] <<- learner(Laurae.xgb.opt.depth.iter[i, "Depth"])
#   Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"],
#   "score"] <<- Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "mean"] +
#   (Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "sd"] * sd_effect)
#   Laurae.xgb.opt.depth.iter[i,
#   "Score"] <<- Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "score"]
#   Laurae.xgb.opt.depth.iter[i,
#   "Best"] <<- better(Laurae.xgb.opt.depth.df[, "score"])
#   Laurae.xgb.opt.depth.best <<- which(
#   Laurae.xgb.opt.depth.df[, "score"] == Laurae.xgb.opt.depth.iter[i, "Best"])[1]
#   cat("[",
#       sprintf("%05d", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "nrounds"]),
#       "] ",
#       sprintf("%.08f", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "mean"]),
#       ifelse(is.na(Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "mean"]) == TRUE,
#       "",
#       paste("+",
#       sprintf("%.08f", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "sd"]),
#       sep = "")),
#       " (Score: ",
#       sprintf("%.08f", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "score"]),
#       ifelse(Laurae.xgb.opt.depth.iter[i, "Best"] == Laurae.xgb.opt.depth.iter[i, "Score"],
#       " <<<)",
#       "    )"),
#       " - best is: ",
#       Laurae.xgb.opt.depth.best,
#       " - ",
#       format(Sys.time(), "%a %b %d %Y %X"),
#       sep = "")
# }
# xgb.opt.depth(initial = 10, min_depth = 1, max_depth = 20, patience = 2, sd_effect = 0,
# worst_score = 0, learner = my_learner, better = max_better)
## ---------------------------------------------
