report.xgb: Extreme Gradient Boosting HTML report

Description

This function creates an xgboost report as an HTML file. Cross-validation is mandatory. Does NOT handle multiclass scenarios or tasks other than regression and classification. Does NOT handle gblinear. You cannot use the process_type, updater, and refresh_leaf parameters. Set quiet = TRUE to suppress the large amount of verbose output.

Usage

report.xgb(data, label, folds, params, normalize = TRUE,
  classification = TRUE, threshold = 0.5, importance = TRUE,
  unbiased = TRUE, stats = TRUE, plots = TRUE, plot_type = "S",
  output_file = "report.xgb.html", output_dir = getwd(), open_file = TRUE,
  quiet = FALSE, ...)

Arguments

data

Type: data.table. The data to fit an xgboost model on.

label

Type: vector. The label the data must fit to.

folds

Type: list of numeric vectors. The folds used.
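
A minimal sketch of building such a list in base R. It assumes that each element holds the row indices held out in one fold, which is the convention used by the folds argument of xgboost::xgb.cv; this page does not state the convention report.xgb expects. The row and fold counts are illustrative.

```r
# Assumption: each list element contains the held-out row indices of one fold.
set.seed(42)
n_rows <- 100   # illustrative number of rows in `data`
n_folds <- 5    # illustrative number of folds

# Assign every row to exactly one fold, in random order.
fold_id <- sample(rep(seq_len(n_folds), length.out = n_rows))

# One numeric vector of row indices per fold.
folds <- unname(split(as.numeric(seq_len(n_rows)), fold_id))

str(folds)
```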

params

Type: list. The parameters to pass to report.xgb.helper.
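
As an illustration, a parameter list for a binary classification task might look like the sketch below. The names are standard xgboost training parameters; the values are arbitrary, and whether report.xgb.helper requires further entries is not stated on this page. The list deliberately uses a tree booster and leaves out the unsupported process_type, updater, and refresh_leaf parameters.

```r
# Illustrative values only; tune them for your own data.
params <- list(
  booster = "gbtree",             # gblinear is not handled
  objective = "binary:logistic",  # binary classification, outputs probabilities
  eval_metric = "auc",
  eta = 0.1,                      # learning rate
  max_depth = 6,
  subsample = 0.8,
  colsample_bytree = 0.8
)

# Guard against the parameters the Description rules out.
forbidden <- c("process_type", "updater", "refresh_leaf")
stopifnot(!any(forbidden %in% names(params)))
```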

normalize

Type: boolean. Whether features should be normalized before being fed to the xgboost model. Defaults to TRUE.

classification

Type: boolean. Whether the task is a classification or not. Defaults to TRUE.

threshold

Type: numeric. The threshold used to turn predicted probabilities into binary classes for the statistics when classification = TRUE. Defaults to 0.5.
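
To make the role of the threshold concrete, here is a small base R sketch with made-up probabilities and labels. It assumes a strict "greater than" comparison; this page does not say whether report.xgb uses > or >=.

```r
# Made-up out-of-fold probabilities and true labels.
probs <- c(0.10, 0.45, 0.50, 0.80)
labels <- c(0, 1, 1, 1)
threshold <- 0.5

# Assumption: a probability strictly above the threshold counts as class 1.
predicted <- as.integer(probs > threshold)

# Share of rows where the thresholded prediction matches the label.
accuracy <- mean(predicted == labels)

print(predicted)
print(accuracy)
```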

importance

Type: boolean. Whether to perform feature importance computation or not. Defaults to TRUE.

unbiased

Type: boolean. Whether to perform unbiased feature importance computation or not. This doubles (sometimes triples) the effective training time, so use it with caution; in return, you get very accurate and unbiased feature importance from the final cross-validated models. Defaults to TRUE.

stats

Type: boolean. Whether machine learning statistics should be output for model performance diagnosis. When TRUE, also returns the metrics and the out of fold predictions. Defaults to TRUE.

plots

Type: boolean. Whether plotting of fitted values vs predicted values should be done. Defaults to TRUE.

plot_type

Type: character. The type of plot to use for classification threshold calibration plots. "p" for points, "l" for lines, "b" for points+line, "c" for line without points, "o" for overplotted (points+line overlapping), "h" for high-density vertical lines (histogram-like), "s" for optimistic stair steps, "S" for pessimistic stair steps, "n" to plot nothing. Defaults to "S" for pessimistic stair steps.
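
These codes are the same ones accepted by the type argument of base R's plot(). The sketch below uses made-up probabilities and labels to draw an accuracy-versus-threshold curve with type = "S"; it is only meant to show what the stair-step style looks like and is not the report's own plotting code.

```r
# Made-up probabilities and labels for illustration.
probs <- c(0.05, 0.20, 0.35, 0.55, 0.70, 0.90)
labels <- c(0, 0, 1, 0, 1, 1)

# Accuracy at each candidate threshold (strict ">" is an assumption).
thresholds <- seq(0, 1, by = 0.1)
accuracy <- sapply(thresholds, function(t) mean(as.integer(probs > t) == labels))

# type = "S" draws stair steps; swap in "s", "l", "b", etc. to compare styles.
plot(thresholds, accuracy, type = "S",
     xlab = "Threshold", ylab = "Accuracy")
```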

output_file

Type: character. The output report file name. Defaults to "report.xgb.html".

output_dir

Type: character. The output report directory name. Defaults to getwd().

open_file

Type: boolean. Whether to open the output report once it has finished computing. Defaults to TRUE.

quiet

Type: boolean. Whether to suppress the verbose output while rendering the HTML file or not. Defaults to FALSE.

...

Other arguments to pass to rmarkdown::render.

Value

Returns a list with the machine learning metrics ("Metrics"), the machine learning probabilities ("Probs"), the folds ("Folds"), the fitted values per fold ("Fitted"), the predicted values per fold ("Predicted"), the biased feature importance ("BiasedImp"), and the unbiased feature importance ("UnbiasedImp") if they were computed. Otherwise, returns TRUE.

Examples


# No example.
## Not run: ------------------------------------
#   library(Laurae)
#   library(data.table)
#   library(rmarkdown)
#   library(xgboost)
#   library(DT)
#   library(formattable)
#   library(matrixStats)
#   library(lattice)
#   library(R.utils)
## ---------------------------------------------
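
The page gives no worked call, so the following is a hedged sketch of how the arguments from the Usage section might be put together. The toy data, the fold convention (held-out row indices per fold), and the parameter values are all assumptions; only the argument names come from the Usage section. The call itself is left commented out in the same "Not run" style as above, since it needs the Laurae package and its dependencies.

```r
# Hypothetical toy inputs for a binary classification task.
set.seed(1)
n <- 200
features <- data.frame(x1 = rnorm(n), x2 = runif(n))
label <- as.numeric(features$x1 + rnorm(n) > 0)

# Assumption: five folds, each a numeric vector of held-out row indices.
folds <- unname(split(as.numeric(seq_len(n)), sample(rep(1:5, length.out = n))))

# Illustrative xgboost parameters (tree booster, binary objective).
params <- list(booster = "gbtree", objective = "binary:logistic",
               eta = 0.1, max_depth = 4)

## Not run: ------------------------------------
#   library(Laurae)
#   library(data.table)
#   report <- report.xgb(data = as.data.table(features), label = label,
#                        folds = folds, params = params,
#                        classification = TRUE, threshold = 0.5,
#                        output_file = "report.xgb.html",
#                        open_file = FALSE, quiet = TRUE)
#   names(report)
## ---------------------------------------------
```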
