
Laurae (version 0.0.0.9001)

report.xgb: Extreme Gradient Boosting HTML report

Description

This function creates an xgboost report as an HTML file. Cross-validation is mandatory. It does NOT handle multiclass scenarios or tasks other than regression and classification. It does NOT handle gblinear. You cannot use the process_type, updater, or refresh_leaf parameters. Add quiet = TRUE to the list of arguments to silence the massive verbose output.

Usage

report.xgb(data, label, folds, params, normalize = TRUE,
  classification = TRUE, threshold = 0.5, importance = TRUE,
  unbiased = TRUE, stats = TRUE, plots = TRUE, plot_type = "S",
  output_file = "report.xgb.html", output_dir = getwd(), open_file = TRUE,
  quiet = FALSE, ...)

Arguments

data
Type: data.table. The data to fit an xgboost model on.
label
Type: vector. The label the data must fit to.
folds
Type: list of numeric vectors. The folds used.
params
Type: list. The parameters to pass to report.xgb.helper.
normalize
Type: boolean. Whether features should be normalized before being fed to the xgboost model. Defaults to TRUE.
classification
Type: boolean. Whether the task is a classification or not. Defaults to TRUE.
threshold
Type: numeric. The binary threshold to use for statistics when using classification == TRUE. Defaults to 0.5.
importance
Type: boolean. Whether to perform feature importance computation or not. Defaults to TRUE.
unbiased
Type: boolean. Whether to perform unbiased feature importance computation or not. This doubles (sometimes triples) the effective training time, therefore this must be used with caution (for the benefits of getting very accurate and unbiased feature importance from the final cross-validated models). Defaults to TRUE.
stats
Type: boolean. Whether machine learning statistics should be output for model performance diagnosis. When TRUE, also returns the metrics and the out of fold predictions. Defaults to TRUE.
plots
Type: boolean. Whether plotting of fitted values vs predicted values should be done. Defaults to TRUE.
plot_type
Type: character. The type of plot to use for classification threshold calibration plots. "p" for points, "l" for lines, "b" for points+line, "c" for line without points, "o" for overplotted (points+line overlapping), "h" for high-density vertical lines (histogram-like), "s" for optimistic stair steps, "S" for pessimistic stair steps, "n" to plot nothing. Defaults to "S" for pessimistic stair step.
output_file
Type: character. The output report file name. Defaults to "report.xgb.html".
output_dir
Type: character. The output report directory name. Defaults to getwd().
open_file
Type: boolean. Whether to open the output report once it has finished computing. Defaults to TRUE.
quiet
Type: boolean. Whether to "shut up" while rendering the HTML file or not. Defaults to FALSE.
...
Other arguments to pass to rmarkdown::render.
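
The folds argument is described above only as a "list of numeric vectors". A minimal base R sketch of one way to build such a list follows. It assumes, without confirmation from this page, that each vector holds the row indices held out in that fold; the values of n and k and the seed are hypothetical.

```r
# Build k cross-validation folds as a list of numeric (integer) vectors.
# Assumption: each vector holds the row indices held out in that fold.
set.seed(11111)   # arbitrary seed, used only for reproducibility
n <- 100          # hypothetical number of rows in `data`
k <- 5            # hypothetical number of folds

# Shuffle the row indices, then deal them out to k groups of near-equal size.
shuffled <- sample(seq_len(n))
folds <- unname(split(shuffled, rep(seq_len(k), length.out = n)))

# Inspect the structure: a list of k integer vectors.
str(folds)
```

Every row index appears in exactly one fold, so the folds partition the data. Stratifying by label is a common refinement for imbalanced classification, but it is not shown here.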

Value

Returns a list with the machine learning metrics ("Metrics"), the machine learning probabilities ("Probs"), the folds ("Folds"), the fitted values per fold ("Fitted"), the predicted values per fold ("Predicted"), the biased feature importance ("BiasedImp"), and the unbiased feature importance ("UnbiasedImp") if they were computed. Otherwise, returns TRUE.
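
To illustrate what a binary threshold such as the threshold argument does to predicted probabilities, here is a small base R sketch. The probabilities and labels are made up, and the comparison rule (strictly greater than the threshold) is an assumption; this page does not state how report.xgb treats values exactly equal to the threshold, nor the exact shape of the "Probs" element.

```r
# Hypothetical out-of-fold probabilities and true 0/1 labels (made-up values).
probs <- c(0.10, 0.40, 0.50, 0.65, 0.90)
label <- c(0, 0, 1, 1, 1)
threshold <- 0.5

# Assumption: a probability strictly above the threshold counts as class 1.
predicted <- as.numeric(probs > threshold)

# Cross-tabulate actual vs predicted classes and compute the accuracy.
confusion <- table(Actual = label, Predicted = predicted)
accuracy <- mean(predicted == label)

print(confusion)
print(accuracy)
```

Moving the threshold trades false positives against false negatives, which is what threshold calibration plots are meant to show.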

Examples

# No example.
## Not run: ------------------------------------
#   library(Laurae)
#   library(data.table)
#   library(rmarkdown)
#   library(xgboost)
#   library(DT)
#   library(formattable)
#   library(matrixStats)
#   library(lattice)
#   library(R.utils)
## ---------------------------------------------
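
Since the page provides no worked call, the following is a hedged sketch of how the inputs from the Usage section might be assembled. The toy data, the seed, and the contents of params are assumptions: the entries are common xgboost parameter names, but this page does not say which entries report.xgb.helper expects, and it may require others (for instance a number of boosting rounds). The call itself is wrapped in if (FALSE) so that it is not run.

```r
# Toy binary classification data (hypothetical, for illustration only).
set.seed(11111)
n <- 200
df <- data.frame(x1 = rnorm(n), x2 = runif(n), x3 = rnorm(n))
label <- as.numeric(df$x1 + df$x2 > 0.5)

# Five folds as a list of integer vectors.
# Assumption: each vector holds the held-out row indices of one fold.
folds <- unname(split(sample(seq_len(n)), rep(1:5, length.out = n)))

# Assumption: common xgboost parameter names; the exact list expected by
# report.xgb.helper is not documented on this page.
params <- list(booster = "gbtree",
               objective = "binary:logistic",
               eta = 0.1,
               max_depth = 4)

if (FALSE) {  # not run: needs Laurae and the packages listed above
  library(Laurae)
  my_report <- report.xgb(data = data.table::as.data.table(df),
                          label = label,
                          folds = folds,
                          params = params,
                          classification = TRUE,
                          threshold = 0.5,
                          unbiased = FALSE,  # skip the slower unbiased importance
                          output_file = "report.xgb.html",
                          output_dir = getwd(),
                          open_file = FALSE,
                          quiet = TRUE)
}
```

Setting unbiased = FALSE avoids the doubled or tripled training time noted in the Arguments section, and gbtree is used because gblinear is not handled.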
