
Cross-validation for Gradient Boosted Trees
Usage

cv.gbt(
  object,
  K = 5,
  repeats = 1,
  params = list(),
  nrounds = 500,
  early_stopping_rounds = 10,
  nthread = 12,
  train = NULL,
  type = "classification",
  trace = TRUE,
  seed = 1234,
  maximize = NULL,
  fun,
  ...
)
Value

A data.frame sorted by the mean of the performance metric

Arguments

object: Object of type "gbt" or "ranger"
K: Number of cross-validation folds to use (aka nfold)
repeats: Number of times to repeat the cross-validation
params: List of parameters (see the XGBoost documentation)
nrounds: Maximum number of trees (boosting rounds) to create
early_stopping_rounds: Early stopping rule: stop if performance does not improve for this many rounds
nthread: Number of parallel threads to use. Defaults to 12 if available
train: An optional xgb.DMatrix object containing the original training data. Not needed when using Radiant's gbt function
type: Model type ("classification" or "regression")
trace: Print progress
seed: Random seed to use as the starting point
maximize: When a custom function is used, xgb.cv requires the user to indicate if the function output should be maximized (TRUE) or minimized (FALSE)
fun: Function to use for model evaluation (i.e., auc for classification and RMSE for regression)
...: Additional arguments to be passed to 'fun'
See https://radiant-rstats.github.io/docs/model/gbt.html for an example in Radiant
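A minimal sketch of the same workflow in R code, assuming radiant.model (and the datasets shipped with radiant.data) are available; the grid values below are illustrative only:

# fit an initial classification model, then tune over a small parameter grid
library(radiant.model)

result <- gbt(dvd, "buy", c("coupon", "purch", "last"))

# the returned data.frame is sorted by the mean of the performance
# metric (auc for classification by default)
cv_out <- cv.gbt(result, params = list(max_depth = 1:3, learning_rate = c(0.1, 0.3)))

# inspect the top-ranked parameter combinations
head(cv_out)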
See also

gbt: to generate an initial model that can be passed to cv.gbt
Rsq: to calculate an R-squared measure for a regression
RMSE: to calculate the Root Mean Squared Error for a regression
MAE: to calculate the Mean Absolute Error for a regression
auc: to calculate the area under the ROC curve for classification
profit: to calculate profits for classification at a cost/margin threshold
Examples

if (FALSE) {
  ## classification example using the dvd data
  result <- gbt(dvd, "buy", c("coupon", "purch", "last"))
  cv.gbt(result, params = list(max_depth = 1:6))
  cv.gbt(result, params = list(max_depth = 1:6), fun = "logloss")
  cv.gbt(
    result,
    params = list(learning_rate = seq(0.1, 1.0, 0.1)),
    maximize = TRUE, fun = profit, cost = 1, margin = 5
  )

  ## regression example using the diamonds data
  result <- gbt(diamonds, "price", c("carat", "color", "clarity"), type = "regression")
  cv.gbt(result, params = list(max_depth = 1:2, min_child_weight = 1:2))
  cv.gbt(result, params = list(learning_rate = seq(0.1, 0.5, 0.1)), fun = Rsq, maximize = TRUE)
  cv.gbt(result, params = list(learning_rate = seq(0.1, 0.5, 0.1)), fun = MAE, maximize = FALSE)
}
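A further hedged sketch that exercises the K, repeats, nrounds, and early_stopping_rounds arguments for the regression case; all values shown are illustrative, not recommended settings:

if (FALSE) {
  ## repeated 10-fold cross-validation with a smaller tree budget and a
  ## more patient early stopping rule (values are arbitrary)
  result <- gbt(diamonds, "price", c("carat", "color", "clarity"), type = "regression")
  cv.gbt(
    result,
    params = list(max_depth = 2:4),
    nrounds = 200, early_stopping_rounds = 20,
    K = 10, repeats = 2,
    fun = RMSE, maximize = FALSE
  )
}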