This function estimates hyperparameters for xgboost based on Bayesian optimization.
xgb_opt(train_data, train_label, test_data, test_label, objectfun,
  evalmetric, eta_range = c(0.1, 1L), max_depth_range = c(4L, 6L),
  nrounds_range = c(70, 160L), subsample_range = c(0.1, 1L),
  bytree_range = c(0.4, 1L), init_points = 4, n_iter = 10, acq = "ei",
  kappa = 2.576, eps = 0, optkernel = list(type = "exponential", power = 2),
  classes = NULL)
train_data
A data frame for training of xgboost
train_label
The column of the class to classify in the training data
test_data
A data frame for testing of xgboost
test_label
The column of the class to classify in the test data
objectfun
Specify the learning task and the corresponding learning objective (a usage sketch follows this list):
reg:linear
linear regression (Default).
reg:logistic
logistic regression.
binary:logistic
logistic regression for binary classification. Output probability.
binary:logitraw
logistic regression for binary classification, output score before logistic transformation.
multi:softmax
set xgboost to do multiclass classification using the softmax objective. Each class is represented by a number from 0 to num_class - 1.
multi:softprob
same as softmax, but outputs a vector of ndata * nclass elements, which can be reshaped into an ndata x nclass matrix. The result contains the predicted probability of each data point belonging to each class.
rank:pairwise
set xgboost to do a ranking task by minimizing the pairwise loss.
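As a quick illustration of these objectives (a minimal sketch outside of xgb_opt, using xgboost's bundled agaricus data; the call follows the classic xgboost R interface, so verify against your installed version):
library(xgboost)
data(agaricus.train, package = "xgboost")
# Binary classification with probability output (binary:logistic)
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               nrounds = 10, objective = "binary:logistic", verbose = 0)
# Predictions are probabilities in [0, 1]
head(predict(bst, agaricus.train$data))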
evalmetric
Evaluation metric for validation data. Users can also pass a self-defined function (a sketch follows this list). Default: the metric is assigned according to the objective (rmse for regression, error for classification, mean average precision for ranking).
error
binary classification error rate
rmse
Root mean square error
logloss
negative log-likelihood function
auc
Area under the curve
merror
Exact matching error, used to evaluate multi-class classification
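To illustrate the self-defined option mentioned above, here is a minimal sketch of a custom metric in xgboost's feval format (it must return a list with a metric name and value); whether xgb_opt forwards such a function unchanged is an assumption to verify against your package version:
library(xgboost)
# A self-defined metric receives raw predictions and the DMatrix
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- mean(as.numeric(preds > 0.5) != labels)
  list(metric = "custom_error", value = err)
}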
eta_range
The range of eta (the learning rate)
max_depth_range
The range of max_depth
nrounds_range
The range of nrounds
subsample_range
The range of the subsample rate
bytree_range
The range of the colsample_bytree rate
init_points
Number of randomly chosen points to sample the target function before the Bayesian Optimization fits the Gaussian Process.
n_iter
Total number of times the Bayesian Optimization is to be repeated.
acq
Acquisition function type to be used. Can be "ucb", "ei" or "poi" (a standalone sketch follows this list).
ucb
GP Upper Confidence Bound
ei
Expected Improvement
poi
Probability of Improvement
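The acq, kappa, eps, and optkernel arguments mirror those of rBayesianOptimization::BayesianOptimization, the optimizer MlBayesOpt relies on. A standalone sketch on a toy target shows how they fit together (the function to optimize must return list(Score, Pred)):
library(rBayesianOptimization)
# Toy 1-D target to maximize; the optimum is at x = 2
toy <- function(x) list(Score = -(x - 2)^2, Pred = 0)
res <- BayesianOptimization(toy,
                            bounds = list(x = c(-5, 5)),
                            init_points = 4, n_iter = 5,
                            acq = "ei", eps = 0,
                            kernel = list(type = "exponential", power = 2),
                            verbose = FALSE)
res$Best_Par  # should land near x = 2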
kappa
Tunable parameter kappa of GP Upper Confidence Bound, to balance exploitation against exploration; increasing kappa makes the search favor exploration, so the sampled hyperparameters are spread more widely.
eps
Tunable parameter epsilon of Expected Improvement and Probability of Improvement, to balance exploitation against exploration; increasing epsilon likewise spreads the sampled hyperparameters across the whole range.
optkernel
Kernel (aka correlation function) for the underlying Gaussian Process. This parameter should be a list that specifies the type of correlation function along with the smoothness parameter. Popular choices are squared exponential (the default) or Matern 5/2 (see the sketch below).
classes
Set the number of classes. Use only with multiclass objectives such as multi:softmax.
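The two kernel choices mentioned under optkernel are written as plain R lists; a sketch (the field names follow rBayesianOptimization's kernel argument, so verify them against your installed version):
# Squared exponential, the default
kern_se <- list(type = "exponential", power = 2)
# Matern 5/2 alternative
kern_matern <- list(type = "matern", nu = 5/2)
# e.g. xgb_opt(..., optkernel = kern_matern)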
The test accuracy and a list of Bayesian Optimization results are returned:
Best_Par
a named vector of the best hyperparameter set found
Best_Value
the value of the evaluation metric achieved by the best hyperparameter set
History
a data.table of the Bayesian Optimization history
Pred
a data.table with the validation/cross-validation prediction for each round of the Bayesian Optimization history
# NOT RUN {
library(MlBayesOpt)
set.seed(71)
res0 <- xgb_opt(train_data = fashion_train,
                train_label = y,
                test_data = fashion_test,
                test_label = y,
                objectfun = "multi:softmax",
                evalmetric = "merror",
                classes = 10,
                init_points = 3,
                n_iter = 1)
# }
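Assuming the example above has run, the components listed in the Value section can be inspected like this (a sketch; res0 is the object from the example, and the exact structure may vary by package version):
res0$Best_Par       # named vector of the best hyperparameter set
res0$Best_Value     # metric value achieved by that set
head(res0$History)  # data.table of the optimization history
head(res0$Pred)     # per-round validation predictions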