Laurae (version 0.0.0.9001)

LauraeML_gblinear_par: Laurae's Machine Learning (xgboost gblinear helper parallel function)

Description

This function demonstrates how to use xgboost gblinear in LauraeML with premade folds. Training is also parallelized over the folds, assuming that the parallel cluster is stored as mcl in the global environment. Its tunable hyperparameters are alpha, lambda, and lambda_bias. It also supports feature selection and performs full logging (every step is commented in the source), writing to an external file so that the hyperparameters and the feature count of each run can be followed.
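
The logging step described above can be pictured as appending one line per evaluated candidate to the log file. The sketch below is a minimal illustration under stated assumptions: the helper name log_candidate, the line format, and the idea that x holds alpha, lambda and lambda_bias in that order are all hypothetical, not Laurae's own log layout.

```r
# Hypothetical helper (not part of Laurae): append one line per evaluated
# candidate to the log file. The line format is an assumption.
log_candidate <- function(logging, x, y, score) {
  if (is.null(logging)) return(invisible(NULL))  # logging disabled
  line <- sprintf("alpha=%g lambda=%g lambda_bias=%g features=%d score=%g",
                  x[1], x[2], x[3], sum(y == 1), score)
  cat(line, "\n", file = logging, append = TRUE, sep = "")
  invisible(line)
}

# Usage: log a single candidate to a temporary file and read it back
log_file <- tempfile(fileext = ".txt")
log_candidate(log_file, x = c(1, 1, 1), y = c(1, 0, 1, 1), score = 0.85)
readLines(log_file)
```

Appending (rather than overwriting) keeps the full history of the search, which is what makes it possible to follow the hyperparameters and feature count over time.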

Usage

LauraeML_gblinear_par(x, y, mobile, parallelized, maximize, logging, data,
  label, folds)

Arguments

x
Type: vector (numeric). The hyperparameters to use.
y
Type: vector (numeric). The features to use, in binary format (0 for not using, 1 for using).
mobile
Type: environment. The environment passed from LauraeML.
parallelized
Type: parallel socket cluster (makeCluster or similar). The parallelized parameter passed from LauraeML (whether to parallelize training over the folds or not).
maximize
Type: boolean. The maximize parameter passed from LauraeML (whether to maximize the metric or not).
logging
Type: character. The logging parameter passed from LauraeML (where to store the log file).
data
Type: data.table (mandatory). The data features. Comes from LauraeML.
label
Type: vector (numeric). The labels. Comes from LauraeML.
folds
Type: list of numeric vectors. The folds, as a list of row indices. Comes from LauraeML.
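
To make the roles of x and y concrete, the sketch below decodes them into an xgboost gblinear parameter list and a column selection. It assumes x holds alpha, lambda and lambda_bias in the order the description lists them; the function name decode_candidate and the returned structure are hypothetical and only illustrate the idea.

```r
# Hypothetical decoder (not part of Laurae). Assumes x = c(alpha, lambda,
# lambda_bias) and y is a 0/1 mask aligned with feature_names.
decode_candidate <- function(x, y, feature_names) {
  stopifnot(length(x) == 3, length(y) == length(feature_names))
  params <- list(booster = "gblinear",
                 alpha = x[1],        # L1 regularization on weights
                 lambda = x[2],       # L2 regularization on weights
                 lambda_bias = x[3])  # L2 regularization on the bias
  list(params = params,
       features = feature_names[y == 1])  # keep columns flagged with 1
}

# Usage: three features, the second one switched off
candidate <- decode_candidate(x = c(0.5, 2, 0),
                              y = c(1, 0, 1),
                              feature_names = c("f1", "f2", "f3"))
candidate$features
```

With a data.table, the selected columns could then be taken with data[, candidate$features, with = FALSE].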

Value

The score of the cross-validated xgboost gblinear model for the provided hyperparameters and selected features.
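
A cross-validated score is typically obtained by scoring each fold and aggregating the results. The sketch below assumes a simple mean over folds, which may differ from how Laurae aggregates; fold_score is a hypothetical stand-in for training xgboost gblinear on the training part of a fold and scoring its validation part.

```r
# Minimal sketch: evaluate each fold, then average the fold scores.
# fold_score(i) is a hypothetical function returning the score of fold i.
cv_score <- function(folds, fold_score, cl = NULL) {
  scores <- if (is.null(cl)) {
    lapply(seq_along(folds), fold_score)                   # sequential
  } else {
    parallel::parLapply(cl, seq_along(folds), fold_score)  # one fold per task
  }
  mean(unlist(scores))
}

# Usage with made-up fold scores and no cluster
toy_scores <- c(0.80, 0.90, 0.85)
result <- cv_score(folds = vector("list", 3), fold_score = function(i) toy_scores[i])
result
```

When a cluster is passed, any objects that fold_score relies on must exist on the workers (for example via clusterExport), which is why the example below exports temp_data and temp_label.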

Examples

## Not run: ------------------------------------
# # To run before using LauraeML
# library(doParallel)
# library(foreach)
# mcl <- makeCluster(4)
# invisible(clusterEvalQ(mcl, library("xgboost")))
# invisible(clusterEvalQ(mcl, library("data.table")))
# invisible(clusterEvalQ(mcl, library("Laurae")))
# 
# # In case you are training manually, try this.
# # We assume the features are stored in "data" and the labels in "label".
# 
# folds <- Laurae::kfold(label, k = 5)
# temp_data <- list()
# temp_label <- list()
# 
# for (i in seq_along(folds)) {
# 
#   temp_data[[i]] <- list()
#   # [[1]]: training rows (all rows outside fold i)
#   temp_data[[i]][[1]] <- Laurae::DTsubsample(data,
#                                              kept = folds[[i]],
#                                              remove = TRUE,
#                                              low_mem = FALSE,
#                                              collect = 0,
#                                              silent = TRUE)
#   # [[2]]: validation rows (the rows of fold i)
#   temp_data[[i]][[2]] <- Laurae::DTsubsample(data,
#                                              kept = folds[[i]],
#                                              remove = FALSE,
#                                              low_mem = FALSE,
#                                              collect = 0,
#                                              silent = TRUE)
#   # Labels follow the same split: training first, validation second
#   temp_label[[i]] <- list()
#   temp_label[[i]][[1]] <- label[-folds[[i]]]
#   temp_label[[i]][[2]] <- label[folds[[i]]]
# 
# }
# 
# clusterExport(mcl, c("temp_data", "temp_label"), envir = environment())
# registerDoParallel(cl = mcl)
# 
# # This direct call is not expected to run correctly: the function is designed to be called by LauraeML, not by hand
# LauraeML_gblinear_par(x = c(1, 1, 1),
#                       y = rep(1, ncol(data)),
#                       mobile = NA,
#                       parallelized = mcl,
#                       maximize = TRUE,
#                       logging = NULL,
#                       data = temp_data,
#                       label = temp_label,
#                       folds = folds)
# 
# # Stops the cluster
# registerDoSEQ()
# stopCluster(mcl)
# #closeAllConnections() # Emergency fallback if the cluster does not respond
## ---------------------------------------------
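
The fold layout built in the example (training part first, validation part second) can be reproduced in base R for a quick check without the Laurae package. In this sketch plain data.frame indexing stands in for Laurae::DTsubsample, the helper name split_folds is hypothetical, and data and labels are kept together in one list per fold instead of the two parallel lists temp_data and temp_label.

```r
# Hypothetical helper: for each fold, [[1]] holds the training rows
# (outside the fold) and [[2]] the validation rows (inside the fold).
split_folds <- function(data, label, folds) {
  lapply(folds, function(idx) {
    list(data  = list(data[-idx, , drop = FALSE], data[idx, , drop = FALSE]),
         label = list(label[-idx], label[idx]))
  })
}

# Usage with toy data and hand-made folds
toy_data  <- data.frame(a = 1:6, b = c(2, 4, 6, 8, 10, 12))
toy_label <- c(0, 1, 0, 1, 0, 1)
toy_folds <- list(c(1, 2), c(3, 4), c(5, 6))
splits <- split_folds(toy_data, toy_label, toy_folds)
nrow(splits[[1]]$data[[1]])  # training rows of fold 1
```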

Run the code above in your browser using DataLab