Laurae (version 0.0.0.9001)

CascadeForest_pred: Cascade Forest Predictor implementation in R

Description

This function predicts from a Cascade Forest model trained using xgboost.

Usage

CascadeForest_pred(model, data, folds = NULL, layer = NULL,
  prediction = TRUE, multi_class = NULL, data_start = NULL,
  return_list = FALSE, low_memory = FALSE)

Arguments

model
Type: list. A model trained by CascadeForest.
data
Type: data.table. The data to predict on. If you pass the training data, predictions are made as if it were out of fold, and you will overfit (use the train_preds list from the model instead).
folds
Type: list. The cross-validation folds, as a list, if predicting on the training data. Otherwise, leave as NULL. Defaults to NULL.
layer
Type: numeric. The layer you want to predict on. If not provided (NULL), attempts to guess by taking the last layer of the model. Defaults to NULL.
prediction
Type: logical. Whether the predictions of the forest ensemble are averaged. Set it to FALSE for debugging or feature engineering. Setting it to TRUE overrides return_list. Defaults to TRUE.
multi_class
Type: numeric. The number of classes. Set to 2 for binary classification or regression. Set to NULL to attempt guessing it from the model. Defaults to NULL.
data_start
Type: vector of numeric. The initial prediction labels. Leave as NULL unless you know what you are doing. Defaults to NULL.
return_list
Type: logical. Whether lists should be returned instead of concatenated frames for predictions. Defaults to FALSE.
low_memory
Type: logical. Whether to perform the data.table transformations in place to lower memory usage. Defaults to FALSE.
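
As a quick sketch of how the arguments above combine, the following assumes a model already trained by CascadeForest (as in the Examples below) and a data.table new_data with the same columns as the training data; new_data is a placeholder, not an object defined in this page:

```r
# Sketch only: assumes `model` was trained by Laurae::CascadeForest and
# `new_data` is a data.table matching the training columns.
library(Laurae)

# Default: averaged predictions from the last layer of the cascade
preds <- CascadeForest_pred(model, new_data)

# Predict from an earlier layer (e.g. layer 2) instead of the last one
preds_layer2 <- CascadeForest_pred(model, new_data, layer = 2)

# Raw per-forest outputs for debugging / feature engineering:
# disable averaging and return a list instead of a concatenated frame
raw_feats <- CascadeForest_pred(model, new_data,
                                prediction = FALSE,
                                return_list = TRUE)
```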

Value

A data.table (or a list, when return_list = TRUE and prediction = FALSE) of predictions on data using model.

Details

For implementation details of Cascade Forest / Complete-Random Tree Forest / Multi-Grained Scanning / Deep Forest, check this: https://github.com/Microsoft/LightGBM/issues/331#issuecomment-283942390 by Laurae.

Examples

Run this code
## Not run: ------------------------------------
# # Load libraries
# library(data.table)
# library(Matrix)
# library(xgboost)
# 
# # Create data
# data(agaricus.train, package = "lightgbm")
# data(agaricus.test, package = "lightgbm")
# agaricus_data_train <- data.table(as.matrix(agaricus.train$data))
# agaricus_data_test <- data.table(as.matrix(agaricus.test$data))
# agaricus_label_train <- agaricus.train$label
# agaricus_label_test <- agaricus.test$label
# folds <- Laurae::kfold(agaricus_label_train, 5)
# 
# # Train a model (binary classification)
# model <- CascadeForest(training_data = agaricus_data_train, # Training data
#                        validation_data = agaricus_data_test, # Validation data
#                        training_labels = agaricus_label_train, # Training labels
#                        validation_labels = agaricus_label_test, # Validation labels
#                        folds = folds, # Folds for cross-validation
#                        boosting = FALSE, # Do not touch this unless you are expert
#                        nthread = 1, # Change this to use more threads
#                        cascade_lr = 1, # Do not touch this unless you are expert
#                        training_start = NULL, # Do not touch this unless you are expert
#                        validation_start = NULL, # Do not touch this unless you are expert
#                        cascade_forests = rep(4, 5), # Number of forest models
#                        cascade_trees = 10, # Number of trees per forest
#                        cascade_rf = 2, # Number of Random Forest in models
#                        cascade_seeds = 0, # Seed per layer
#                        objective = "binary:logistic",
#                        eval_metric = Laurae::df_logloss,
#                        multi_class = 2, # Modify this for multiclass problems
#                        early_stopping = 2, # stop after 2 bad combos of forests
#                        maximize = FALSE, # not a maximization task
#                        verbose = TRUE, # print information during training
#                        low_memory = FALSE)
# 
# # Predict from model
# new_preds <- CascadeForest_pred(model, agaricus_data_test, prediction = FALSE)
# 
# # We can check whether we have equal predictions, it's all TRUE!
# all.equal(model$train_means, CascadeForest_pred(model,
#                                                 agaricus_data_train,
#                                                 folds = folds))
# all.equal(model$valid_means, CascadeForest_pred(model,
#                                                 agaricus_data_test))
# 
# # Attempt to perform fake multiclass problem
# agaricus_label_train[1:100] <- 2
# 
# # Train a model (multiclass classification)
# model <- CascadeForest(training_data = agaricus_data_train, # Training data
#                        validation_data = agaricus_data_test, # Validation data
#                        training_labels = agaricus_label_train, # Training labels
#                        validation_labels = agaricus_label_test, # Validation labels
#                        folds = folds, # Folds for cross-validation
#                        boosting = FALSE, # Do not touch this unless you are expert
#                        nthread = 1, # Change this to use more threads
#                        cascade_lr = 1, # Do not touch this unless you are expert
#                        training_start = NULL, # Do not touch this unless you are expert
#                        validation_start = NULL, # Do not touch this unless you are expert
#                        cascade_forests = rep(4, 5), # Number of forest models
#                        cascade_trees = 10, # Number of trees per forest
#                        cascade_rf = 2, # Number of Random Forest in models
#                        cascade_seeds = 0, # Seed per layer
#                        objective = "multi:softprob",
#                        eval_metric = Laurae::df_logloss,
#                        multi_class = 3, # Modify this for multiclass problems
#                        early_stopping = 2, # stop after 2 bad combos of forests
#                        maximize = FALSE, # not a maximization task
#                        verbose = TRUE, # print information during training
#                        low_memory = FALSE)
# 
# # Predict from model for multiclass problems
# new_preds <- CascadeForest_pred(model, agaricus_data_test, prediction = FALSE)
# 
# # We can check whether we have equal predictions, it's all TRUE!
# all.equal(model$train_means, CascadeForest_pred(model,
#                                                 agaricus_data_train,
#                                                 folds = folds))
# all.equal(model$valid_means, CascadeForest_pred(model,
#                                                 agaricus_data_test))
## ---------------------------------------------