Provides a set of functions to launch a grid search and get its results.
h2o.grid(
algorithm,
grid_id,
x,
y,
training_frame,
...,
hyper_params = list(),
is_supervised = NULL,
do_hyper_params_check = FALSE,
search_criteria = NULL,
export_checkpoints_dir = NULL,
recovery_dir = NULL,
parallelism = 1
)
algorithm
Name of algorithm to use in grid search (gbm, randomForest, kmeans, glm, deeplearning, naivebayes, pca).
grid_id
(Optional) ID for the resulting grid search. If it is not specified, it is autogenerated.
x
(Optional) A vector containing the names or indices of the predictor variables to use in building the model. If x is missing, then all columns except y are used.
y
The name or column index of the response variable in the data. The response must be either a numeric or a categorical/factor variable. If the response is numeric, a regression model will be trained; otherwise a classification model will be trained.
training_frame
ID of the training data frame.
...
Arguments describing parameters to use with the algorithm (i.e., x, y, training_frame). See the specific algorithm - h2o.gbm, h2o.glm, h2o.kmeans, h2o.deeplearning - for available parameters.
hyper_params
Named list of hyperparameter values to search over (e.g., list(ntrees = c(1, 2), max_depth = c(5, 7))).
is_supervised
[Deprecated] This argument is ignored; it is no longer possible to override the default heuristic, which decides whether the given algorithm name and parameters specify a supervised or unsupervised algorithm.
do_hyper_params_check
Perform a client-side check of the specified hyperparameters before launching the search. This can be time-consuming for large hyperparameter spaces.
search_criteria
(Optional) List of control parameters for smarter hyperparameter search. The list can include values for: strategy, max_models, max_runtime_secs, stopping_metric, stopping_tolerance, stopping_rounds and seed. The default strategy 'Cartesian' covers the entire space of hyperparameter combinations; to use Cartesian grid search, leave the search_criteria argument unspecified. Specify the "RandomDiscrete" strategy to search random combinations of your hyperparameters, with three ways of specifying when to stop the search: max number of models, max time, and metric-based early stopping (e.g., stop if MSE has not improved by 0.0001 over the 5 best models). Examples:
list(strategy = "RandomDiscrete", max_runtime_secs = 600, max_models = 100, stopping_metric = "AUTO", stopping_tolerance = 0.00001, stopping_rounds = 5, seed = 123456)
or list(strategy = "RandomDiscrete", max_models = 42, max_runtime_secs = 28800)
or list(strategy = "RandomDiscrete", stopping_metric = "AUTO", stopping_tolerance = 0.001, stopping_rounds = 10)
or list(strategy = "RandomDiscrete", stopping_metric = "misclassification", stopping_tolerance = 0.00001, stopping_rounds = 5)
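For instance, a random search can be launched by passing one of the lists above as search_criteria. This sketch assumes h2o.init() has already been called and that iris_hf is an existing H2OFrame; the hyperparameter values and stopping limits are illustrative, not recommendations:

```r
# Random grid search over a small GBM hyperparameter space (illustrative values)
rand_grid <- h2o.grid(
  "gbm",
  x = 1:4, y = 5,
  training_frame = iris_hf,
  hyper_params = list(ntrees = c(10, 50, 100),
                      max_depth = c(3, 5, 7)),
  search_criteria = list(strategy = "RandomDiscrete",
                         max_models = 5,        # stop after 5 models ...
                         max_runtime_secs = 60, # ... or after 60 seconds
                         seed = 1234)
)
```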
export_checkpoints_dir
Directory to which the grid and its models are automatically exported.
recovery_dir
When specified, the grid and all necessary data (frames, models) will be saved to this directory (use HDFS or another distributed file system). Should the cluster crash during training, the grid can be reloaded from this directory via h2o.loadGrid and training can be resumed.
parallelism
Level of parallelism during grid model building. 1 = sequential building (default). Use 0 for adaptive parallelism, decided by H2O. Any number > 1 sets the exact number of models built in parallel.
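Parallel building only changes how the grid is executed, not its results, so it can be enabled by adding the parallelism argument to an otherwise unchanged call. A minimal sketch, again assuming iris_hf is an existing H2OFrame:

```r
# Let H2O choose the degree of parallelism adaptively (parallelism = 0);
# with parallelism = 4 exactly four models would be built at a time.
grid_par <- h2o.grid("gbm", x = 1:4, y = 5, training_frame = iris_hf,
                     hyper_params = list(ntrees = c(10, 20, 30)),
                     parallelism = 0)
```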
Launch a grid search with the given algorithm and parameters.
if (FALSE) {
library(h2o)
library(jsonlite)
h2o.init()
iris_hf <- as.h2o(iris)
grid <- h2o.grid("gbm", x = 1:4, y = 5, training_frame = iris_hf,
                 hyper_params = list(ntrees = c(1, 2, 3)))
# Get grid summary
summary(grid)
# Fetch grid models
model_ids <- grid@model_ids
models <- lapply(model_ids, function(id) h2o.getModel(id))
}
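Beyond fetching all models, a grid's results can be ranked by a metric using h2o.getGrid, which accepts sort_by and decreasing arguments. This sketch assumes grid is the object returned by the call above; "logloss" is an example metric for a classification grid:

```r
# Rank the grid's models by logloss (lower is better) and fetch the best one
sorted_grid <- h2o.getGrid(grid@grid_id, sort_by = "logloss", decreasing = FALSE)
best_model  <- h2o.getModel(sorted_grid@model_ids[[1]])
```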