optimizeModel: Optimize Model

Description

Uses a genetic algorithm to optimize the model's hyperparameter configuration according to the chosen metric.

Usage

optimizeModel(model, hypers, metric, test = NULL, bg4test = NULL,
  pop = 20, gen = 5, env = NULL, parallel = FALSE,
  keep_best = 0.4, keep_random = 0.2, mutation_chance = 0.4,
  seed = NULL)

Arguments

model

SDMmodel or SDMmodelCV object.

hypers

named list containing the values of the hyperparameters that should be tuned, see details.

metric

character. The metric used to evaluate the models, possible values are: "auc", "tss" and "aicc".

test

SWD object. Test dataset used to evaluate the model, not used with "aicc" or with SDMmodelCV objects, default is NULL.

bg4test

SWD object or NULL. Background locations used to draw subsamples when the "a" hyperparameter is tuned, default is NULL.

pop

numeric. Size of the population, default is 20.

gen

numeric. Number of generations, default is 5.

env

stack containing the environmental variables, used only with "aicc", default is NULL.

parallel

logical, if TRUE it uses parallel computation, default is FALSE.

keep_best

numeric. Percentage of the best models in the population to be retained during each iteration, expressed as decimal number. Default is 0.4.

keep_random

numeric. Probability of retaining the excluded models during each iteration, expressed as decimal number. Default is 0.2.

mutation_chance

numeric. Probability of mutation of the child models, expressed as decimal number. Default is 0.4.

seed

numeric. The value used to set the seed to have consistent results, default is NULL.

Value

SDMtune object.

Details

To see which hyperparameters can be tuned, use the output of the function get_tunable_args. Parallel computation increases the speed only for large datasets, because of the time needed to create the cluster. Part of the code is inspired by this post.
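
To give a feel for how keep_best, keep_random and mutation_chance could interact, the following is a minimal base R sketch of one generation of a generic genetic algorithm. It is not the package's implementation: the helper next_generation, the uniform crossover, and the toy score (a stand-in for a metric such as AUC) are all assumptions made for illustration.

```r
# Hypothetical helper: build the next population from the current one.
# `population` is a list of named lists (one hyperparameter configuration each),
# `scores` holds one fitness value per configuration (higher is better).
next_generation <- function(population, scores, hypers, keep_best = 0.4,
                            keep_random = 0.2, mutation_chance = 0.4) {
  pop_size <- length(population)
  ranked <- population[order(scores, decreasing = TRUE)]

  # Retain the best fraction of the population
  n_best <- max(1, round(pop_size * keep_best))
  parents <- ranked[seq_len(n_best)]

  # Each excluded configuration survives with probability keep_random
  rest <- ranked[-seq_len(n_best)]
  parents <- c(parents, rest[runif(length(rest)) < keep_random])

  # Refill the population with children of randomly paired parents
  children <- list()
  while (length(parents) + length(children) < pop_size) {
    pair <- sample.int(length(parents), 2, replace = length(parents) < 2)
    child <- parents[[pair[1]]]
    father <- parents[[pair[2]]]

    # Uniform crossover: take each hyperparameter from either parent
    from_father <- runif(length(hypers)) < 0.5
    child[from_father] <- father[from_father]

    # Mutation: with probability mutation_chance, redraw one hyperparameter
    if (runif(1) < mutation_chance) {
      h <- sample(names(hypers), 1)
      values <- hypers[[h]]
      child[[h]] <- values[sample.int(length(values), 1)]
    }
    children[[length(children) + 1]] <- child
  }
  c(parents, children)
}

set.seed(25)
hypers <- list(reg = 1:3, fc = c("lqp", "lqph", "lh"))

# Random starting population of 20 configurations
random_config <- function() {
  lapply(hypers, function(v) v[sample.int(length(v), 1)])
}
population <- replicate(20, random_config(), simplify = FALSE)

# Toy fitness (assumption): configurations with reg closer to 2 score higher
scores <- vapply(population, function(cfg) -abs(cfg$reg - 2), numeric(1))

new_pop <- next_generation(population, scores, hypers)
length(new_pop)
```

In this sketch the population size stays constant from one generation to the next, and the best configuration always survives because the retained models are placed first.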

Examples

# NOT RUN {
# Acquire environmental variables
files <- list.files(path = file.path(system.file(package = "dismo"), "ex"),
                    pattern = "grd", full.names = TRUE)
predictors <- raster::stack(files)

# Prepare presence locations
p_coords <- condor[, 1:2]

# Prepare background locations
bg_coords <- dismo::randomPoints(predictors, 5000)

# Create SWD object
presence <- prepareSWD(species = "Vultur gryphus", coords = p_coords,
                       env = predictors, categorical = "biome")
bg <- prepareSWD(species = "Vultur gryphus", coords = bg_coords,
                 env = predictors, categorical = "biome")

# Split presence locations in training (80%) and testing (20%) datasets
datasets <- trainValTest(presence, test = 0.2)
train <- datasets[[1]]
test <- datasets[[2]]

# Train a model
model <- train(method = "Maxent", p = train, a = bg, fc = "l")

# Define the hyperparameters to test
h <- list(reg = 1:3, fc = c("lqp", "lqph", "lh"), a = seq(3000, 4500, 500),
          iter = seq(300, 700, 100))
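
# Optionally list the hyperparameters that can be tuned for this model.
# Assumption: get_tunable_args (mentioned in Details) takes the trained model
# as its only argument; check the help page of your installed version.
get_tunable_args(model)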

# Run the function using as metric the AUC
output <- optimizeModel(model, hypers = h, metric = "auc", test = test,
                        bg4test = bg, seed = 25)
output@results
output@models
output@models[[1]]  # Best model
# }