BIOMOD_tuning: Tune model parameters

Description

Function to tune the parameters of biomod2 single models

Usage

BIOMOD_tuning(
  data,
  models = c("GLM", "GBM", "GAM", "CTA", "ANN", "SRE", "FDA", "MARS", "RF",
    "MAXENT.Phillips"),
  models.options = BIOMOD_ModelingOptions(),
  method.ANN = "avNNet",
  method.RF = "rf",
  method.MARS = "earth",
  method.GAM = "gam",
  method.GLM = "glmStepAIC",
  trControl = NULL,
  metric = "ROC",
  ctrl.CTA = NULL,
  ctrl.RF = NULL,
  ctrl.ANN = NULL,
  ctrl.MARS = NULL,
  ctrl.FDA = NULL,
  ctrl.GAM = NULL,
  ctrl.GBM = NULL,
  ctrl.GLM = NULL,
  tuneLength = 30,
  decay.tune.ANN = c(0.001, 0.01, 0.05, 0.1),
  size.tune.ANN = c(2, 4, 6, 8),
  maxit.ANN = 500,
  MaxNWts.ANN = 10 * (ncol(data@data.env.var) + 1) + 10 + 1,
  type.GLM = c("simple", "quadratic", "polynomial", "s_smoother"),
  interaction.GLM = c(0, 1),
  cvmethod.ME = "randomkfold",
  overlap.ME = FALSE,
  kfolds.ME = 10,
  n.bg.ME = 10000,
  env.ME = NULL,
  metric.ME = "ROC",
  clamp.ME = TRUE,
  parallel.ME = FALSE,
  numCores.ME = NULL,
  Yweights = NULL
)

Arguments

data

BIOMOD.formated.data object returned by BIOMOD_FormatingData

models

vector of model names chosen among 'GLM', 'GBM', 'GAM', 'CTA', 'ANN', 'SRE', 'FDA', 'MARS', 'RF', 'MAXENT.Phillips'

models.options

BIOMOD.models.options object returned by BIOMOD_ModelingOptions. Default: BIOMOD_ModelingOptions()

method.ANN

which classification or regression model to use for artificial neural networks (default: "avNNet"). see http://topepo.github.io/caret/Neural_Network.html

method.RF

which classification or regression model to use for randomForest (default: "rf"). see http://topepo.github.io/caret/Random_Forest.html

method.MARS

which classification or regression model to use for mars (default: "earth"). see http://topepo.github.io/caret/Multivariate_Adaptive_Regression_Splines.html

method.GAM

which classification or regression model to use for GAM (default: "gam"). see http://topepo.github.io/caret/Generalized_Additive_Model.html

method.GLM

which classification or regression model to use for GLM (default: "glmStepAIC"). see http://topepo.github.io/caret/Generalized_Linear_Model.html

trControl

global control parameters used when tuning (default: trainControl(method = "cv", summaryFunction = twoClassSummary, classProbs = TRUE, returnData = FALSE)). For details see ?trainControl
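
As a sketch of how a custom control object might be built, assuming the caret package is installed: the settings below mirror the documented default, with the number of folds made explicit. The value number = 5 and the name my_ctrl are illustrative choices, not biomod2 defaults.

```r
library(caret)

# Same settings as the documented default, with the fold count made explicit.
# number = 5 is an illustrative choice, not a biomod2 default.
my_ctrl <- trainControl(
  method = "cv",
  number = 5,
  summaryFunction = twoClassSummary,  # reports ROC, Sens and Spec
  classProbs = TRUE,                  # class probabilities are needed for ROC
  returnData = FALSE                  # do not keep a copy of the training data
)
```

Such an object could then be supplied as trControl, or as one of the model-specific ctrl.* arguments.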

metric

metric to select the optimal model (Default ROC). TSS (maximizing Sensitivity and Specificity) is also possible. see ?train
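
For orientation, TSS is conventionally defined as sensitivity + specificity - 1. The worked example below uses hypothetical confusion-matrix counts and is only a sketch of the definition; it does not show how BIOMOD_tuning computes the metric internally.

```r
# Hypothetical confusion-matrix counts for a presence/absence model
tp <- 40  # presences predicted as presences
fn <- 10  # presences predicted as absences
tn <- 45  # absences predicted as absences
fp <- 5   # absences predicted as presences

sensitivity <- tp / (tp + fn)  # 40 / 50 = 0.8
specificity <- tn / (tn + fp)  # 45 / 50 = 0.9

# True Skill Statistic: sensitivity + specificity - 1
tss <- sensitivity + specificity - 1
```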

ctrl.CTA

specify control parameters only for CTA (default trControl)

ctrl.RF

specify control parameters only for RF (default trControl)

ctrl.ANN

specify control parameters only for ANN (default trControl)

ctrl.MARS

specify control parameters only for MARS (default trControl)

ctrl.FDA

specify control parameters only for FDA (default trControl)

ctrl.GAM

specify control parameters only for GAM (default trControl)

ctrl.GBM

specify control parameters only for GBM (default trControl)

ctrl.GLM

specify control parameters only for GLM (default trControl)

tuneLength

see ?train (default 30)

decay.tune.ANN

weight decay values to try when tuning ANN (default: c(0.001, 0.01, 0.05, 0.1)). The best value is selected using the method specified in ctrl.ANN (or in trControl if ctrl.ANN is not given).

size.tune.ANN

size values (number of units in the hidden layer) to try when tuning ANN (default: c(2, 4, 6, 8)). The best value is selected using the method specified in ctrl.ANN (or in trControl if ctrl.ANN is not given).
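
To give a feel for the size of the search space, the sketch below crosses the two default vectors into a full grid. That the candidates are combined this way is an assumption made for illustration; this page does not show how the grid is built internally.

```r
decay.tune.ANN <- c(0.001, 0.01, 0.05, 0.1)
size.tune.ANN  <- c(2, 4, 6, 8)

# Every combination of hidden-layer size and weight decay
ann_grid <- expand.grid(size = size.tune.ANN, decay = decay.tune.ANN)

nrow(ann_grid)     # number of candidate settings: 4 * 4 = 16
head(ann_grid, 4)  # first rows: the four sizes paired with decay = 0.001
```

Each extra value added to either vector adds a full row or column to this grid, so tuning time grows quickly with longer vectors.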

maxit.ANN

maximum number of iterations for ANN (default 500)

MaxNWts.ANN

The maximum allowable number of weights for ANN (default: 10 * (ncol(data@data.env.var) + 1) + 10 + 1).
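
The default formula can be checked by hand. A single-hidden-layer network with p inputs, size hidden units, one output and no skip-layer connections has size * (p + 1) + size + 1 weights, so the default reads as the weight count for 10 hidden units. That reading is an interpretation of the formula, and n_weights is a hypothetical helper written for this sketch.

```r
# Weights in a single-hidden-layer network with one output and
# no skip-layer connections: size * (p + 1) + size + 1
n_weights <- function(p, size) size * (p + 1) + size + 1

p <- 5  # five environmental variables, as in the example data on this page

default_max <- 10 * (p + 1) + 10 + 1               # documented default: 71
needed      <- n_weights(p, size = c(2, 4, 6, 8))  # 15 29 43 57

all(needed <= default_max)  # every default size fits under the cap
```

If larger values are added to size.tune.ANN, MaxNWts.ANN may need to be raised accordingly.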

type.GLM

vector of modeling types chosen among 'simple', 'quadratic', 'polynomial' or 's_smoother' (default: c('simple', 'quadratic', 'polynomial', 's_smoother'))

interaction.GLM

vector of interaction levels chosen among 0 and 1 (default: c(0, 1))

cvmethod.ME

method used for data partitioning for MAXENT.Phillips (default: 'randomkfold')

overlap.ME

logical; Calculates pairwise metric of niche overlap if TRUE (Default: FALSE). (see ?calc.niche.overlap)

kfolds.ME

number of bins to use for k-fold cross-validation used for MAXENT.Phillips (Default: 10).
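
The general idea of random k-fold partitioning can be sketched in base R as follows. This is a generic illustration with a hypothetical record count; the partitioning code actually used for MAXENT.Phillips may differ in its details.

```r
set.seed(42)  # for a reproducible illustration

n_occ <- 95   # hypothetical number of occurrence records
k     <- 10   # kfolds.ME default

# Give each record a fold label 1..k, as evenly as possible, in random order
fold_id <- sample(rep(seq_len(k), length.out = n_occ))

table(fold_id)  # fold sizes differ by at most one record
```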

n.bg.ME

Number of Background points used to run MAXENT.Phillips (Default: 10000)

env.ME

RasterStack of model predictor variables

metric.ME

metric to select the optimal model for MAXENT.Phillips (Default: ROC). One out of Mean.AUC (or ROC), Mean.AUC.DIFF, Mean.ORmin, Mean.OR10 and AICc. see ?ENMevaluate and Muscarella et al. 2014

clamp.ME

logical; If TRUE (Default) "Features are constrained to remain within the range of values in the training data" (Elith et al. 2011)

parallel.ME

logical; if TRUE, parallel computing is enabled for MAXENT.Phillips

numCores.ME

number of cores used to train MAXENT.Phillips

Yweights

response points weights. This argument will only affect models that allow case weights.
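
One common way to build such weights is to give presences and absences the same total weight. The sketch below uses a hypothetical response vector and is only an illustration of that choice; it is not a biomod2 default.

```r
# Hypothetical presence (1) / absence (0) response
resp <- c(rep(1, 20), rep(0, 80))

n_pres <- sum(resp == 1)
n_abs  <- sum(resp == 0)

# Weight each class so that both classes sum to the same total
w <- ifelse(resp == 1, 1 / n_pres, 1 / n_abs)

# Rescale so the weights average to 1
w <- w * length(resp) / sum(w)
```

Here each presence gets weight 2.5 and each absence 0.625, so both classes contribute a total weight of 50.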

Value

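BIOMOD.models.options object with optimized parameters. In the example below it is accessed as the models.options element of the returned object, which also carries tune.* elements (e.g. tune.RF) holding the tuning results.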

References

Kuhn, Max. 2008. Building predictive models in R using the caret package. Journal of Statistical Software 28, 1-26.

Kuhn, Max, and Kjell Johnson. 2013. Applied predictive modeling. New York: Springer.

Muscarella, Robert, et al. 2014. ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for Maxent ecological niche models. Methods in Ecology and Evolution, 5, 1198-1205.

Examples


# NOT RUN {
library(biomod2)
library(raster)  # provides stack() and ncell()

# species occurrences
DataSpecies <- read.csv(system.file("external/species/mammals_table.csv",
                                    package="biomod2"))
head(DataSpecies)

# the name of studied species
myRespName <- 'GuloGulo'

# the presence/absence data for our species
myResp <- as.numeric(DataSpecies[,myRespName])

# the XY coordinates of species data
myRespXY <- DataSpecies[,c("X_WGS84","Y_WGS84")]

# Environmental variables extracted from BIOCLIM (bio_3, bio_4, bio_7, bio_11 & bio_12)
myExpl <- stack(system.file("external/bioclim/current/bio3.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio4.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio7.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio11.grd",
                            package = "biomod2"),
                system.file("external/bioclim/current/bio12.grd",
                            package = "biomod2"))
# 1. Formatting Data
myBiomodData <- BIOMOD_FormatingData(resp.var = myResp,
                                     expl.var = myExpl,
                                     resp.xy = myRespXY,
                                     resp.name = myRespName)

# 2. Tuning model options, starting from the default options.
### Tuning all models sequentially with default settings takes approx. 45 min
### on a 3.4 GHz processor. Tuning all models in parallel (on 8 cores)
### using foreach loops runs much faster: approx. 14 min.

#library(doParallel);cl<-makeCluster(8);doParallel::registerDoParallel(cl) 


time.seq <- system.time(
  Biomod.tuning <- BIOMOD_tuning(myBiomodData,
                                 env.ME = myExpl,
                                 n.bg.ME = ncell(myExpl))
)
#stopCluster(cl)

myBiomodModelOut <- BIOMOD_Modeling( myBiomodData, 
                                     models = c('RF','CTA'), 
                                     models.options = Biomod.tuning$models.options, 
                                     NbRunEval=1, 
                                     DataSplit=100, 
                                     VarImport=0, 
                                     models.eval.meth = c('ROC'),
                                     do.full.models=FALSE,
                                     modeling.id="test")


#  eval.plot(Biomod.tuning$tune.MAXENT.Phillips@results)
par(mfrow=c(1,3))
plot(Biomod.tuning$tune.CTA.rpart)
plot(Biomod.tuning$tune.CTA.rpart2)
plot(Biomod.tuning$tune.RF)
# }
