BIOMOD_ModelingOptions: Configure the modeling options for each selected model

Description

Function to set the different options for each modeling technique.

Usage

BIOMOD_ModelingOptions( GLM = NULL,
                        GBM = NULL,
                        GAM = NULL,
                        CTA = NULL,
                        ANN = NULL,
                        SRE = NULL,
                        FDA = NULL,
                        MARS = NULL,
                        RF = NULL,
                        MAXENT.Phillips = NULL,
                        MAXENT.Tsuruoka = NULL)

Arguments

GLM

list of GLM options

GBM

list of GBM options

GAM

list of GAM options

CTA

list of CTA options

ANN

list of ANN options

SRE

list of SRE options

FDA

list of FDA options

MARS

list of MARS options

RF

list of RF options

MAXENT.Phillips

list of MAXENT.Phillips options

MAXENT.Tsuruoka

list of MAXENT.Tsuruoka options

Value

A "BIOMOD.Model.Options" object, to be passed to BIOMOD_Modeling.

=-=-= GLM =-=-= (stats::glm)

myFormula : a typical formula object (see example). If not NULL, the type and interaction.level arguments are switched off. You can choose to either:
- generate the GLM formula automatically by using the type and interaction.level arguments:
  - type (default 'quadratic') : formula given to the model ('simple', 'quadratic' or 'polynomial').
  - interaction.level (default 0) : integer giving the level of interaction between variables to consider. Note that interactions quickly increase the number of effective variables used in the GLM.
- or construct a specific formula.
test (default 'AIC') : information criterion for the stepwise selection procedure: 'AIC' for the Akaike Information Criterion or 'BIC' for the Bayesian Information Criterion. 'none' is also supported and means that only the full model is considered (no stepwise selection); this can lead to convergence issues and strange results.
family (default binomial(link = 'logit')) : a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function (see family for details of family functions). BIOMOD only runs on presence-absence data so far, hence the binomial family by default.
control : a list of parameters for controlling the fitting process. For glm.fit this is passed to glm.control.

=-=-= GBM =-=-= (default gbm::gbm)

Please refer to the gbm help file for the meaning of these options.

distribution (default 'bernoulli')
n.trees (default 2500)
interaction.depth (default 7)
n.minobsinnode (default 5)
shrinkage (default 0.001)
bag.fraction (default 0.5)
train.fraction (default 1)
cv.folds (default 3)
keep.data (default FALSE)
verbose (default FALSE)
perf.method (default 'cv')
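
As a minimal sketch of how the GBM options above might be overridden: the values below are illustrative only, not recommendations. It assumes a biomod2 release that still provides this interface (this page documents 3.3-7.1), and that options left out of the list keep their defaults, as the GLM example on this page suggests.

```r
# Illustrative GBM overrides; names match the options listed above.
gbm_opts <- list(
  distribution      = "bernoulli",
  n.trees           = 5000,   # documented default is 2500
  interaction.depth = 3,      # documented default is 7
  shrinkage         = 0.005,  # documented default is 0.001
  cv.folds          = 5       # documented default is 3
)

# Assumption: the call is only attempted on a biomod2 version older than 4.0,
# where BIOMOD_ModelingOptions() has the signature shown on this page.
if (requireNamespace("biomod2", quietly = TRUE) &&
    utils::packageVersion("biomod2") < "4.0") {
  myBiomodOptions <- biomod2::BIOMOD_ModelingOptions(GBM = gbm_opts)
  print(myBiomodOptions)
}
```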

=-=-= GAM =-=-= (gam::gam or mgcv::gam)

algo : either "GAM_gam" (default), "GAM_mgcv" or "BAM_mgcv", defining the chosen GAM function (see gam::gam, mgcv::gam and mgcv::bam respectively for more details)
myFormula : a typical formula object (see example). If not NULL, the type and interaction.level arguments are switched off. You can choose to either:
- generate the GAM formula automatically by using the type and interaction.level arguments:
  - type : the smoother used to generate the formula. Only "s_smoother" is available at this time.
  - interaction.level : integer giving the level of interaction between variables to consider. Note that interactions quickly increase the number of effective variables used in the GAM. Interactions are not considered if you chose the "GAM_gam" algo.
- or construct a specific formula.
k (default -1 or 4) : a smooth term in a formula argument to gam (see gam::s or mgcv::s)
family (default binomial(link = 'logit')) : a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function (see family for details of family functions). BIOMOD only runs on presence-absence data so far, hence the binomial family by default.
control : see gam::gam.control or mgcv::gam.control
Some extra "GAM_mgcv"-specific options (ignored if algo = "GAM_gam"):
- method (default 'GCV.Cp')
- optimizer (default c('outer','newton'))
- select (default FALSE)
- knots (default NULL)
- paramPen (default NULL)
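
The following sketch shows one way the GAM options above could be combined to switch to the mgcv implementation. The chosen values (k = 4, method = "REML", select = TRUE) are assumptions made for illustration; "REML" is a smoothing-parameter method accepted by mgcv::gam, but whether it suits a given dataset is left to the user. The version guard is also an assumption, since this page documents biomod2 3.3-7.1.

```r
# Illustrative GAM options selecting the mgcv backend.
gam_opts <- list(
  algo              = "GAM_mgcv",    # instead of the default "GAM_gam"
  type              = "s_smoother",  # the only smoother listed as available
  interaction.level = 0,
  k                 = 4,
  method            = "REML",        # mgcv-specific; documented default is 'GCV.Cp'
  select            = TRUE           # mgcv-specific; documented default is FALSE
)

# Assumption: only run against a biomod2 version exposing this interface.
if (requireNamespace("biomod2", quietly = TRUE) &&
    utils::packageVersion("biomod2") < "4.0") {
  myBiomodOptions <- biomod2::BIOMOD_ModelingOptions(GAM = gam_opts)
  print(myBiomodOptions)
}
```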

=-=-= CTA =-=-= (rpart::rpart)

Please refer to rpart help file to get the meaning of the following options.

method (default 'class')
parms (default 'default') : if 'default', the default rpart parms values are kept
cost (default NULL)
control: see rpart.control

NOTE: for method and parms, you can give a 'real' value as described in the rpart help file or 'default' that implies default rpart values.

=-=-= ANN =-=-= (nnet::nnet)

NbCV (default 5) : number of cross-validations used to find the best size and decay parameters
size (default NULL) : number of units in the hidden layer. If NULL, the size parameter is optimised by cross-validation based on model AUC (NbCV cross-validations; the tested sizes are c(2, 4, 6, 8)). You can also specify a vector of sizes you want to test. The one giving the best model AUC is then selected.
decay (default NULL) : parameter for weight decay. If NULL, the decay parameter is optimised by cross-validation based on model AUC (NbCV cross-validations; the tested decays are c(0.001, 0.01, 0.05, 0.1)). You can also specify a vector of decays you want to test. The one giving the best model AUC is then selected.
rang (default 0.1) : Initial random weights on [-rang, rang]
maxit (default 200): maximum number of iterations.
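
A short sketch of the ANN tuning described above, where size and decay are given as vectors of candidate values to be compared by cross-validation. The candidate values and maxit = 500 are illustrative assumptions, and the version guard assumes the interface documented on this page (biomod2 3.3-7.1).

```r
# Candidate grids for the hidden-layer size and the weight decay;
# per the text above, the value giving the best model AUC is selected.
ann_opts <- list(
  NbCV  = 5,
  size  = c(2, 4, 6),
  decay = c(0.001, 0.01, 0.1),
  rang  = 0.1,
  maxit = 500   # documented default is 200
)

# Assumption: only run against a biomod2 version exposing this interface.
if (requireNamespace("biomod2", quietly = TRUE) &&
    utils::packageVersion("biomod2") < "4.0") {
  myBiomodOptions <- biomod2::BIOMOD_ModelingOptions(ANN = ann_opts)
  print(myBiomodOptions)
}
```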

=-=-= SRE =-=-= (biomod2::sre)

quant (default 0.025) : quantile of extreme values removed from each environmental variable when selecting species envelopes

=-=-= FDA =-=-= (mda::fda)

Please refer to fda help file to get the meaning of these options.

method (default 'mars')
add_args (default NULL) : additional arguments to method, given as a list of parameters (corresponds to the … options of the fda function)

=-=-= MARS =-=-= (earth::earth)

Please refer to earth help file to get the meaning of these options.

myFormula : a typical formula object (see example). If not NULL, the type and interaction.level arguments are switched off. You can choose to either:
- generate the MARS formula automatically by using the type and interaction.level arguments:
  - type (default 'simple') : formula given to the model ('simple', 'quadratic' or 'polynomial').
  - interaction.level (default 0) : integer giving the level of interaction between variables to consider. Note that interactions quickly increase the number of effective variables used in the GLM/MARS.
- or construct a specific formula.
nk (default NULL) : an optional integer specifying the maximum number of model terms. If NULL, the default mars function value is used (i.e. max(21, 2 * nb_expl_var + 1))
penalty (default 2)
thresh (default 0.001)
nprune (default NULL)
pmethod (default "backward")

=-=-= RF =-=-= (randomForest::randomForest)

do.classif (default TRUE) : if TRUE, a classification random forest is computed; otherwise a regression random forest is computed
ntree (default 500)
mtry (default 'default')
nodesize (default 5)
maxnodes (default NULL)

NOTE: for mtry, you can give a 'real' value as described in randomForest help file or 'default' that implies default randomForest values
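
As a hedged illustration of the RF options and the NOTE above, the sketch below keeps mtry at 'default' (so randomForest picks its own value) while changing ntree. The value ntree = 1000 is an assumption for illustration, and the guard assumes the biomod2 3.3-7.1 interface documented here.

```r
# Illustrative RF overrides; mtry = 'default' defers to randomForest's own default.
rf_opts <- list(
  do.classif = TRUE,
  ntree      = 1000,       # documented default is 500
  mtry       = "default",
  nodesize   = 5,
  maxnodes   = NULL        # a NULL entry is dropped by list(), i.e. left unset
)

# Assumption: only run against a biomod2 version exposing this interface.
if (requireNamespace("biomod2", quietly = TRUE) &&
    utils::packageVersion("biomod2") < "4.0") {
  myBiomodOptions <- biomod2::BIOMOD_ModelingOptions(RF = rf_opts)
  print(myBiomodOptions)
}
```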

=-=-= MAXENT.Phillips =-=-= (http://www.cs.princeton.edu/~schapire/maxent/)

path_to_maxent.jar : character, the path to the maxent.jar file (the working directory by default)
memory_allocated : integer (default 512), the amount of memory (in MB) reserved for Java to run MAXENT.Phillips. Should be 64, 128, 256, 512, 1024, 2048... or NULL if you want to use the default Java memory limit.
maximumiterations : integer (default 200), maximum number of iterations
visible : logical (default FALSE), make the Maxent user interface visible
linear : logical (default TRUE), allow linear features to be used
quadratic : logical (default TRUE), allow quadratic features to be used
product : logical (default TRUE), allow product features to be used
threshold : logical (default TRUE), allow threshold features to be used
hinge : logical (default TRUE), allow hinge features to be used
lq2lqptthreshold : integer (default 80), number of samples at which product and threshold features start being used
l2lqthreshold : integer (default 10), number of samples at which quadratic features start being used
hingethreshold : integer (default 15), number of samples at which hinge features start being used
beta_threshold : numeric (default -1.0), regularization parameter to be applied to all threshold features; negative value enables automatic setting
beta_categorical : numeric (default -1.0), regularization parameter to be applied to all categorical features; negative value enables automatic setting
beta_lqp : numeric (default -1.0), regularization parameter to be applied to all linear, quadratic and product features; negative value enables automatic setting
beta_hinge : numeric (default -1.0), regularization parameter to be applied to all hinge features; negative value enables automatic setting
betamultiplier : numeric (default 1), multiply all automatic regularization parameters by this number. A higher number gives a more spread-out distribution.
defaultprevalence : numeric (default 0.5), default prevalence of the species: probability of presence at ordinary occurrence points
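
The sketch below shows how a few of the MAXENT.Phillips options above might be set together. It passes the working directory explicitly as path_to_maxent.jar (the documented default) and restricts the feature classes; all values are illustrative assumptions, maxent.jar itself is not checked for here, and the guard assumes the biomod2 3.3-7.1 interface documented on this page.

```r
# Illustrative MAXENT.Phillips overrides.
maxent_opts <- list(
  path_to_maxent.jar = getwd(),  # documented default location
  memory_allocated   = 1024,     # in MB; documented default is 512
  maximumiterations  = 500,      # documented default is 200
  threshold          = FALSE,    # disable threshold features
  hinge              = FALSE,    # disable hinge features
  betamultiplier     = 2         # scales the automatic regularization
)

# Assumption: only run against a biomod2 version exposing this interface.
if (requireNamespace("biomod2", quietly = TRUE) &&
    utils::packageVersion("biomod2") < "4.0") {
  myBiomodOptions <- biomod2::BIOMOD_ModelingOptions(MAXENT.Phillips = maxent_opts)
  print(myBiomodOptions)
}
```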

=-=-= MAXENT.Tsuruoka =-=-=

l1_regularizer (default 0.0): A numeric value turning on L1 regularization and setting the regularization parameter. A value of 0 disables L1 regularization
l2_regularizer (default 0.0): A numeric value turning on L2 regularization and setting the regularization parameter. A value of 0 disables L2 regularization
use_sgd (default FALSE): A logical indicating that SGD parameter estimation should be used. Defaults to FALSE
set_heldout (default 0): An integer specifying the number of documents to hold out. Sets a held-out subset of your data to test against and prevent overfitting
verbose (default FALSE): A logical specifying whether to provide descriptive output about the training process

NOTE: if you use the set_heldout parameter, the held-out data are taken from the calibration data pool. This can be penalizing for datasets with a low number of occurrences.

Details

The aim of this function is to allow advanced users to change some default parameters of the BIOMOD inner models. Options can be set for each modeling technique.

Each argument must be given as a list object.

The best way to use this function is to print the default model options (Print_Default_ModelingOptions) or to create a default 'BIOMOD.model.option object' and print it in your console. Then copy the output, change only the required parameters, and paste it as function arguments (see example).

The per-model sections of this page give the detailed list of modifiable parameters. They correspond to the usual parameters that can be set for each modeling technique (e.g. ?glm).

Examples


# NOT RUN {
# default BIOMOD.model.option object
myBiomodOptions <- BIOMOD_ModelingOptions()

# print the object
myBiomodOptions

# you can copy part of the printed output, change it and customize your options
# here we want to compute a quadratic GLM and select the best model with the 'BIC' criterion
myBiomodOptions <- BIOMOD_ModelingOptions(
                      GLM = list( type = 'quadratic',
                                  interaction.level = 0,
                                  myFormula = NULL,
                                  test = 'BIC',
                                  family = 'binomial',
                                  control = glm.control(epsilon = 1e-08, 
                                              maxit = 1000, 
                                              trace = FALSE) ))
                                  
# check that the changes were applied
myBiomodOptions

# you may prefer to define your own GLM formula
myBiomodOptions <- BIOMOD_ModelingOptions(
                    GLM = list( myFormula = formula("Sp277 ~ bio3 + 
                    log(bio10) + poly(bio16,2) + bio19 + bio3:bio19")))

# check that the changes were applied
myBiomodOptions

# you can also print the default parameters directly and then follow the same process
Print_Default_ModelingOptions()

# }
