1. data
.. If you have decided to add pseudo-absences to your original dataset (see BIOMOD_FormatingData), NbPseudoAbsences * NbRunEval + 1 models will be created.
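The count above can be reproduced with simple arithmetic; the hypothetical values below are for illustration only (the variable names mirror the arguments but this is plain R, not a biomod2 call):

```r
## Illustrative arithmetic, not biomod2 code: number of models built
## per technique when pseudo-absences are used (hypothetical values).
NbPseudoAbsences <- 2   # number of pseudo-absence sets (PA.nb.rep)
NbRunEval        <- 3   # number of calibration/evaluation repetitions
NbPseudoAbsences * NbRunEval + 1   # the "+ 1" is the full-data model
```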
2. models
.. The set of models to be calibrated on the data. 10 modeling techniques are currently available:
.. - GLM : Generalized Linear Model (glm)
.. - GAM : Generalized Additive Model (gam, gam or bam; see BIOMOD_ModelingOptions for details on algorithm selection)
.. - GBM : Generalized Boosting Model, usually called Boosted Regression Trees (gbm)
.. - CTA : Classification Tree Analysis (rpart)
.. - ANN : Artificial Neural Network (nnet)
.. - SRE : Surface Range Envelope, usually called BIOCLIM
.. - FDA : Flexible Discriminant Analysis (fda)
.. - MARS : Multivariate Adaptive Regression Splines (earth)
.. - RF : Random Forest (randomForest)
.. - MAXENT.Phillips : Maximum Entropy (https://biodiversityinformatics.amnh.org/open_source/maxent)
.. - MAXENT.Phillips.2 : Maximum Entropy (maxnet)
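A minimal sketch of how a subset of these techniques is passed through the models argument; myBiomodData is a hypothetical object assumed to come from BIOMOD_FormatingData, and the other argument values are placeholders:

```r
library(biomod2)

## myBiomodData is assumed to be the output of BIOMOD_FormatingData()
## (hypothetical object; shown only to illustrate the 'models' argument).
myBiomodModelOut <- BIOMOD_Modeling(
  data      = myBiomodData,
  models    = c("GLM", "GBM", "RF", "MAXENT.Phillips.2"),  # any subset of the 10
  NbRunEval = 3,     # placeholder values, see items 3-5 below
  DataSplit = 70,
  models.eval.meth = c("TSS", "ROC"),
  SaveObj   = TRUE
)
```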
3. NbRunEval & DataSplit
.. As already explained in the BIOMOD_FormatingData help file, the common approach is to split the original dataset into two subsets, one to calibrate the models and another to evaluate them. Here we provide the possibility to repeat this process (calibration and evaluation) N times (NbRunEval times). The proportion of data kept for calibration is determined by the DataSplit argument (100% - DataSplit will be used to evaluate the models). This sort of cross-validation provides a fairly robust test of the models when independent data are not available. Each technique will also be calibrated on the complete original dataset. All the models produced by BIOMOD and their related information are saved to the hard drive.
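As a sketch of the split arithmetic, with hypothetical counts (illustration only, not biomod2 code):

```r
## Illustrative only: how DataSplit partitions n observations
## in each of the NbRunEval repetitions (hypothetical values).
n         <- 1000
DataSplit <- 70                          # percent kept for calibration
n_calib   <- round(n * DataSplit / 100)  # records used to calibrate
n_eval    <- n - n_calib                 # the remaining 100% - DataSplit
```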
4. Yweights & Prevalence
.. Allows giving more or less weight to particular observations. If these arguments are kept at NULL (Yweights = NULL, Prevalence = NULL), each observation (presence or absence) has the same weight (independent of the number of presences and absences). If Prevalence = 0.5, absences will be weighted equally to the presences (i.e. the weighted sum of presences equals the weighted sum of absences). If prevalence is set below or above 0.5, absences or presences are given more weight, respectively.
.. In the particular case that pseudo-absence data have been generated with BIOMOD_FormatingData (PA.nb.rep > 0), weights are by default (Prevalence = NULL) calculated such that prevalence is 0.5, meaning that the presences will have the same importance as the absences in the calibration process of the models. Automatically created Yweights will be composed of integers to prevent different modeling issues.
.. Note that the Prevalence argument will always be ignored if Yweights are defined.
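A worked example of the prevalence-0.5 weighting, with hypothetical counts (illustration only; biomod2 computes its own integer weights):

```r
## Illustrative only: integer weights giving a prevalence of 0.5
## for 50 presences and 500 pseudo-absences (hypothetical counts).
n_pres <- 50
n_abs  <- 500
w_pres <- 10   # each presence weighted 10
w_abs  <- 1    # each pseudo-absence weighted 1
n_pres * w_pres == n_abs * w_abs   # weighted sums of both classes are equal
```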
5. models.eval.meth
.. The available evaluation methods are:
.. - ROC : Relative Operating Characteristic
.. - KAPPA : Cohen's Kappa (Heidke skill score)
.. - TSS : True Skill Statistic (Hanssen and Kuipers discriminant, Peirce's skill score)
.. - FAR : False alarm ratio
.. - SR : Success ratio
.. - ACCURACY : Accuracy (fraction correct)
.. - BIAS : Bias score (frequency bias)
.. - POD : Probability of detection (hit rate)
.. - CSI : Critical success index (threat score)
.. - ETS : Equitable threat score (Gilbert skill score)
Some of them are scaled so that they all have an optimum at 1. You can choose one or more evaluation metrics (as a vector). By default, only the 'KAPPA', 'TSS' and 'ROC' evaluations are performed. Please refer to the CAWCR website (http://www.cawcr.gov.au/projects/verification/#Methods_for_dichotomous_forecasts) for a detailed description of each metric.
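A few of the metrics above can be sketched from a hypothetical 2x2 confusion matrix; the counts are made up for illustration and the formulas follow the standard dichotomous-forecast definitions:

```r
## Illustrative only: metrics from a hypothetical confusion matrix
## (a = hits, b = false alarms, c = misses, d = correct negatives).
a <- 40; b <- 10; c <- 5; d <- 45
POD <- a / (a + c)                    # probability of detection (hit rate)
FAR <- b / (a + b)                    # false alarm ratio
CSI <- a / (a + b + c)                # critical success index
TSS <- a / (a + c) + d / (b + d) - 1  # sensitivity + specificity - 1
```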
6. SaveObj
If this argument is set to FALSE, it may prevent the evaluation of the ‘ensemble modeled’ models in further steps. We strongly recommend always keeping this argument set to TRUE, even though it requires free space on the hard drive.
7. rescal.all.models
This parameter is quite experimental and we advise against using it. It should lead to a reduction in projection scale amplitude. Some categorical models have to be scaled in every case (‘FDA’, ‘ANN’), but it may be interesting to scale all computed models to ensure that they produce comparable predictions (on a 0-1000 scale). This is particularly useful for ensemble forecasting, to remove the prediction scale effect (the more extended the projections are, the more they influence ensemble forecasting results).
8. do.full.models
Building models with all available information may be useful in some particular cases (e.g. rare species with few presence points). The main drawback of this approach is that, if you do not provide separate data for model evaluation, your models will be evaluated with the same data used for calibration, which leads to over-optimistic evaluation scores. Be careful when interpreting these '_Full' models.