The ensemble function uses the fitted models in an sdmModels
object to generate an ensemble (consensus) of the predictions from multiple individual models. Several ensemble
methods are available and can be specified in the setting argument.
A list of settings can be passed to the setting
argument, including:
- method
: a character vector specifying which ensemble method(s) should be used (more than one can be given). The available methods are described at the end of this page.
- stat
: if method='weighted'
is used, it specifies which evaluation metric is used as the weight in the weighted averaging procedure. Alternatively, weights can be supplied directly (see the next argument).
- weights
: an optional numeric vector (with a length equal to the number of successfully fitted models) specifying the weights for the weighted averaging procedure (if method='weighted' is specified).
- id
: specifies the model IDs that should be considered in the ensemble procedure. If missing, all the models that are successfully fitted are considered.
- expr
: a character string or an expression specifying a condition used to select models for the ensemble procedure. For example, expr='auc > 0.7'
uses only models with an AUC greater than 0.7, and expr='auc > 0.7 & tss > 0.5'
selects models based on both the AUC and TSS metrics.
- wtest
: specifies which dataset ("training", "test.dep", or "test.indep") should be used to extract the statistic (stat) values used as weights (if a relevant method is specified).
- opt
: if a threshold-based metric is selected in stat or in expr, opt specifies the threshold selection criterion. The value can be a number from 1 to 14, corresponding to the "sp=se", "max(se+sp)", "min(cost)", "minROCdist", "max(kappa)", "max(ppv+npv)", "ppv=npv", "max(NMI)", "max(ccr)", "prevalence", "P10", "P5", "P1", "P0"
criteria, respectively.
- power
: a numeric value (default: 1) to which the weights are raised. A value greater than 1 changes the weighting scheme (for methods such as 'weighted') by increasing the relative weights of the models that already have greater weights. For example, if the weights are c(0.2, 0.2, 0.2, 0.4), raising them to the power of 2 and rescaling them to sum to 1 gives the new weights c(0.1428571, 0.1428571, 0.1428571, 0.5714286), so the models with better performance contribute more to the ensemble output.
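The model selection by expr and the effect of power can be illustrated with a minimal base R sketch. It does not call the sdm package: the evaluation table evals, its values, and the helper power_weights are hypothetical, and the way the condition is evaluated here is only an assumption about how such a filter could work, not the package's internal implementation.

```r
# Hypothetical evaluation table: one row per successfully fitted model.
evals <- data.frame(
  modelID = 1:4,
  auc = c(0.68, 0.75, 0.81, 0.90),
  tss = c(0.40, 0.45, 0.60, 0.72)
)

# Mimic expr = 'auc > 0.7 & tss > 0.5' by evaluating the condition
# against the columns of the table (assumed mechanism, for illustration).
cond <- parse(text = "auc > 0.7 & tss > 0.5")[[1]]
keep <- eval(cond, envir = evals)
selected_ids <- evals$modelID[keep]
print(selected_ids)

# Hypothetical helper: raise the weights to a power, then rescale to sum to 1.
power_weights <- function(w, power = 1) {
  w <- w^power
  w / sum(w)
}

w1 <- power_weights(c(0.2, 0.2, 0.2, 0.4), power = 1)
w2 <- power_weights(c(0.2, 0.2, 0.2, 0.4), power = 2)
print(round(w2, 7))
```

With power = 1 the rescaled weights keep their original proportions; with power = 2 the last model's share rises from 0.4 to about 0.57, reproducing the numbers in the power description.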
---> The available ensemble methods (to be specified in method) include:
-- 'unweighted': unweighted averaging/mean.
-- 'weighted': weighted averaging.
-- 'median': median.
-- 'pa': mean of predicted presence-absence values (predicted probabilities are first converted to presence-absence using a threshold, then averaged; opt defines which threshold optimisation strategy is used).
-- 'mean-weighted': a two-step averaging that can be used when several replications are available for each modelling method (e.g., fitted through bootstrapping or cross-validation resampling). It first takes an unweighted mean over the predicted values of the replications of each method (within-model averaging), then uses a weighted mean to combine the probabilities of the different methods (between-model averaging).
-- 'mean-unweighted': same as 'mean-weighted', but an unweighted mean is also used in the second step (instead of a weighted mean).
-- 'median-weighted': same as 'mean-weighted', but the median is used in the first step.
-- 'median-unweighted': another two-step method; the median is used in the first step and an unweighted mean in the second step.
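The difference between the single-step and two-step methods listed above can be sketched in base R on a small matrix of predictions. All values and names here (pred, method_of, w_model, threshold) are hypothetical. Deriving one weight per method by averaging the weights of its replications is an assumption made for the sketch, not a statement about how the package computes it.

```r
# Hypothetical predictions: rows are pixels, columns are fitted models.
pred <- matrix(
  c(0.2, 0.4, 0.6, 0.8,
    0.9, 0.7, 0.5, 0.3),
  nrow = 2, byrow = TRUE
)
method_of <- c("glm", "glm", "glm", "brt")  # three glm replications, one brt
w_model <- c(0.7, 0.7, 0.7, 0.9)            # e.g. an evaluation statistic per model

# Single-step ensembles over all models.
unweighted <- rowMeans(pred)
weighted <- apply(pred, 1, weighted.mean, w = w_model)
med <- apply(pred, 1, median)

# 'pa': convert to presence-absence with a (hypothetical) threshold, then average.
threshold <- 0.5
pa <- rowMeans(pred >= threshold)

# Two-step 'mean-weighted':
# step 1, unweighted mean over the replications of each method;
methods <- unique(method_of)
within <- sapply(methods, function(m) {
  rowMeans(pred[, method_of == m, drop = FALSE])
})
# step 2, weighted mean across methods (per-method weight assumed to be
# the mean weight of that method's replications).
w_method <- sapply(methods, function(m) mean(w_model[method_of == m]))
mean_weighted <- apply(within, 1, weighted.mean, w = w_method)

print(rbind(unweighted, weighted, med, pa, mean_weighted))
```

In this toy setup the single-step weighted mean is dominated by the three glm replications, whereas the two-step version first collapses each method to one value, so glm and brt enter the second step on an equal footing apart from their weights.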
----> In addition to the ensemble methods, some other methods are available to generate outputs that represent uncertainty:
-- 'uncertainty' or 'entropy': this method generates the uncertainty among the models' predictions, which can be interpreted as model-based uncertainty or inconsistency among different models. It ranges between 0 and 1: 0 means all the models predicted the same value (either presence or absence), and 1 refers to maximum uncertainty, e.g., half of the models predicted presence and the other half predicted absence.
-- 'cv': Coefficient of variation of probabilities generated from multiple models
-- 'stdev': Standard deviation of probabilities generated from multiple models
-- 'ci': This generates the confidence interval length, which assigns the difference between the upper and lower limits of the confidence interval to each pixel (upper - lower); the marginal error is half of this length. The default confidence level is 95% (i.e., alpha = 0.05), unless a different alpha is defined in setting. If separate upper and lower rasters are needed, the limits can be calculated with the following code:
# take the unweighted average and the ci
en <- ensemble(x, newdata, setting=list(method=c('mean','ci')))
# en[[1]] is the mean of all probabilities and en[[2]] is the ci
# add the marginal error (half of the generated ci) to the mean
ci.upper <- en[[1]] + en[[2]] / 2
# subtract the marginal error from the mean
ci.lower <- en[[1]] - en[[2]] / 2
plot(ci.upper,main='Upper limit of Confidence Interval - alpha = 0.05')
plot(ci.lower,main='Lower limit of Confidence Interval - alpha = 0.05')
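For intuition, the uncertainty measures described above can be sketched for a single pixel in base R. The formulas are assumptions chosen to match the descriptions and are not confirmed as the package's internals: binary entropy with log base 2 for 'entropy', and a Student t interval around the mean for 'ci'. The probabilities p, the threshold, and the helper binary_entropy are hypothetical.

```r
# Hypothetical predicted probabilities for one pixel from six models.
p <- c(0.10, 0.35, 0.45, 0.60, 0.80, 0.90)

# Assumed entropy measure: binary entropy (log base 2) of the share of
# models predicting presence; 0 when all models agree, 1 for an even split.
binary_entropy <- function(pa) {
  q <- mean(pa)
  if (q == 0 || q == 1) return(0)
  -(q * log2(q) + (1 - q) * log2(1 - q))
}
threshold <- 0.5                        # hypothetical presence threshold
entropy <- binary_entropy(p >= threshold)

# Standard deviation and coefficient of variation of the probabilities.
stdev <- sd(p)
cv <- sd(p) / mean(p)

# Assumed confidence interval of the mean (Student t), alpha = 0.05.
alpha <- 0.05
n <- length(p)
margin <- qt(1 - alpha / 2, df = n - 1) * sd(p) / sqrt(n)
ci_length <- 2 * margin                 # upper - lower
ci_upper <- mean(p) + ci_length / 2
ci_lower <- mean(p) - ci_length / 2

print(c(entropy = entropy, stdev = stdev, cv = cv, ci = ci_length))
```

Here three of the six models fall on each side of the threshold, so the entropy reaches its maximum of 1, and the upper and lower limits are recovered from the interval length in the same way as in the raster code above.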