ENMevaluate automatically executes Maxent (Phillips et al. 2006; Phillips and Dudik 2008) across a range of settings, returning a data.frame of evaluation metrics to aid in identifying settings that balance model fit and predictive ability. The function calls Maxent using the maxent function in the dismo package (Hijmans et al. 2011); see ENMeval-package and the help documentation of the maxent function.
ENMevaluate(occ, env, bg.coords = NULL, occ.grp = NULL,
bg.grp = NULL, RMvalues = seq(0.5, 4, 0.5),
fc = c("L", "LQ", "H", "LQH", "LQHP", "LQHPT"),
categoricals = NULL, n.bg = 10000, method = NULL,
overlap = FALSE, aggregation.factor = c(2, 2),
kfolds = NA, bin.output = FALSE, clamp = TRUE)
tuning(occ, env, bg.coords, occ.grp, bg.grp, method,
maxent.args, args.lab, categoricals, aggregation.factor,
kfolds, bin.output, clamp)
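bg.coords
background coordinates (required for the 'user' method).
occ.grp
bin designations for occurrence localities (required for the 'user' method).
bg.grp
bin designations for background localities (required for the 'user' method).
method
method used for data partitioning; one of "jackknife", "randomkfold", "user", "block", "checkerboard1", "checkerboard2". See details and get.evaluation.bins.
overlap
If TRUE, provides a pairwise metric of niche overlap (see details and calc.niche.overlap).
aggregation.factor
aggregation factor(s) used by the checkerboard methods (see details and get.evaluation.bins).
bin.output
If TRUE, appends evaluation metrics for each evaluation bin to the results table (i.e., in addition to the average values across bins).
maxent.args
Maxent arguments generated by the make.args function.
args.lab
labels generated by the make.args function.
clamp
If TRUE, 'clamping' is used (see Maxent documentation and tutorial for more details).
An object of class ENMevaluation with named slots:
@results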
data.frame of evaluation metrics. If bin.output = TRUE, evaluation metrics calculated separately for each evaluation bin are included in addition to the averages across k bins.
@predictions
RasterStack of full model predictions with each layer named as: fc_RM (e.g., L_1).
@partition.method
character vector with the method used for data partitioning.
@occ.pts
data.frame of the latitude/longitude of input occurrence localities.
@occ.grp
vector identifying the bin for each occurrence locality.
@bg.pts
data.frame of the latitude/longitude of input background localities.
@bg.grp
vector identifying the bin for each background locality.
@overlap
matrix of pairwise niche overlap (blank if overlap = FALSE).
ENMevaluate is the primary function for general use in the ENMeval package; the tuning function is used internally.
Maxent settings: In the current default implementation of Maxent, the combination of feature classes (fcs) allowed depends on the number of occurrence localities, and the value for the regularization multiplier (RM) is 1.0. ENMevaluate provides an automated way to execute ecological niche models in Maxent across a user-specified range of RM values and fc combinations, regardless of sample size. The fc argument accepts combinations of the following letters: L = linear, Q = quadratic, P = product, T = threshold, and H = hinge (see Maxent help documentation, Phillips et al. (2006), Phillips and Dudik (2008), Elith et al. (2011), and Merow et al. (2013) for additional details on RM and fcs). Categorical feature classes (C) are specified by the categoricals argument.
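As a rough illustration of the tuning grid this implies, the base-R sketch below enumerates every pairing of the default RMvalues and fc settings from the usage section and labels each run in the fc_RM form used for the prediction layers. The settings object and its column names are illustrative assumptions, not the output of make.args.

```r
# Default tuning values copied from the ENMevaluate usage
RMvalues <- seq(0.5, 4, 0.5)
fc <- c("L", "LQ", "H", "LQH", "LQHP", "LQHPT")

# One row per model run: every fc paired with every RM (hypothetical layout)
settings <- expand.grid(fc = fc, RM = RMvalues, stringsAsFactors = FALSE)
settings$label <- paste(settings$fc, settings$RM, sep = "_")

nrow(settings)        # number of Maxent runs implied by the defaults
head(settings$label)  # labels such as "L_0.5"
```

Narrowing either vector shrinks the grid proportionally, which matters because each row corresponds to a full set of cross-validated Maxent runs.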
Methods for partitioning data: ENMevaluate includes six methods to partition occurrence and background localities into bins for training and testing ('jackknife', 'randomkfold', 'user', 'block', 'checkerboard1', 'checkerboard2'). The jackknife method is a special case of k-fold cross validation where the number of folds (k) is equal to the number of occurrence localities (n) in the dataset. The randomkfold method partitions occurrence localities randomly into a user-specified number (k) of bins; this is equivalent to the method of k-fold cross validation currently provided by Maxent. The user method enables users to define bins a priori. For this method, the user is required to provide background coordinates (bg.coords) and bin designations for both occurrence localities (occ.grp) and background localities (bg.grp). The block method partitions the data into four bins according to the lines of latitude and longitude that divide the occurrence localities into bins containing as equal a number of localities as possible. The checkerboard1 (and checkerboard2) methods partition data into two (or four) bins based on one (or two) checkerboard patterns with grain size defined as one (or two) aggregation factor(s) of the original environmental layers. Although the checkerboard1 (and checkerboard2) methods are designed to partition occurrence localities into two (and four) evaluation bins, they may give fewer bins depending on the location of occurrence localities with respect to the checkerboard grid(s) (e.g., all records happen to fall in the "black" squares). A warning is given if the number of bins is < 4 for the checkerboard2 method, and an error is given if all localities fall in a single evaluation bin. Additional details can be found in get.evaluation.bins.
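To make three of these schemes concrete, here is a minimal base-R sketch. The helper names (random_kfold_bins, jackknife_bins, block_bins) are hypothetical, and the block rule shown (split at the median latitude, then at the median longitude within each half) is only one plausible reading of the description; the package's own logic lives in get.evaluation.bins and may differ.

```r
# Illustrative stand-ins for three partitioning schemes.
# occ is a two-column matrix of x (longitude) and y (latitude).

# 'randomkfold': assign localities at random to k bins of near-equal size
random_kfold_bins <- function(n, k) {
  sample(rep(seq_len(k), length.out = n))
}

# 'jackknife': k equals n, so every locality is its own bin
jackknife_bins <- function(n) seq_len(n)

# 'block': split at the median latitude, then at the median longitude
# within each half (an assumption about how the dividing lines are chosen)
block_bins <- function(occ) {
  north <- occ[, 2] > median(occ[, 2])
  bins <- integer(nrow(occ))
  for (half in c(FALSE, TRUE)) {
    idx <- which(north == half)
    east <- occ[idx, 1] > median(occ[idx, 1])
    bins[idx] <- 2L * half + east + 1L   # bins 1-2 south, 3-4 north
  }
  bins
}

set.seed(1)
occ <- cbind(x = runif(40), y = runif(40))
table(block_bins(occ))
table(random_kfold_bins(nrow(occ), k = 5))
```

Splitting longitude separately within each latitudinal half is what keeps the four block bins close to equal in size even when localities are unevenly spread.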
Evaluation metrics: Four evaluation metrics are calculated using the partitioned dataset, and one additional metric is provided based on the full dataset. ENMevaluate uses the same background localities and evaluation bin designations for each of the k iterations (for each unique combination of RM and fc) to facilitate valid comparisons among model settings.
Mean.AUC is the area under the curve of the receiver operating characteristic plot made based on the testing data (i.e., AUCtest), averaged across k bins. In each iteration, as currently implemented, the AUCtest value is calculated with respect to the full set of background localities to enable comparisons across the k iterations (Radosavljevic and Anderson 2014). As a relative measure for a given study species and region, higher values of Mean.AUC indicate a greater ability of the model to discriminate occurrence from background localities. This rank-based non-parametric metric, however, does not reveal the model goodness-of-fit (Lobo et al. 2008; Peterson et al. 2011).
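The rank-based nature of AUC can be shown with a short base-R sketch. The function auc_rank and the toy prediction vectors are assumptions made for illustration; this is the standard Mann-Whitney formulation, not code taken from the package.

```r
# Rank-based AUC: the probability that a randomly chosen test occurrence
# receives a higher prediction than a randomly chosen background point
# (ties count one half). Function and argument names are illustrative.
auc_rank <- function(pred_occ, pred_bg) {
  n_occ <- length(pred_occ)
  n_bg <- length(pred_bg)
  r <- rank(c(pred_occ, pred_bg))          # average ranks handle ties
  (sum(r[seq_len(n_occ)]) - n_occ * (n_occ + 1) / 2) / (n_occ * n_bg)
}

# Toy predictions for one test bin and the full background set
pred_test <- c(0.9, 0.8, 0.4)
pred_bg <- c(0.1, 0.2, 0.5, 0.3)
auc_rank(pred_test, pred_bg)   # 11 of the 12 occurrence-background pairs are ordered correctly
```

Averaging this value over the k test bins, always against the same background vector, gives a quantity analogous to Mean.AUC.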
To quantify the degree of overfitting, ENMevaluate calculates three metrics. The first is the difference between training and testing AUC, averaged across k bins (Mean.AUC.DIFF) (Warren and Seifert 2011). Mean.AUC.DIFF is expected to be high for models overfit to the training data. ENMevaluate also calculates two threshold-dependent omission rates that quantify overfitting when compared with the omission rate expected by the threshold employed: the proportion of testing localities with Maxent output values lower than the value associated with (1) the training locality with the lowest value (i.e., the minimum training presence, MTP; = 0 percent training omission) (Mean.ORmin) and (2) the value that excludes the 10 percent of training localities with the lowest predicted suitability (Mean.OR10) (Pearson et al. 2007). ENMevaluate uses corrected.var to calculate the variance for each of these metrics across k bins (i.e., variances are corrected for non-independence of cross-validation iterations; see Shcheglovitova and Anderson 2013). The value of these metrics for each of the individual k bins is returned if bin.output = TRUE.
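The two omission rates can be sketched for a single train/test split as follows. The helper omission_rates is hypothetical, and the way the 10 percent threshold is located (dropping the floor(0.1 * n) lowest training values) is an assumption; the package may round differently.

```r
# Illustrative helper: omission rates for one train/test split, given Maxent
# output values at training and testing localities (names are assumptions).
omission_rates <- function(pred_train, pred_test) {
  # MTP threshold: lowest prediction among training localities
  thr_min <- min(pred_train)
  # 10 percent threshold: value that excludes the lowest-scoring 10 percent
  # of training localities (taken here as the floor(0.1 * n) lowest values)
  n_drop <- floor(0.1 * length(pred_train))
  thr_10 <- sort(pred_train)[n_drop + 1]
  c(ORmin = mean(pred_test < thr_min),
    OR10 = mean(pred_test < thr_10))
}

pred_train <- seq(0.05, 1, by = 0.05)    # 20 training localities
pred_test <- c(0.02, 0.12, 0.30, 0.60, 0.90)
omission_rates(pred_train, pred_test)
```

The expected rates are 0 for the MTP threshold and 0.10 for the 10 percent threshold, so test omission well above those values suggests overfitting.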
Based on the unpartitioned (full) dataset, ENMevaluate uses calc.aicc to calculate the AICc value for each model run and provides delta.AICc, AICc weights, as well as the number of parameters for each model (Warren and Seifert 2011). The AUCtrain value for the full model is also returned (full.AUC).
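For readers unfamiliar with these quantities, the sketch below applies the standard AICc, delta and Akaike weight formulas to made-up inputs. The log-likelihoods and parameter counts are hypothetical, the function aicc is not calc.aicc, and how the likelihood is derived from Maxent output is deliberately left out.

```r
# Standard small-sample corrected AIC; loglik, k and n are supplied by the
# caller (how the likelihood is derived from Maxent output is not shown here).
aicc <- function(loglik, k, n) {
  aic <- 2 * k - 2 * loglik
  # The correction term is undefined when k >= n - 1, so return NA there
  ifelse(k < n - 1, aic + (2 * k * (k + 1)) / (n - k - 1), NA_real_)
}

# Hypothetical log-likelihoods and parameter counts for three model runs
loglik <- c(-410.2, -402.7, -401.9)
k <- c(4, 9, 15)
n <- 50                                   # number of occurrence localities

AICc <- aicc(loglik, k, n)
delta <- AICc - min(AICc, na.rm = TRUE)   # 0 marks the best-supported model
# Akaike weights: relative likelihood of each model, normalised to sum to 1
w <- exp(-0.5 * delta) / sum(exp(-0.5 * delta), na.rm = TRUE)

data.frame(k, AICc, delta, w)
```

In this toy case the third model has the highest likelihood but the second has the lowest AICc, because the correction term penalises the extra parameters.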
To quantify how resulting predictions differ in geographic space depending on the settings used, ENMevaluate includes an option to compute pairwise niche overlap between all pairs of full models (i.e., using the unpartitioned dataset) with Schoener's D statistic (Schoener 1968; Warren et al. 2009).
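Schoener's D compares two prediction surfaces after each is rescaled to sum to 1. The sketch below works on plain numeric vectors of cell values; schoener_d is a hypothetical helper, not calc.niche.overlap, and treating cells that are NA in either surface as excluded is an assumption.

```r
# Schoener's D for two prediction surfaces given as numeric vectors of cell
# values (a raster would first be converted with something like values()).
schoener_d <- function(p1, p2) {
  keep <- !is.na(p1) & !is.na(p2)   # use cells present in both surfaces
  p1 <- p1[keep] / sum(p1[keep])    # rescale each surface to sum to 1
  p2 <- p2[keep] / sum(p2[keep])
  1 - 0.5 * sum(abs(p1 - p2))
}

a <- c(0.1, 0.4, 0.3, 0.2)
schoener_d(a, a)               # identical surfaces
schoener_d(a, rev(a))          # partially overlapping surfaces
schoener_d(c(1, 0), c(0, 1))   # no overlap
```

D ranges from 0 (no overlap) to 1 (identical predictions), so values near 1 between two settings indicate that the choice between them changes the map very little.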
Hijmans, R. J., Phillips, S., Leathwick, J. and Elith, J. (2011) dismo package for R. Available online at:
Lobo, J. M., Jimenez-Valverde, A., and Real, R. (2008) AUC: A misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography, 17: 145-151.
Pearson, R. G., Raxworthy, C. J., Nakamura, M. and Peterson, A. T. (2007) Predicting species distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. Journal of Biogeography, 34: 102-117.
Peterson, A. T., Soberon, J., Pearson, R. G., Anderson, R. P., Martinez-Meyer, E., Nakamura, M. and Araujo, M. B. (2011) Ecological Niches and Geographic Distributions. Monographs in Population Biology, 49. Princeton University Press, Princeton, NJ.
Phillips, S. J., Anderson, R. P., and Schapire, R. E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190: 231-259.
Phillips, S. J. and Dudik, M. (2008) Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31: 161-175.
Merow, C., Smith, M., and Silander, J. A. (2013) A practical guide to Maxent: what it does, and why inputs and settings matter. Ecography, 36: 1-12.
Radosavljevic, A. and Anderson, R. P. (2014) Making better Maxent models of species distributions: complexity, overfitting and evaluation. Journal of Biogeography, 41: 629-643.
Schoener, T. W. (1968) The Anolis lizards of Bimini: resource partitioning in a complex fauna. Ecology, 49: 704-726.
Shcheglovitova, M. and Anderson, R. P. (2013) Estimating optimal complexity for ecological niche models: A jackknife approach for species with small sample sizes. Ecological Modelling, 269: 9-17.
Warren, D. L., Glor, R. E., Turelli, M. and Funk, D. (2009) Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution, 62: 2868-2883; Erratum: Evolution, 65: 1215.
Warren, D.L. and Seifert, S.N. (2011) Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. Ecological Applications, 21: 335-342.
maxent in the dismo package
### Simulated data environmental covariates
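library(raster)   # raster(), stack(), values()
library(ENMeval)  # ENMevaluate() and the enmeval_results dataset
set.seed(1)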
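# Each layer has 50 x 50 = 2500 cells, so exactly 2500 values are supplied.
# runif() still draws 10000 values so that later random draws are unchanged.
r1 <- raster(matrix(nrow=50, ncol=50, data=runif(10000, 0, 25)[1:2500]))
r2 <- raster(matrix(nrow=50, ncol=50, data=rep(1:25, each=100), byrow=TRUE))
r3 <- raster(matrix(nrow=50, ncol=50, data=rep(1:25, each=100)))
r4 <- raster(matrix(nrow=50, ncol=50, data=rep(c(1,2,1), times=c(1000,500,1000)), byrow=TRUE))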
values(r4) <- as.factor(values(r4))
env <- stack(r1,r2,r3,r4)
### Simulate occurrence localities
nocc <- 50
x <- (rpois(nocc, 2) + abs(rnorm(nocc)))/11
y <- runif(nocc, 0, .99)
occ <- cbind(x,y)
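### This call gives the results loaded below (it needs a working Maxent
### installation and can be slow, so the stored results are loaded instead)
enmeval_results <- ENMevaluate(occ, env, method="block", n.bg=500, overlap=TRUE,
                               bin.output=TRUE, clamp=TRUE)
data(enmeval_results)
enmeval_results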
### See table of evaluation metrics
enmeval_results@results
### Plot prediction with lowest AICc
plot(enmeval_results@predictions[[which(enmeval_results@results$delta.AICc == 0)]])
points(enmeval_results@occ.pts, pch=21, bg=enmeval_results@occ.grp)
### Niche overlap statistics between model predictions
enmeval_results@overlap