ENMevaluate
automatically executes Maxent (Phillips et al. 2006; Phillips and Dudik 2008) across a range of settings, returning a data.frame
of evaluation metrics to aid in identifying settings that balance model fit and predictive ability. Since version 0.3.0, the default function uses the maxnet
function in the maxnet package (Phillips et al. 2017) to implement the Maxent algorithm (see notes).
ENMevaluate(occ, env, bg.coords = NULL, occ.grp = NULL,
bg.grp = NULL, RMvalues = seq(0.5, 4, 0.5),
fc = c("L", "LQ", "H", "LQH", "LQHP", "LQHPT"),
categoricals = NULL, n.bg = 10000, method = NULL,
algorithm = 'maxnet', overlap = FALSE,
aggregation.factor = c(2, 2), kfolds = NA,
bin.output = FALSE, clamp = TRUE, rasterPreds = TRUE,
parallel = FALSE, numCores = NULL, progbar = TRUE,
updateProgress = FALSE, ...)

tuning(occ, env, bg.coords, occ.grp, bg.grp, method,
algorithm, args, args.lab, categoricals,
aggregation.factor, kfolds, bin.output, clamp, alg,
rasterPreds, parallel, numCores, progbar,
updateProgress, userArgs)
Two-column matrix or data.frame of longitude and latitude (in that order) of occurrence localities.
RasterStack of model predictor variables (environmental layers).
Two-column matrix or data.frame of longitude and latitude (in that order) of background localities (required for 'user' method).
Vector of bins of occurrence localities (required for 'user' method).
Vector of bins of background localities (required for 'user' method).
Vector of (non-negative) values to use for the regularization multiplier.
Character vector of feature class combinations to be included in analysis.
Character vector. Use 'maxnet'
to use the maxnet package [default] or 'maxent.jar'
to use the dismo package and the 'maxent.jar' Java program. See details for more information on these different implementations.
Character vector. Use 'maxnet'
to use the maxnet package [default] or 'maxent.jar'
to use the dismo package and the 'maxent.jar' Java program. See details for more information on these different implementations.
Vector indicating which (if any) of the input environmental layers are categorical.
The number of random background localities to draw from the study extent.
Character string designating the method used for data partitioning. Choices are: "jackknife", "randomkfold", "user", "block", "checkerboard1", "checkerboard2". See details and get.evaluation.bins for more information.
logical; if TRUE, provides pairwise metric of niche overlap (see details and calc.niche.overlap).
List giving the factor by which the original input grid should be aggregated for checkerboard partitioning methods (see details and get.evaluation.bins).
Number of bins to use in the k-fold random method of data partitioning.
logical; if TRUE, appends evaluation metrics for each evaluation bin to the results table (i.e., in addition to the average values across bins).
Arguments to pass to Maxent that are generated by the make.args
function
Character labels describing feature classes and regularization multiplier values for Maxent runs provided by the make.args
function.
logical; If TRUE
, 'clamping' is used (see Maxent documentation and tutorial for more details).
logical; If TRUE
, the predict
function from dismo
is used to predict each full model across the extent of the input environmental variables. Note that AICc (and associated values) are NOT calculated if rasterPreds=FALSE
because these calculations require the predicted surfaces. However, setting to FALSE
can significantly reduce run time.
logical; If TRUE
, parallel processing is used to execute tuning function.
numeric; indicates the number of cores to use if running in parallel. If parallel=TRUE and this is not specified, all available cores are used.
logical; used internally.
logical; used internally.
character vector; use this to pass other arguments (e.g., prevalence) to the `maxent` call. Note that not all options are functional or relevant.
character vector; use this to pass other arguments (e.g., prevalence) to the `maxent` call. Note that not all options are functional or relevant.
An object of class ENMevaluation
with named slots:
@results
data.frame of evaluation metrics. If bin.output=TRUE
, evaluation metrics calculated separately for each evaluation bin are included in addition to the averages and corrected variances (see corrected.var
) across k bins. Note that the names of some columns changed as of Version 0.3.0.
@predictions
RasterStack of full model predictions with each layer named as: fc_RM
(e.g., L_1
). This will be an empty RasterStack if rasterPreds=FALSE.
@models
List of objects of class "MaxEnt"
from the dismo package. Each of these entries includes slots for lambda values and the original Maxent results table. See Maxent documentation for more information.
@partition.method
character vector with the method used for data partitioning.
@occ.pts
data.frame of the latitude/longitude of input occurrence localities.
@occ.grp
vector identifying the bin for each occurrence locality.
@bg.pts
data.frame of the latitude/longitude of input background localities.
@bg.grp
vector identifying the bin for each background locality.
@overlap
matrix of pairwise niche overlap (blank if overlap = FALSE
).
ENMevaluate
is the primary function for general use in the ENMeval package; the tuning
function is used internally.
Since version 0.3.0, the default ENMevaluate runs the Maxent algorithm by calling the maxnet package (Phillips et al. 2017) instead of the previous implementation (still available), which relies on the 'maxent.jar' Java program called by the dismo package. This choice is controlled by the argument algorithm='maxnet'. A major advantage of this change is that it removes the reliance on Java and the rJava package, which can sometimes cause confusing problems on different computers. There are some differences between the 'maxnet' and 'maxent.jar' algorithms that may lead to slight numeric differences in the results (at least when hinge feature classes are used). See Phillips et al. (2017) and Phillips (2017) for more details. Additionally, the 'maxnet' algorithm does not provide information on variable importance (from the var.importance() function) because of differences in the underlying models. Users can still choose the 'maxent.jar' implementation by setting algorithm='maxent.jar' in the ENMevaluate function (also see note below). Our team has tested this implementation fairly extensively to ensure it gives the expected results, but the maxnet implementation is relatively new (at the time of writing) and we encourage users to scrutinize their results.
Maxent settings: In the current default implementation of Maxent, the combination of feature classes (fcs) allowed depends on the number of occurrence localities, and the value for the regularization multiplier (RM) is 1.0. ENMevaluate provides an automated way to execute ecological niche models in Maxent across a user-specified range of RM values and fc combinations, regardless of sample size. Acceptable values for the fc argument include: L=linear, Q=quadratic, P=product, T=threshold, and H=hinge (see Maxent help documentation, Phillips et al. (2006), Phillips and Dudik (2008), Elith et al. (2011), and Merow et al. (2013) for additional details on RM and fcs). Categorical feature classes (C) are specified by the categoricals argument.
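As a rough illustration of the tuning grid described above, the following base R sketch enumerates every fc and RM combination implied by the default arguments. The `settings` data.frame and its `label` column are illustrative names, not objects created by ENMevaluate; the labels simply follow the fc_RM naming used for the prediction layers.

```r
# Default feature class combinations and regularization multipliers
fc <- c("L", "LQ", "H", "LQH", "LQHP", "LQHPT")
RMvalues <- seq(0.5, 4, 0.5)

# One row per candidate model (illustrative object, not part of ENMeval)
settings <- expand.grid(fc = fc, RM = RMvalues, stringsAsFactors = FALSE)
settings$label <- paste(settings$fc, settings$RM, sep = "_")

nrow(settings)        # 6 feature class sets x 8 RM values = 48 models
head(settings$label)
```

Each row corresponds to one full model plus k cross-validation models, so run time grows quickly as either vector is lengthened.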
Methods for partitioning data: ENMevaluate
includes six methods to partition occurrence and background localities into bins for training and testing ('jackknife', 'randomkfold', 'user', 'block',
'checkerboard1', 'checkerboard2'
). The jackknife
method is a special case of k-fold cross validation where the number of folds (k) is equal to the number of occurrence localities (n) in the dataset. The randomkfold
method partitions occurrence localities randomly into a user-specified number of (k) bins - this is equivalent to the method of k-fold cross validation currently provided by Maxent. The user
method enables users to define bins a priori. For this method, the user is required to provide background coordinates (bg.coords
) and bin designations for both occurrence localities (occ.grp
) and background localities (bg.grp
). The block
method partitions the data into four bins according to the lines of latitude and longitude that divide the occurrence localities into bins of as equal number as possible. The checkerboard1
(and checkerboard2
) methods partition data into two (or four) bins based on one (or two) checkerboard patterns with grain size defined as one (or two) aggregation factor(s) of the original environmental layers. Although the checkerboard1
(and checkerboard2
) methods are designed to partition occurrence localities into two (and four) evaluation bins, they may give fewer bins depending on the location of occurrence localities with respect to the checkerboard grid(s) (e.g., all records happen to fall in the "black" squares). A warning is given if the number of bins is < 4 for the checkerboard2
method, and an error is given if all localities fall in a single evaluation bin. Additional details can be found in get.evaluation.bins
.
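The partitioning rules above can be sketched in base R. This is only an approximation for intuition: the package itself assigns bins through get.evaluation.bins. The block rule shown here (split at the median latitude, then at the median longitude within each half) and the 2-degree checkerboard cell size are assumptions made for the example, and all object names are hypothetical.

```r
set.seed(42)
n <- 20
# Hypothetical occurrence localities (longitude, latitude)
occ <- cbind(lon = runif(n, -10, 10), lat = runif(n, -5, 5))

# jackknife: every locality is its own bin (k = n)
jackknife_bins <- seq_len(n)

# randomkfold: shuffle bin labels 1..k across localities
k <- 5
random_bins <- sample(rep(seq_len(k), length.out = n))

# block (assumed rule): split at the median latitude, then at the
# median longitude within each latitudinal half, giving four bins
block_bins <- integer(n)
north <- occ[, "lat"] > median(occ[, "lat"])
for (half in c(FALSE, TRUE)) {
  idx <- which(north == half)
  east <- occ[idx, "lon"] > median(occ[idx, "lon"])
  block_bins[idx] <- 1L + 2L * half + east
}

# checkerboard1 (assumed 2-degree cells): alternate bins 1 and 2
cell <- 2
cb_bins <- (floor(occ[, "lon"] / cell) + floor(occ[, "lat"] / cell)) %% 2 + 1

table(random_bins)
table(block_bins)
table(cb_bins)
```

The block bins are balanced by construction, whereas the checkerboard bins depend on where localities fall relative to the grid, which is why that method can return fewer bins than expected.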
Evaluation metrics: Four evaluation metrics are calculated using the partitioned dataset, and one additional metric is provided based on the full dataset. ENMevaluate
uses the same background localities and evaluation bin designations for each of the k iterations (for each unique combination of RM
and fc
) to facilitate valid comparisons among model settings.
avg.test.AUC
is the area under the curve of the receiver operating characteristic plot made based on the testing data (i.e., AUCtest), averaged across k bins. In each iteration, as currently implemented, the AUCtest value is calculated with respect to the full set of background localities to enable comparisons across the k iterations (Radosavljevic and Anderson 2014). As a relative measure for a given study species and region, high values of avg.test.AUC
are associated with the degree to which a model can successfully discriminate occurrence from background localities. This rank-based non-parametric metric, however, does not reveal the model goodness-of-fit (Lobo et al. 2008; Peterson et al. 2011).
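To make the averaging concrete, here is a minimal base R sketch of a rank-based AUC evaluated for each testing bin against the same full background set. The `auc` helper and the prediction values are hypothetical and written for illustration; they are not the functions ENMevaluate calls internally.

```r
# Rank-based AUC (Mann-Whitney form): the probability that a randomly
# chosen occurrence scores higher than a randomly chosen background point
auc <- function(occ_pred, bg_pred) {
  n_occ <- length(occ_pred)
  n_bg <- length(bg_pred)
  r <- rank(c(occ_pred, bg_pred))  # ties receive average ranks
  (sum(r[seq_len(n_occ)]) - n_occ * (n_occ + 1) / 2) / (n_occ * n_bg)
}

# Hypothetical predictions: one full background set, k = 2 testing bins
bg_pred <- c(0.05, 0.10, 0.20, 0.30, 0.40, 0.60)
test_pred <- list(bin1 = c(0.70, 0.50, 0.35), bin2 = c(0.90, 0.25, 0.65))

# AUCtest per bin, always against the same background, then the average
test_auc <- sapply(test_pred, auc, bg_pred = bg_pred)
avg_test_auc <- mean(test_auc)
avg_test_auc
```

Because only ranks enter the calculation, rescaling the predictions monotonically leaves the value unchanged, which is one reason AUC says nothing about goodness-of-fit.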
To quantify the degree of overfitting, ENMevaluate
calculates three metrics. The first is the difference between training and testing AUC, averaged across k bins (avg.diff.AUC
) (Warren and Seifert 2011). avg.diff.AUC
is expected to be high for models overfit to the training data. ENMevaluate
also calculates two threshold-dependent omission rates that quantify overfitting when compared with the omission rate expected by the threshold employed: the proportion of testing localities with Maxent output values lower than the value associated with (1) the training locality with the lowest value (i.e., the minimum training presence, MTP; = 0 percent training omission) (avg.test.orMTP
) and (2) the value that excludes the 10 percent of training localities with the lowest predicted suitability (avg.test.or10pct
) (Pearson et al. 2007). ENMevaluate
uses corrected.var
to calculate the variance for each of these metrics across k bins (i.e., variances are corrected for non-independence of cross-validation iterations; see Shcheglovitova and Anderson 2013). The value of these metrics for each of the individual k bins is returned if bin.output = TRUE
.
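The two omission rates and the AUC difference can be illustrated for a single cross-validation iteration with the base R sketch below. All values are hypothetical, and the rounding used to pick the 10 percent threshold is an assumption for this example rather than a statement of how the package rounds.

```r
# Hypothetical Maxent output values for one cross-validation iteration
train_pred <- c(0.20, 0.35, 0.40, 0.55, 0.60, 0.62, 0.70, 0.75, 0.80, 0.90)
test_pred  <- c(0.10, 0.30, 0.50, 0.85)

# (1) Minimum training presence threshold (0 percent training omission)
thr_mtp <- min(train_pred)
or_mtp <- mean(test_pred < thr_mtp)

# (2) 10 percent training omission threshold: drop the lowest 10 percent
# of training values and use the lowest remaining value (assumed rounding)
n_drop <- length(train_pred) %/% 10
thr_10pct <- sort(train_pred)[n_drop + 1]
or_10pct <- mean(test_pred < thr_10pct)

# AUC difference for this iteration, using hypothetical AUC values
train_auc <- 0.92
test_auc <- 0.81
diff_auc <- train_auc - test_auc

c(or_mtp = or_mtp, or_10pct = or_10pct, diff_auc = diff_auc)
```

Here the testing omission rates exceed the expected rates of 0 and 0.10, which is the pattern interpreted as overfitting; in practice these values are averaged across the k bins.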
Based on the unpartitioned (full) dataset, ENMevaluate
uses calc.aicc
to calculate the AICc value for each model run and provides delta.AICc, AICc weights, and the number of parameters for each model (Warren and Seifert 2011). Note that AICc (and associated values) are NOT calculated if rasterPreds=FALSE
because these calculations require the predicted surfaces. The AUCtrain value for the full model is also returned (train.AUC
).
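The information-criterion quantities can be sketched with the standard AICc formula, AICc = 2K - 2 ln(L) + 2K(K + 1) / (n - K - 1), where K is the number of parameters and n the number of occurrence localities. The log-likelihoods, parameter counts, and column names below are hypothetical; calc.aicc derives the likelihood from the predicted surfaces, which this sketch does not attempt.

```r
# Standard small-sample corrected AIC
aicc <- function(loglik, k, n) {
  2 * k - 2 * loglik + (2 * k * (k + 1)) / (n - k - 1)
}

# Hypothetical full-model log-likelihoods and parameter counts
models <- data.frame(
  settings = c("L_1", "LQ_1", "LQH_1"),
  loglik = c(-410.2, -402.7, -398.9),
  parameters = c(4, 7, 15)
)
n_occ <- 50  # hypothetical number of occurrence localities

models$AICc <- aicc(models$loglik, models$parameters, n_occ)

# Difference from the best (lowest) AICc, and normalized model weights
models$delta <- models$AICc - min(models$AICc)
rel_lik <- exp(-0.5 * models$delta)
models$weight <- rel_lik / sum(rel_lik)

models
```

The correction term grows sharply as K approaches n, so heavily parameterized models are penalized most when occurrence records are few.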
To quantify how resulting predictions differ in geographic space depending on the settings used, ENMevaluate
includes an option to compute pairwise niche overlap between all pairs of full models (i.e., using the unpartitioned dataset) with Schoener's D statistic (Schoener 1968; Warren et al. 2009).
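For intuition, Schoener's D between two prediction surfaces can be written as D = 1 - 0.5 * sum(|p1 - p2|) after each surface is rescaled to sum to 1. The sketch below applies this to plain numeric vectors standing in for raster cell values; the `schoener_d` helper and the prediction values are hypothetical, and the package computes overlap on the actual rasters through calc.niche.overlap.

```r
# Schoener's D for two non-negative prediction vectors of equal length
schoener_d <- function(p1, p2) {
  p1 <- p1 / sum(p1)  # rescale so each surface sums to 1
  p2 <- p2 / sum(p2)
  1 - 0.5 * sum(abs(p1 - p2))
}

# Hypothetical full-model predictions over the same four grid cells
preds <- list(
  L_1  = c(0.1, 0.4, 0.8, 0.2),
  LQ_1 = c(0.2, 0.5, 0.7, 0.1),
  H_1  = c(0.9, 0.1, 0.1, 0.3)
)

# Pairwise overlap matrix between all pairs of models
overlap <- sapply(preds, function(a) sapply(preds, function(b) schoener_d(a, b)))
round(overlap, 3)
```

D ranges from 0 (no overlap) to 1 (identical surfaces), so the diagonal of the matrix is 1 and low off-diagonal values flag settings that produce geographically different predictions.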
Elith, J., Phillips, S. J., Hastie, T., Dudik, M., Chee, Y. E., and Yates, C. J. (2011) A statistical explanation of MaxEnt for ecologists. Diversity and Distributions, 17: 43-57.
Hijmans, R. J., Phillips, S., Leathwick, J. and Elith, J. (2011) dismo package for R. Available online at: https://cran.r-project.org/package=dismo.
Lobo, J. M., Jimenez-Valverde, A., and Real, R. (2008) AUC: A misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography, 17: 145-151.
Muscarella, R., Galante, P.J., Soley-Guardia, M., Boria, R.A., Kass, J., Uriarte, M. and Anderson, R.P. (2014) ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for ecological niche models. Methods in Ecology and Evolution, 5: 1198-1205.
Pearson, R. G., Raxworthy, C. J., Nakamura, M. and Peterson, A. T. (2007) Predicting species distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. Journal of Biogeography, 34: 102-117.
Peterson, A. T., Soberon, J., Pearson, R. G., Anderson, R. P., Martinez-Meyer, E., Nakamura, M. and Araujo, M. B. (2011) Ecological Niches and Geographic Distributions. Monographs in Population Biology, 49. Princeton University Press, Princeton, NJ.
Phillips, S. J. (2017) maxnet package for R. Available online at: https://CRAN.R-project.org/package=maxnet.
Phillips, S. J., Anderson, R. P., Dudík, M., Schapire, R. E. and Blair, M. E. (2017) Opening the black box: an open-source release of Maxent. Ecography, 40: 887-893.
Phillips, S. J., Anderson, R. P., and Schapire, R. E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190: 231-259.
Phillips, S. J. and Dudik, M. (2008) Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31: 161-175.
Merow, C., Smith, M., and Silander, J. A. (2013) A practical guide to Maxent: what it does, and why inputs and settings matter. Ecography, 36: 1-12.
Radosavljevic, A. and Anderson, R. P. (2014) Making better Maxent models of species distributions: complexity, overfitting and evaluation. Journal of Biogeography, 41: 629-643.
Schoener, T. W. (1968) The Anolis lizards of Bimini: resource partitioning in a complex fauna. Ecology, 49: 704-726.
Shcheglovitova, M. and Anderson, R. P. (2013) Estimating optimal complexity for ecological niche models: A jackknife approach for species with small sample sizes. Ecological Modelling, 269: 9-17.
Warren, D. L., Glor, R. E., Turelli, M. and Funk, D. (2009) Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution, 62: 2868-2883; Erratum: Evolution, 65: 1215.
Warren, D.L. and Seifert, S.N. (2011) Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. Ecological Applications, 21: 335-342.
maxnet in the maxnet package
maxent in the dismo package
# NOT RUN {
require(raster)
### Simulated data environmental covariates
set.seed(1)
r1 <- raster(matrix(nrow=50, ncol=50, data=runif(10000, 0, 25)))
r2 <- raster(matrix(nrow=50, ncol=50, data=rep(1:100, each=100), byrow=TRUE))
r3 <- raster(matrix(nrow=50, ncol=50, data=rep(1:100, each=100)))
r4 <- raster(matrix(nrow=50, ncol=50, data=c(rep(1,1000),rep(2,500)),byrow=TRUE))
values(r4) <- as.factor(values(r4))
env <- stack(r1,r2,r3,r4)
### Simulate occurrence localities
nocc <- 50
x <- (rpois(nocc, 2) + abs(rnorm(nocc)))/11
y <- runif(nocc, 0, .99)
occ <- cbind(x,y)
# }
# NOT RUN {
### This call gives the results loaded below
enmeval_results <- ENMevaluate(occ, env, method="block", n.bg=500,
categoricals=4, algorithm='maxent.jar')
# }
# NOT RUN {
data(enmeval_results)
enmeval_results
### See table of evaluation metrics
enmeval_results@results
### Plot prediction with lowest AICc
plot(enmeval_results@predictions[[which(enmeval_results@results$delta.AICc == 0)]])
points(enmeval_results@occ.pts, pch=21, bg=enmeval_results@occ.grp)
### Niche overlap statistics between model predictions
enmeval_results@overlap
# }