ENMevaluate: Tuning and evaluation of ENMs with Maxent

Description

ENMevaluate automatically executes Maxent (Phillips et al. 2006; Phillips and Dudik 2008) across a range of settings, returning a data.frame of evaluation metrics to aid in identifying settings that balance model fit and predictive ability. The function calls Maxent using the maxent function in the dismo package (Hijmans et al. 2011). Users should consult ENMeval-package and help documentation of the dismo package for guidelines on how to run Maxent in R.

Usage

ENMevaluate(occ, env, bg.coords = NULL, occ.grp = NULL, 
		bg.grp = NULL, RMvalues = seq(0.5, 4, 0.5), 
		fc = c("L", "LQ", "H", "LQH", "LQHP", "LQHPT"),
		categoricals = NULL, n.bg = 10000, method = NULL, 
		overlap = FALSE, aggregation.factor = c(2, 2), 
		kfolds = NA, bin.output = FALSE, clamp = TRUE)

tuning(occ, env, bg.coords, occ.grp, bg.grp, method, 
	maxent.args, args.lab, categoricals, aggregation.factor, 
	kfolds, bin.output, clamp)

Arguments

occ

Two-column matrix or data.frame of longitude and latitude (in that order) of occurrence localities.

env

RasterStack of model predictor variables (environmental layers).

bg.coords

Two-column matrix or data.frame of longitude and latitude (in that order) of background localities (required for 'user' method).

occ.grp

Vector of bins of occurrence localities (required for 'user' method).

bg.grp

Vector of bins of background localities (required for 'user' method).

RMvalues

Vector of (non-negative) values to use for the regularization multiplier.

fc

Character vector of feature class combinations to be included in analysis.

categoricals

Vector indicating which (if any) of the input environmental layers are categorical.

n.bg

The number of random background localities to draw from the study extent.

method

Character string designating the method used for data partitioning. Choices are: "jackknife", "randomkfold", "user", "block", "checkerboard1", "checkerboard2". See details and get.evaluation

overlap

logical; If TRUE, provides pairwise metric of niche overlap (see details and calc.niche.overlap).

aggregation.factor

List giving the factor by which the original input grid should be aggregated for checkerboard partitioning methods (see details and get.evaluation.bins).

kfolds

Number of bins to use in the k-fold random method of data partitioning.

bin.output

logical; If TRUE, appends evaluation metrics for each evaluation bin to the results table (i.e., in addition to the average values across bins).

maxent.args

Arguments to pass to Maxent that are generated by the make.args function.

args.lab

Character labels describing feature classes and regularization multiplier values for Maxent runs provided by the make.args function.

clamp

logical; If TRUE, 'clamping' is used (see Maxent documentation and tutorial for more details).

Value

An object of class ENMevaluation with named slots:
@results data.frame of evaluation metrics. If bin.output=TRUE, evaluation metrics calculated separately for each evaluation bin are included in addition to the averages across k bins.
@predictions RasterStack of full model predictions with each layer named as: fc_RM (e.g., L_1).
@partition.method character vector with the method used for data partitioning.
@occ.pts data.frame of the longitude/latitude of input occurrence localities.
@occ.grp vector identifying the bin for each occurrence locality.
@bg.pts data.frame of the longitude/latitude of input background localities.
@bg.grp vector identifying the bin for each background locality.
@overlap matrix of pairwise niche overlap (blank if overlap = FALSE).

Details

ENMevaluate is the primary function for general use in the ENMeval package; the tuning function is used internally.

Maxent settings: In the current default implementation of Maxent, the combination of feature classes (fcs) allowed depends on the number of occurrence localities, and the value for the regularization multiplier (RM) is 1.0. ENMevaluate provides an automated way to execute ecological niche models in Maxent across a user-specified range of RM values and fc combinations, regardless of sample size. Each element of the fc argument is a string built from the following letters: L = linear, Q = quadratic, P = product, T = threshold, and H = hinge (see Maxent help documentation, Phillips et al. (2006), Phillips and Dudik (2008), Elith et al. (2011), and Merow et al. (2013) for additional details on RM and fcs). Categorical feature classes (C) are specified by the categoricals argument.
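The set of model runs implied by the default RMvalues and fc arguments can be sketched in base R as follows. This is an illustration only: the object names (settings, label) are hypothetical, and the order in which ENMevaluate actually runs the combinations is not asserted here. The labels follow the fc_RM naming described for the @predictions slot.

```r
# Default tuning values taken from the Usage section
RMvalues <- seq(0.5, 4, 0.5)
fc <- c("L", "LQ", "H", "LQH", "LQHP", "LQHPT")

# One row per unique combination of feature classes and RM
# (hypothetical data.frame, not an object returned by ENMeval)
settings <- expand.grid(fc = fc, RM = RMvalues, stringsAsFactors = FALSE)

# Label each run as fc_RM, e.g. "L_1"
settings$label <- paste(settings$fc, settings$RM, sep = "_")

nrow(settings)
head(settings)
```

With the defaults, 8 RM values and 6 feature class combinations give 48 candidate models.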

Methods for partitioning data: ENMevaluate includes six methods to partition occurrence and background localities into bins for training and testing ('jackknife', 'randomkfold', 'user', 'block', 'checkerboard1', 'checkerboard2'). The jackknife method is a special case of k-fold cross validation where the number of folds (k) is equal to the number of occurrence localities (n) in the dataset. The randomkfold method partitions occurrence localities randomly into a user-specified number of (k) bins - this is equivalent to the method of k-fold cross validation currently provided by Maxent. The user method enables users to define bins a priori. For this method, the user is required to provide background coordinates (bg.coords) and bin designations for both occurrence localities (occ.grp) and background localities (bg.grp). The block method partitions the data into four bins according to the lines of latitude and longitude that divide the occurrence localities into bins of as equal number as possible. The checkerboard1 (and checkerboard2) methods partition data into two (or four) bins based on one (or two) checkerboard patterns with grain size defined as one (or two) aggregation factor(s) of the original environmental layers. Although the checkerboard1 (and checkerboard2) methods are designed to partition occurrence localities into two (and four) evaluation bins, they may give fewer bins depending on the location of occurrence localities with respect to the checkerboard grid(s) (e.g., all records happen to fall in the "black" squares). A warning is given if the number of bins is < 4 for the checkerboard2 method, and an error is given if all localities fall in a single evaluation bin. Additional details can be found in get.evaluation.bins.
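The partitioning schemes described above can be sketched in base R. This is a simplified illustration under stated assumptions, not the code used by get.evaluation.bins: the coordinates are simulated, the block rule (split at the median latitude, then split each half at its own median longitude) is one plausible reading of the description, and the checkerboard uses a hypothetical cell size in map units instead of an aggregated raster.

```r
set.seed(42)
# Hypothetical occurrence coordinates (longitude, latitude)
occ <- cbind(lon = runif(20, -10, 10), lat = runif(20, 40, 50))
n <- nrow(occ)

# randomkfold: assign localities at random to k bins of near-equal size
k <- 5
kfold_grp <- sample(rep(seq_len(k), length.out = n))

# jackknife: k equals n, so every locality forms its own bin
jack_grp <- seq_len(n)

# block (sketch): split by latitude into two halves, then split each
# half by longitude, giving four bins of as equal number as possible
lat_half <- ifelse(rank(occ[, "lat"], ties.method = "first") <= ceiling(n / 2), 0, 1)
block_grp <- integer(n)
for (h in 0:1) {
  idx <- which(lat_half == h)
  lon_rank <- rank(occ[idx, "lon"], ties.method = "first")
  block_grp[idx] <- h * 2 + ifelse(lon_rank <= ceiling(length(idx) / 2), 1, 2)
}

# checkerboard1 (sketch): bin by the parity of the coarse cell's row + column
cell_size <- 5  # hypothetical grain, in the same units as the coordinates
col_id <- floor(occ[, "lon"] / cell_size)
row_id <- floor(occ[, "lat"] / cell_size)
checker_grp <- (row_id + col_id) %% 2 + 1

table(block_grp)
table(checker_grp)
```

As noted above, a checkerboard scheme can yield fewer bins than intended when all localities fall in squares of the same colour, which is why the number of distinct values in checker_grp should be checked before use.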

Evaluation metrics: Four evaluation metrics are calculated using the partitioned dataset, and one additional metric is provided based on the full dataset. ENMevaluate uses the same background localities and evaluation bin designations for each of the k iterations (for each unique combination of RM and fc) to facilitate valid comparisons among model settings.

Mean.AUC is the area under the curve of the receiver operating characteristic plot made based on the testing data (i.e., AUCtest), averaged across k bins. In each iteration, as currently implemented, the AUCtest value is calculated with respect to the full set of background localities to enable comparisons across the k iterations (Radosavljevic and Anderson 2014). As a relative measure for a given study species and region, high values of Mean.AUC are associated with the degree to which a model can successfully discriminate occurrence from background localities. This rank-based non-parametric metric, however, does not reveal the model goodness-of-fit (Lobo et al. 2008; Peterson et al. 2011).
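A minimal base R sketch of this calculation follows. The auc helper is a hypothetical rank-based (Mann-Whitney) implementation written for illustration, and the suitability values are invented; ENMeval itself may obtain AUC through other functions. Each bin's testing localities are compared against the same full set of background values, and the per-bin results are then averaged.

```r
# Rank-based AUC: the probability that a randomly chosen presence
# scores higher than a randomly chosen background locality
auc <- function(pres, bg) {
  n_pres <- length(pres)
  n_bg <- length(bg)
  ranks <- rank(c(pres, bg))  # ties receive average ranks
  (sum(ranks[seq_len(n_pres)]) - n_pres * (n_pres + 1) / 2) / (n_pres * n_bg)
}

# Hypothetical predicted values for the full background and for the
# withheld (testing) localities of three bins
bg_scores <- c(0.10, 0.20, 0.30, 0.40, 0.50)
test_scores_by_bin <- list(
  bin1 = c(0.60, 0.70),
  bin2 = c(0.45, 0.90),
  bin3 = c(0.25, 0.35)
)

# AUCtest per bin, always against the full background, then the mean
auc_test <- sapply(test_scores_by_bin, auc, bg = bg_scores)
mean_auc <- mean(auc_test)
auc_test
mean_auc
```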

To quantify the degree of overfitting, ENMevaluate calculates three metrics. The first is the difference between training and testing AUC, averaged across k bins (Mean.AUC.DIFF) (Warren and Seifert 2011). Mean.AUC.DIFF is expected to be high for models overfit to the training data. ENMevaluate also calculates two threshold-dependent omission rates that quantify overfitting when compared with the omission rate expected by the threshold employed: the proportion of testing localities with Maxent output values lower than the value associated with (1) the training locality with the lowest value (i.e., the minimum training presence, MTP; = 0 percent training omission) (Mean.ORmin) and (2) the value that excludes the 10 percent of training localities with the lowest predicted suitability (Mean.OR10) (Pearson et al. 2007). ENMevaluate uses corrected.var to calculate the variance for each of these metrics across k bins (i.e., variances are corrected for non-independence of cross-validation iterations; see Shcheglovitova and Anderson 2013). The value of these metrics for each of the individual k bins is returned if bin.output = TRUE.
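For a single partition, the three overfitting metrics can be sketched as below. All values and variable names are hypothetical, and the rule used for the 10 percent threshold (drop the lowest tenth of training values, rounded down, and take the next lowest) is an assumption that may differ in detail from the package's implementation.

```r
# Hypothetical Maxent output values for one training/testing partition
train_scores <- (1:10) / 10
test_scores <- c(0.05, 0.15, 0.50, 0.90)

# Minimum training presence threshold: lowest training value
mtp_threshold <- min(train_scores)

# 10 percent threshold: lowest value remaining after excluding the
# 10 percent of training localities with the lowest values
n_drop <- length(train_scores) %/% 10
p10_threshold <- sort(train_scores)[n_drop + 1]

# Omission rates: proportion of testing localities below each threshold
or_min <- mean(test_scores < mtp_threshold)
or_10 <- mean(test_scores < p10_threshold)

# AUC.DIFF for the same partition (hypothetical AUC values)
auc_train <- 0.75
auc_test <- 0.50
auc_diff <- auc_train - auc_test

c(ORmin = or_min, OR10 = or_10, AUC.DIFF = auc_diff)
```

Omission rates well above the expected values (0 for the minimum training presence threshold, 0.10 for the 10 percent threshold) suggest overfitting, as does a large AUC.DIFF.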

Based on the unpartitioned (full) dataset, ENMevaluate uses calc.aicc to calculate the AICc value for each model run and provides delta.AICc, AICc weights, as well as the number of parameters for each model (Warren and Seifert 2011). The AUCtrain value for the full model is also returned (full.AUC).
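The arithmetic behind these quantities can be sketched with the standard AICc formula. The log-likelihoods, parameter counts, and sample size below are hypothetical, and the aicc helper is written for illustration; how calc.aicc derives the log-likelihood from Maxent output is not reproduced here.

```r
# Standard small-sample corrected AIC
aicc <- function(loglik, k, n) {
  2 * k - 2 * loglik + (2 * k * (k + 1)) / (n - k - 1)
}

# Hypothetical values for three candidate models
loglik <- c(-100, -98, -97)  # log-likelihoods
k <- c(3, 5, 9)              # number of parameters
n <- 50                      # number of occurrence localities

aicc_values <- aicc(loglik, k, n)

# Difference from the best (lowest AICc) model
delta_aicc <- aicc_values - min(aicc_values)

# AICc weights: relative likelihoods rescaled to sum to 1
rel_lik <- exp(-0.5 * delta_aicc)
w_aicc <- rel_lik / sum(rel_lik)

data.frame(k, AICc = aicc_values, delta.AICc = delta_aicc, w.AIC = w_aicc)
```

The model with a delta of 0 is the one best supported by this criterion. The correction term is undefined when the number of parameters reaches n - 1, so highly parameterized models fitted to few localities cannot be scored this way.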

To quantify how resulting predictions differ in geographic space depending on the settings used, ENMevaluate includes an option to compute pairwise niche overlap between all pairs of full models (i.e., using the unpartitioned dataset) with Schoener's D statistic (Schoener 1968; Warren et al. 2009).
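Schoener's D between two predictions can be sketched in base R as follows. The schoener_d helper and the three short prediction vectors are hypothetical stand-ins for raster layers, and the layout of the matrix returned by calc.niche.overlap is not asserted here.

```r
# Schoener's D: 1 minus half the summed absolute difference between
# two predictions, each rescaled to sum to 1 over the study region
schoener_d <- function(p1, p2) {
  p1 <- p1 / sum(p1)
  p2 <- p2 / sum(p2)
  1 - 0.5 * sum(abs(p1 - p2))
}

# Hypothetical cell values for three full-model predictions
preds <- list(
  L_1  = c(0.2, 0.4, 0.4),
  LQ_1 = c(0.1, 0.5, 0.4),
  H_1  = c(0.6, 0.3, 0.1)
)

# Pairwise overlap between all pairs of predictions
pair_d <- Vectorize(function(i, j) schoener_d(preds[[i]], preds[[j]]))
overlap <- outer(seq_along(preds), seq_along(preds), pair_d)
dimnames(overlap) <- list(names(preds), names(preds))
overlap
```

D ranges from 0 (no overlap) to 1 (identical predictions), so values near 1 indicate that two settings produce nearly the same map.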

References

Elith, J., Phillips, S. J., Hastie, T., Dudik, M., Chee, Y. E., and Yates, C. J. (2011) A statistical explanation of MaxEnt for ecologists. Diversity and Distributions, 17: 43-57.

Hijmans, R. J., Phillips, S., Leathwick, J. and Elith, J. (2011) dismo package for R. Available online at: http://cran.r-project.org/web/packages/dismo/index.html.

Lobo, J. M., Jimenez-Valverde, A., and Real, R. (2008) AUC: A misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography, 17: 145-151.

Pearson, R. G., Raxworthy, C. J., Nakamura, M. and Peterson, A. T. (2007) Predicting species distributions from small numbers of occurrence records: a test case using cryptic geckos in Madagascar. Journal of Biogeography, 34: 102-117.

Peterson, A. T., Soberon, J., Pearson, R. G., Anderson, R. P., Martinez-Meyer, E., Nakamura, M. and Araujo, M. B. (2011) Ecological Niches and Geographic Distributions. Monographs in Population Biology, 49. Princeton University Press, Princeton, NJ.

Phillips, S. J., Anderson, R. P., and Schapire, R. E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190: 231-259.

Phillips, S. J. and Dudik, M. (2008) Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31: 161-175.

Merow, C., Smith, M., and Silander, J. A. (2013) A practical guide to Maxent: what it does, and why inputs and settings matter. Ecography, 36: 1-12.

Radosavljevic, A. and Anderson, R. P. (2014) Making better Maxent models of species distributions: complexity, overfitting and evaluation. Journal of Biogeography, 41: 629-643.

Schoener, T. W. (1968) The Anolis lizards of Bimini: resource partitioning in a complex fauna. Ecology, 49: 704-726.

Shcheglovitova, M. and Anderson, R. P. (2013) Estimating optimal complexity for ecological niche models: A jackknife approach for species with small sample sizes. Ecological Modelling, 269: 9-17.

Warren, D. L., Glor, R. E., Turelli, M. and Funk, D. (2009) Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution, 62: 2868-2883; Erratum: Evolution, 65: 1215.

Warren, D.L. and Seifert, S.N. (2011) Ecological niche modeling in Maxent: the importance of model complexity and the performance of model selection criteria. Ecological Applications, 21: 335-342.

Examples


### Load packages (raster provides raster(), stack() and values())
library(ENMeval)
library(raster)

### Simulate environmental covariates
set.seed(1)
r1 <- raster(matrix(nrow=50, ncol=50, data=runif(10000, 0, 25)))
r2 <- raster(matrix(nrow=50, ncol=50, data=rep(1:100, each=100), byrow=TRUE))
r3 <- raster(matrix(nrow=50, ncol=50, data=rep(1:100, each=100)))
r4 <- raster(matrix(nrow=50, ncol=50, data=c(rep(1,1000),rep(2,500)),byrow=TRUE))
values(r4) <- as.factor(values(r4))
env <- stack(r1,r2,r3,r4)

### Simulate occurrence localities
nocc <- 50
x <- (rpois(nocc, 2) + abs(rnorm(nocc)))/11
y <- runif(nocc, 0, .99)
occ <- cbind(x,y)

### This call gives the results loaded below
enmeval_results <- ENMevaluate(occ, env, method="block", n.bg=500, overlap=TRUE,
                               bin.output=TRUE, clamp=TRUE)

data(enmeval_results)
enmeval_results

### See table of evaluation metrics
enmeval_results@results

### Plot prediction with lowest AICc
plot(enmeval_results@predictions[[which(enmeval_results@results$delta.AICc == 0)]])
points(enmeval_results@occ.pts, pch=21, bg=enmeval_results@occ.grp)

### Niche overlap statistics between model predictions
enmeval_results@overlap
