
ecospat (version 4.1.1)

ecospat.CCV.modeling: Runs individual species distribution models with SDMs or ESMs

Description

Creates probabilistic predictions for all species based on SDMs or ESMs and returns their evaluation metrics and variable importances.

Usage

ecospat.CCV.modeling(sp.data, 
                     env.data, 
                     xy,
                     DataSplitTable=NULL,
                     DataSplit = 70, 
                     NbRunEval = 25,
                     minNbPredictors =5,
                     validation.method = "cross-validation",
                     models.sdm = c("GLM","RF"), 
                     models.esm = "CTA", 
                     modeling.options.sdm = NULL, 
                     modeling.options.esm = NULL, 
                     ensemble.metric = "AUC", 
                     ESM = "YES",
                     parallel = FALSE, 
                     cpus = 4,
                     VarImport = 10,
                     modeling.id)

Value

modeling.id

character, the ID (=name) of the modeling procedure

output.files

vector with the names of the files written to the hard drive

speciesData.calibration

a 3-dimensional array of presence/absence data of all species for the calibration plots used for each run

speciesData.evaluation

a 3-dimensional array of presence/absence data of all species for the evaluation plots used for each run

speciesData.full

a data.frame of presence/absence data of all species (same as sp.data input)

DataSplitTable

a matrix with TRUE/FALSE for each model run (TRUE=Calibration point, FALSE=Evaluation point)

singleSpecies.ensembleEvaluationScore

a 3-dimensional array of single species evaluation metrics ('Max.KAPPA', 'Max.TSS', 'AUC of ROC')

singleSpecies.ensembleVariableImportance

a 3-dimensional array of single species variable importance for all predictors

singleSpecies.calibrationSites.ensemblePredictions

a 3-dimensional array of the predictions for each species and run at the calibration sites

singleSpecies.evaluationSites.ensemblePredictions

a 3-dimensional array of the predictions for each species and run at the evaluation sites

allSites.averagePredictions.cali

a matrix with the average predicted probabilities for each site across all the runs the sites were used for model calibration

allSites.averagePredictions.eval

a matrix with the average predicted probabilities for each site across all the runs the sites were used as independent evaluation sites
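The two averaged-prediction matrices can be pictured with a small base-R sketch. It is only a conceptual illustration for a single species: the objects pred, fold and split.table are simulated, and the sites-by-runs layout is an assumption, since this page does not state the dimension order of the returned arrays.

```r
# Conceptual sketch with simulated data (not ecospat output).
# Assumed layout: rows = sites, columns = runs, for ONE species.
set.seed(42)
n.sites <- 6
n.runs <- 3

# 3-fold style split: each site is held out in exactly one run
fold <- rep(seq_len(n.runs), length.out = n.sites)
split.table <- outer(fold, seq_len(n.runs), "!=")  # TRUE = calibration point

# Simulated predicted probabilities for every site and run
pred <- matrix(runif(n.sites * n.runs), nrow = n.sites, ncol = n.runs)

# Average over the runs in which a site was an evaluation site
eval.only <- pred
eval.only[split.table] <- NA
avg.eval <- rowMeans(eval.only, na.rm = TRUE)

# Average over the runs in which a site was a calibration site
cal.only <- pred
cal.only[!split.table] <- NA
avg.cali <- rowMeans(cal.only, na.rm = TRUE)
```

A site that never serves as an evaluation site in any run would have no evaluation average under this scheme, which is one reason to check the split table before interpreting the averages.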

Arguments

sp.data

a data.frame where the rows are sites and the columns are species (values 1,0)

env.data

either a data.frame where rows are sites and columns are environmental variables or a SpatRaster of the environmental variables

xy

two-column data.frame with X and Y coordinates of the sites (must be in the same coordinate system as env.data)

DataSplitTable

a table providing TRUE/FALSE to indicate which points are used for calibration and which for evaluation, as returned by ecospat.CCV.createDataSplitTable

DataSplit

percentage of dataset observations retained for the model training (only needed if no DataSplitTable provided)

NbRunEval

number of cross-validation/split-sample runs (only needed if no DataSplitTable provided)

minNbPredictors

minimum number of occurrences [min(presences, absences)] per predictor needed to calibrate the models

validation.method

either "cross-validation" or "split-sample", used to validate the community predictions (only needed if no DataSplitTable provided)

models.sdm

modeling techniques used for the normal SDMs. Vector of model names chosen among 'GLM', 'GBM', 'GAM', 'CTA', 'ANN', 'SRE', 'FDA', 'MARS', 'RF', 'MAXENT' and 'MAXNET'

models.esm

modeling techniques used for the ESMs. Vector of model names chosen among 'GLM', 'GBM', 'GAM', 'CTA', 'ANN', 'SRE', 'FDA', 'MARS', 'RF', 'MAXENT' and 'MAXNET'

modeling.options.sdm

modeling options for the normal SDMs. "BIOMOD.models.options" object returned by bm_ModelingOptions

modeling.options.esm

modeling options for the ESMs. "BIOMOD.models.options" object returned by bm_ModelingOptions

ensemble.metric

evaluation score used to weight single models to build ensembles: 'AUC', 'Kappa' or 'TSS'

ESM

either 'YES' (ESMs allowed), 'NO' (ESMs not allowed) or 'ALL' (ESMs used in any case)

parallel

should parallel computing be allowed (TRUE/FALSE)

cpus

number of cpus to use in parallel computing

VarImport

number of permutation runs to evaluate variable importance

modeling.id

character, the ID (=name) of modeling procedure. A random number by default
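To make the expected shape of DataSplitTable more concrete, the following base-R sketch builds a split-sample table by hand. It assumes one row per site and one column per run, with TRUE marking calibration points. The helper make_split_table is hypothetical and not part of ecospat; ecospat.CCV.createDataSplitTable remains the intended way to create such a table.

```r
# Hypothetical helper (not part of ecospat): build a split-sample table.
# Assumed layout: one row per site, one column per run.
make_split_table <- function(n.sites, n.runs, data.split = 70) {
  n.cal <- round(n.sites * data.split / 100)  # calibration sites per run
  tab <- matrix(FALSE, nrow = n.sites, ncol = n.runs,
                dimnames = list(NULL, paste0("Run", seq_len(n.runs))))
  for (run in seq_len(n.runs)) {
    # TRUE = calibration point, FALSE = evaluation point
    tab[sample.int(n.sites, n.cal), run] <- TRUE
  }
  tab
}

set.seed(1)
split.table <- make_split_table(n.sites = 20, n.runs = 3, data.split = 70)
colSums(split.table)  # number of calibration sites in each run
```

Here data.split and n.runs play the roles that DataSplit and NbRunEval have when no DataSplitTable is supplied. The same table applies to every species, which is the property the community cross-validation relies on.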

Author

Daniel Scherrer <daniel.j.a.scherrer@gmail.com> with the updates from Flavien Collart and Olivier Broennimann

Details

The basic idea of the community cross-validation (CCV) is to use the same data (sites) for the model calibration/evaluation of all species. This ensures that there is "independent" cross-validation/split-sample data available not only at the individual species level but also at the community level. This is key to allow an unbiased estimation of the ability to predict species assemblages (Scherrer et al. 2018). The output of the ecospat.CCV.modeling function can then be used to evaluate the species assemblage predictions with the ecospat.CCV.communityEvaluation.bin or ecospat.CCV.communityEvaluation.prob functions.
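The benefit of a shared split can be illustrated with simulated data. In this minimal sketch, obs, prob and is.cal are invented for illustration, and the richness comparison is a simplified stand-in rather than the metric definitions used by the ecospat.CCV.communityEvaluation functions.

```r
# Conceptual sketch with simulated data (not ecospat output).
set.seed(7)
n.sites <- 10
n.species <- 4

# Simulated presence/absence and predicted probabilities (sites x species)
obs <- matrix(rbinom(n.sites * n.species, 1, 0.5), nrow = n.sites)
prob <- matrix(runif(n.sites * n.species), nrow = n.sites)

# One shared split for ALL species: TRUE = calibration, FALSE = evaluation
is.cal <- rep(c(TRUE, FALSE), times = c(7, 3))
eval.sites <- which(!is.cal)

# Every species was held out at the same sites, so whole assemblages can be
# compared there: expected richness is the sum of predicted probabilities
obs.richness <- rowSums(obs[eval.sites, , drop = FALSE])
pred.richness <- rowSums(prob[eval.sites, , drop = FALSE])
richness.deviation <- pred.richness - obs.richness
```

If each species had its own random split instead, few or no sites would be held out for all species at once, and a comparison of complete assemblages on independent data would not be possible.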

References

Scherrer, D., D'Amen, M., Mateo, M.R.G., Fernandes, R.F. & Guisan, A. (2018) How to best threshold and validate stacked species assemblages? Community optimisation might hold the answer. Methods in Ecology and Evolution, in review

See Also

ecospat.CCV.createDataSplitTable; ecospat.CCV.communityEvaluation.bin; ecospat.CCV.communityEvaluation.prob

Examples

# \donttest{
  #Loading species occurrence data and keeping only sites with more than 2 of the selected species
  data(ecospat.testData)
  testData <- ecospat.testData[,c(24,34,43,45,48,53,55:58)]
  sp.data <- testData[which(rowSums(testData)>2), sort(colnames(testData))]
  
  #Loading environmental data
  env.data <- ecospat.testData[which(rowSums(testData)>2),4:8]
  
  #Coordinates for all sites
  xy <- ecospat.testData[which(rowSums(testData)>2),2:3]
  
  #Running all the models for all species
  myCCV.Models <- ecospat.CCV.modeling(sp.data = sp.data,
                                       env.data = env.data,
                                       xy = xy,
                                       NbRunEval = 2,
                                       minNbPredictors = 10,
                                       VarImport = 3)
                                       
  #Calculating the probabilistic community metrics
  metrics = c('SR.deviation','community.AUC','probabilistic.Sorensen','Max.Sorensen')
  myCCV.Eval.prob <- ecospat.CCV.communityEvaluation.prob(
          ccv.modeling.data = myCCV.Models, 
          community.metrics = metrics)
          
  #Thresholding all the predictions and calculating the community evaluation metrics
  myCCV.communityEvaluation.bin <- ecospat.CCV.communityEvaluation.bin(
        ccv.modeling.data = myCCV.Models, 
        thresholds = c('MAX.KAPPA', 'MAX.ROC','PS_SDM'),
        community.metrics= c('SR.deviation','Sorensen'),
        parallel = FALSE,
        cpus = 4)
        
  #removing files on disk
  unlink(list.files(pattern=myCCV.Models$modeling.id))
  unlink(myCCV.Models$modeling.id,recursive=TRUE)
          
# }
