calibration.plot
Produces a goodness-of-fit (calibration) plot for Presence/Absence data.
calibration.plot(DATA, which.model = 1, na.rm = FALSE, alpha = 0.05, N.bins = 5,
                 xlab = "Predicted Probability of Occurrence",
                 ylab = "Observed Occurrence as Proportion of Sites Surveyed",
                 main = NULL, color = NULL, model.names = NULL)
Creates a graphical plot and returns a data frame of information about the bins, with columns:
[,1] | BinCenter | center of bin
[,2] | NBin | number of plots in bin
[,3] | BinObs | proportion of plots in bin observed as Present
[,4] | BinPred | average prediction for bin
[,5] | BinObsCIlower | lower bound of confidence interval for BinObs
[,6] | BinObsCIupper | upper bound of confidence interval for BinObs
DATA: a matrix or data frame of observed and predicted values, where each row represents one plot and the columns are:
DATA[,1] | plot ID | text
DATA[,2] | observed values | zero-one values
DATA[,3] | predicted probabilities from first model | numeric (between 0 and 1)
DATA[,4] | predicted probabilities from second model, etc. | numeric (between 0 and 1)
which.model: a number indicating which model in DATA should be used (model 1 is the predictions in DATA[,3], model 2 those in DATA[,4], and so on)
na.rm: a logical indicating whether missing values should be removed
alpha: alpha value for the confidence intervals (for example, alpha = 0.05 gives 95 percent intervals)
N.bins: number of bins into which the predicted probabilities are split
xlab: a title for the x axis
ylab: a title for the y axis
main: an overall title for the plot
color: a logical or a vector of color codes
model.names: a vector of the names of each model included in DATA
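For illustration, a small DATA object in the positional layout described for the DATA argument can be built as below. The values and column names are made up, and `check_pa_data()` is a hypothetical helper (not part of PresenceAbsence) that only sketches the constraints listed for each column.

```r
# Made-up input: one row per plot, columns in the positional order
# plot ID, observed 0/1, then one column of predicted probabilities per model.
pa_data <- data.frame(
  plotID   = paste0("plot", 1:6),
  observed = c(0, 0, 1, 1, 0, 1),
  model1   = c(0.10, 0.35, 0.80, 0.65, 0.20, 0.90),
  model2   = c(0.30, 0.40, 0.55, 0.60, 0.45, 0.70)
)

# Hypothetical helper (NOT part of PresenceAbsence): basic sanity checks
# on the layout; stops with an error if a check fails.
check_pa_data <- function(DATA) {
  stopifnot(ncol(DATA) >= 3)
  obs <- DATA[, 2]
  # observed values must be 0 or 1 (missing values tolerated)
  stopifnot(all(obs %in% c(0, 1) | is.na(obs)))
  # every model column must hold numeric probabilities in [0, 1]
  preds <- as.matrix(DATA[, -(1:2), drop = FALSE])
  stopifnot(is.numeric(preds), all(preds >= 0 & preds <= 1, na.rm = TRUE))
  invisible(TRUE)
}

check_pa_data(pa_data)
```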
Elizabeth Freeman eafreeman@fs.fed.us
Takes a single model and creates a goodness-of-fit plot of observed versus predicted values. The plots are grouped into bins based on their predicted values, and the bin prevalence (the proportion of plots in the bin that are observed Present) is then calculated for each bin. The confidence interval for each bin is also plotted, and the number of plots in each bin is labeled above that bin.
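The binning step can be sketched in base R as follows. This is a minimal illustration, not the package's code: it assumes `N.bins` equal-width bins on [0, 1] (the package may place bin edges differently), and the hypothetical `bin_summary()` reproduces only the first four returned columns.

```r
# Minimal sketch of the binning step. ASSUMES N.bins equal-width bins on [0, 1];
# bin_summary() is a hypothetical helper, not part of PresenceAbsence.
bin_summary <- function(observed, predicted, N.bins = 5) {
  breaks <- seq(0, 1, length.out = N.bins + 1)
  # assign each plot to a bin by its predicted probability; keep empty bins
  bin <- factor(cut(predicted, breaks = breaks, include.lowest = TRUE, labels = FALSE),
                levels = seq_len(N.bins))
  data.frame(
    BinCenter = (breaks[-1] + breaks[-length(breaks)]) / 2,
    NBin      = as.vector(table(bin)),                   # plots per bin
    BinObs    = as.vector(tapply(observed, bin, mean)),  # bin prevalence (NA if empty)
    BinPred   = as.vector(tapply(predicted, bin, mean))  # mean prediction (NA if empty)
  )
}

# Made-up example: 8 plots, 5 bins
obs  <- c(0, 0, 1, 0, 1, 1, 1, 1)
pred <- c(0.05, 0.15, 0.30, 0.35, 0.55, 0.70, 0.85, 0.95)
bin_summary(obs, pred, N.bins = 5)
```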
Confidence intervals are calculated for the binomial bin counts using the F distribution.
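One standard way to obtain a binomial confidence interval from the F distribution is the exact (Clopper-Pearson) interval. The sketch below assumes that form, which may differ in detail from the package's implementation. For x plots observed Present out of n plots in a bin, the hypothetical helper `f_binom_ci()` returns bounds on the bin proportion; base R's `binom.test()` reports the same interval and serves as a cross-check.

```r
# Exact (Clopper-Pearson) binomial interval written with F-distribution quantiles.
# x = plots observed Present in the bin, n = plots in the bin (scalars).
# f_binom_ci() is a hypothetical helper, not part of PresenceAbsence.
f_binom_ci <- function(x, n, alpha = 0.05) {
  lower <- if (x == 0) 0 else {
    F1 <- qf(1 - alpha / 2, 2 * (n - x + 1), 2 * x)
    x / (x + (n - x + 1) * F1)
  }
  upper <- if (x == n) 1 else {
    F2 <- qf(1 - alpha / 2, 2 * (x + 1), 2 * (n - x))
    (x + 1) * F2 / ((n - x) + (x + 1) * F2)
  }
  c(lower = lower, upper = upper)
}

# Example: 3 of 10 plots in a bin observed Present, 95 percent interval
f_binom_ci(3, 10)
binom.test(3, 10)$conf.int  # same bounds, computed by base R
```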
Unlike a typical goodness-of-fit plot from a linear regression model, a calibration plot for Presence/Absence data in which all the points lie along the diagonal does not necessarily imply a good-quality model. The ideal calibration plot for Presence/Absence data depends on the intended use of the model.
If the model is to be used to produce probability maps, then it is indeed desirable that (for example) 80 percent of plots with a predicted probability of 0.8 are actually observed Present. In this case, having all the bins along the diagonal does indicate a good model.
However, if the model is to be used simply to predict species presence, then all that is required is that some threshold exists (not necessarily 0.5) such that every plot with a lower predicted probability is observed Absent and every plot with a higher predicted probability is observed Present. In this case, a good model will not necessarily (in fact, will rarely) have all the bins along the diagonal. (Note: for this purpose, presence.absence.hist may produce more useful diagnostics.)
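To make the distinction concrete, the made-up predictions below separate Presence from Absence perfectly at a threshold of 0.40, yet the bins fall well off the diagonal. Equal-width bins of width 0.2 on [0, 1] are an assumption for illustration only.

```r
# Made-up predictions: all Absent plots fall below 0.40, all Present plots above it
observed  <- c(0, 0, 0, 0, 1, 1, 1, 1)
predicted <- c(0.25, 0.30, 0.35, 0.38, 0.42, 0.45, 0.50, 0.55)

# Perfect classification at the threshold
threshold  <- 0.40
classified <- as.integer(predicted > threshold)
all(classified == observed)

# Equal-width bins (an assumption): compare bin prevalence with mean prediction
bin <- cut(predicted, breaks = seq(0, 1, by = 0.2), include.lowest = TRUE)
tapply(observed, bin, mean)   # bin prevalence (NA for empty bins)
tapply(predicted, bin, mean)  # mean predicted probability per bin
```

Every plot is classified correctly, but the bin (0.2, 0.4] has a prevalence of 0 against a mean prediction of 0.32, and the bin (0.4, 0.6] has a prevalence of 1 against a mean prediction of 0.48.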
If all the bins lie above the diagonal, or all the bins lie below the diagonal, it may indicate that the training and test datasets have different prevalence. In this case, it may be worthwhile to re-examine the initial data selection.
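A quick companion check is to compare the overall observed prevalence with the mean predicted probability. Both are NBin-weighted averages of the bin values (BinObs and BinPred respectively), so if every bin's observed proportion exceeds its mean prediction, the observed prevalence must exceed the mean prediction, and the reverse holds when every bin falls short. A gap on its own does not show that all bins lie on one side. The hypothetical helper `prevalence_gap()` is a minimal sketch of this comparison, not part of the package.

```r
# Hypothetical helper (NOT part of PresenceAbsence): overall observed
# prevalence versus mean predicted probability for one model.
prevalence_gap <- function(observed, predicted) {
  obs_prev  <- mean(observed)
  pred_prev <- mean(predicted)
  c(observed = obs_prev, predicted = pred_prev, difference = obs_prev - pred_prev)
}

# Made-up example where the predictions are systematically too low
prevalence_gap(observed = c(0, 0, 1, 1), predicted = c(0.1, 0.2, 0.4, 0.5))
```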
presence.absence.summary, presence.absence.hist
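library(PresenceAbsence)  # provides calibration.plot() and the SIM3DATA example data
data(SIM3DATA)

# Default call: first model (DATA[,3]), 5 bins, alpha = 0.05
calibration.plot(SIM3DATA)

# Third model, missing values removed, 10 bins, explicit axis titles
calibration.plot(DATA = SIM3DATA,
                 which.model = 3,
                 na.rm = TRUE,
                 alpha = 0.05,
                 N.bins = 10,
                 xlab = "Predicted Probability of Occurrence",
                 ylab = "Observed Occurrence as Proportion of Sites Surveyed",
                 model.names = NULL,
                 main = NULL)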