calibration.plot: Calibration Plot

Description

calibration.plot produces a goodness-of-fit plot for Presence/Absence data.

Usage

calibration.plot(DATA, which.model = 1, na.rm = FALSE, alpha = 0.05, N.bins = 5, 
xlab = "Predicted Probability of Occurrence", 
ylab = "Observed Occurrence as Proportion of Sites Surveyed", 
main = NULL, color= NULL, model.names= NULL)

Value

Creates a graphical plot and returns a data frame of information about the bins, where:

`[,1]`	`BinCenter`	Center of bin
`[,2]`	`NBin`	Number of plots in bin
`[,3]`	`BinObs`	Proportion of bin observed as Present
`[,4]`	`BinPred`	Average prediction for bin
`[,5]`	`BinObsCIlower`	Lower bound of confidence interval for `BinObs`
`[,6]`	`BinObsCIupper`	Upper bound of confidence interval for `BinObs`

Arguments

DATA

a matrix or dataframe of observed and predicted values where each row represents one plot and where columns are:

`DATA[,1]`	plot ID	text
`DATA[,2]`	observed values	zero-one values
`DATA[,3]`	predicted probabilities from first model	numeric (between 0 and 1)
`DATA[,4]`	predicted probabilities from second model, etc...

which.model

a number indicating which model from DATA should be used

na.rm

a logical indicating whether missing values should be removed

alpha

alpha value for confidence intervals

N.bins

number of bins to split predicted probabilities into

xlab

a title for the x axis

ylab

a title for the y axis

main

an overall title for the plot

color

a logical or a vector of color codes

model.names

a vector of the names of each model included in DATA

Author

Elizabeth Freeman eafreeman@fs.fed.us

Details

Takes a single model and creates a goodness-of-fit plot of observed versus predicted values. The plots are grouped into bins based on their predicted values, and the bin prevalence (the ratio of plots in the bin observed as Present to the total number of plots in the bin) is then calculated for each bin. The confidence interval for each bin is also plotted, and the total number of plots is labeled above each bin.
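The binning step can be illustrated in base R. This is a minimal sketch, not the package's own code: it assumes equal-width bins on [0, 1], and the simulated vectors `pred` and `obs` are hypothetical stand-ins for one model column and the observed column of DATA.

```r
# Simulated stand-in data (hypothetical, not from the package).
set.seed(42)
n    <- 200
pred <- runif(n)                          # predicted probabilities
obs  <- rbinom(n, size = 1, prob = pred)  # observed 0/1 values

# Assumption: equal-width bins spanning [0, 1].
N.bins <- 5
breaks <- seq(0, 1, length.out = N.bins + 1)
bin    <- cut(pred, breaks = breaks, include.lowest = TRUE)

# One row per bin: center, count, observed proportion, mean prediction.
bins <- data.frame(
  BinCenter = (head(breaks, -1) + tail(breaks, -1)) / 2,
  NBin      = as.vector(table(bin)),
  BinObs    = as.vector(tapply(obs, bin, mean)),
  BinPred   = as.vector(tapply(pred, bin, mean))
)
print(bins)
```

Plotting `BinObs` against `BinPred` (or `BinCenter`) gives the points of a calibration plot; a bin with no plots would yield `NA` in this sketch.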

Confidence intervals are calculated for the binomial bin counts using the F distribution.
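A binomial interval based on the F distribution is commonly written in the exact (Clopper-Pearson) form. The sketch below assumes that formulation; it is not taken from the package source, and the helper name `binom.ci.f` is hypothetical. It uses only base R (`qf`).

```r
# Hypothetical helper: exact binomial CI for x presences out of n plots,
# written with F-distribution quantiles (Clopper-Pearson form; this is an
# assumption about the method, not code from the package).
binom.ci.f <- function(x, n, alpha = 0.05) {
  lower <- if (x == 0) 0 else {
    f <- qf(1 - alpha / 2, 2 * (n - x + 1), 2 * x)
    x / (x + (n - x + 1) * f)
  }
  upper <- if (x == n) 1 else {
    f <- qf(1 - alpha / 2, 2 * (x + 1), 2 * (n - x))
    (x + 1) * f / (n - x + (x + 1) * f)
  }
  c(lower = lower, upper = upper)
}

# Example: 7 of 20 plots in a bin observed as Present.
ci <- binom.ci.f(x = 7, n = 20)
print(ci)
```

The result can be compared with `binom.test(7, 20)$conf.int`, which computes the same exact interval in base R.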

Unlike a typical goodness-of-fit plot from a linear regression model, with Presence/Absence data having all the points lie along the diagonal does not necessarily imply a good quality model. The ideal calibration plot for Presence/Absence data depends on the intended use of the model.

If the model is to be used to produce probability maps, then it is indeed desirable that (for example) 80 percent of plots with predicted probability of 0.8 actually do have observed Presence. In this case, having all the bins along the diagonal does indicate a good model.

However, if the model is to be used simply to predict species presence, then all that is required is that some threshold exists (not necessarily 0.5) where every plot with a lower predicted probability is observed Absent, and every plot with a higher predicted probability is observed Present. In this case, a good model will not necessarily (in fact, will rarely) have all the bins along the diagonal. (Note: for this purpose presence.absence.hist may produce more useful diagnostics.)
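A small made-up example may help show why a model that separates Present from Absent perfectly can still sit off the diagonal. The data below are hypothetical and chosen only for illustration, with a threshold of 0.5.

```r
# Hypothetical predictions: all Absent plots fall just below 0.5,
# all Present plots just above it.
pred <- c(0.40, 0.42, 0.44, 0.46, 0.54, 0.56, 0.58, 0.60)
obs  <- c(0, 0, 0, 0, 1, 1, 1, 1)

# Classification at the threshold is perfect.
threshold       <- 0.5
predicted.class <- as.integer(pred > threshold)
accuracy        <- mean(predicted.class == obs)

# Two bins split at the threshold: mean prediction vs observed proportion.
low   <- pred <= threshold
calib <- data.frame(
  BinPred = c(mean(pred[low]), mean(pred[!low])),
  BinObs  = c(mean(obs[low]),  mean(obs[!low]))
)
print(accuracy)
print(calib)
```

Here every plot is classified correctly, yet the bins have mean predictions near 0.43 and 0.57 against observed proportions of 0 and 1, so neither bin lies on the diagonal.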

If all the bins lie above the diagonal, or all the bins lie below the diagonal, it may indicate that the training and test datasets have different prevalence. In this case, it may be worthwhile to re-examine the initial data selection.

References

Vaughan, I. P., Ormerod, S. J. 2005. The continuing challenges of testing species distribution models. J. Appl. Ecol., 42:720-730.

Reineking, B. and Schröder, B. 2006. Constrain to perform: regularization of habitat models. Ecological Modelling 193: 675-690.

Examples

library(PresenceAbsence)
data(SIM3DATA)

calibration.plot(SIM3DATA)

calibration.plot(	DATA=SIM3DATA,
			which.model=3,
			na.rm=TRUE,
			alpha=0.05,
			N.bins=10,
			xlab="Predicted Probability of Occurrence",
			ylab="Observed occurrence as proportion of sites surveyed",
			model.names=NULL,
			main=NULL)
