
PresenceAbsence (version 1.1.9)

presence.absence.simulation: Presence/Absence Data Simulation

Description

presence.absence.simulation simulates presence/absence data as one set of observed values and one or more sets of model predictions. First, the observed values are drawn from a binomial distribution. Then, for each model, two beta distributions are used to generate predicted values: one for the data points where the simulated observed value is present, and a second for the points where it is absent.

Usage

presence.absence.simulation(n, prevalence, N.models = 1, shape1.absent, shape2.absent, shape1.present, shape2.present)

Arguments

n

number of plots (i.e. rows) in simulated dataset

prevalence

probability species is present for binomial observed values

N.models

number of models to simulate predictions for

shape1.absent

first parameter for beta distribution for plots where observed value is absent

shape2.absent

second parameter for beta distribution for plots where observed value is absent

shape1.present

first parameter for beta distribution for plots where observed value is present

shape2.present

second parameter for beta distribution for plots where observed value is present

Value

presence.absence.simulation returns a dataframe where:

column 1

plotID - plot ID numbers

column 2

Observed - 0/1 values

column 3

Predicted 1 - predicted probabilities for model 1

column 4

Predicted 2 - predicted probabilities for model 2, etc...

Details

presence.absence.simulation will generate predicted probabilities for one or more models. If N.models = 1, then shape parameters should be of length 1. If N.models > 1, then shape parameters can be either length 1 or vectors of length N.models.
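
The procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's actual source: the function name simulate_pa_sketch, the column names Predicted1, Predicted2, ..., and the use of rep_len() to recycle length-1 shape parameters are all hypothetical choices made for this example.

```r
# Hypothetical helper for illustration; NOT part of the PresenceAbsence package.
simulate_pa_sketch <- function(n, prevalence, N.models = 1,
                               shape1.absent, shape2.absent,
                               shape1.present, shape2.present) {
  # Recycle length-1 shape parameters to one value per model (assumed behavior).
  s1a <- rep_len(shape1.absent,  N.models)
  s2a <- rep_len(shape2.absent,  N.models)
  s1p <- rep_len(shape1.present, N.models)
  s2p <- rep_len(shape2.present, N.models)

  # Observed values: one Bernoulli draw per plot, with P(present) = prevalence.
  observed <- rbinom(n, size = 1, prob = prevalence)
  out <- data.frame(plotID = seq_len(n), Observed = observed)

  for (m in seq_len(N.models)) {
    # Each plot gets the 'present' or 'absent' shape pair for model m,
    # depending on its observed value; rbeta() is vectorized over shapes.
    shape1 <- ifelse(observed == 1, s1p[m], s1a[m])
    shape2 <- ifelse(observed == 1, s2p[m], s2a[m])
    out[[paste0("Predicted", m)]] <- rbeta(n, shape1, shape2)
  }
  out
}

set.seed(1)
sim <- simulate_pa_sketch(n = 500, prevalence = 0.2, N.models = 3,
                          shape1.absent = 1,           # length 1, recycled
                          shape2.absent = c(14, 7, 5), # one value per model
                          shape1.present = c(6, 2, 1),
                          shape2.present = 2)
str(sim)
```

Here shape1.absent and shape2.present are given as single values while the other two are vectors of length N.models, mirroring the two input forms the function accepts when N.models > 1.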

The beta distribution is extremely flexible and is capable of generating data with unrealistic behavior. The following rules of thumb will help generate realistic datasets:

The mean of the beta distribution equals shape1/(shape1+shape2). To get reasonable predictions (i.e. better than random), the mean for the plots where the species is present should be higher than the mean for the plots where it is absent:

mean(present) > mean(absent)

The overall mean probability should be approximately equal to the prevalence. In other words:

prevalence*mean(present) + (1-prevalence)*mean(absent) = prevalence
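
Both rules of thumb can be checked before simulating. The sketch below applies them to the shape parameters and prevalence used in Example 2; beta_mean is a hypothetical helper defined only for this illustration.

```r
# Hypothetical helper: mean of a beta distribution with the given shapes.
beta_mean <- function(shape1, shape2) shape1 / (shape1 + shape2)

# Parameters taken from Example 2 (three models).
prevalence   <- 0.2
mean_absent  <- beta_mean(c(1, 1, 1), c(14, 7, 5))
mean_present <- beta_mean(c(6, 2, 1), c(2, 2, 2))

# Rule 1: present plots should have the higher mean prediction.
mean_present > mean_absent

# Rule 2: the prevalence-weighted mean should be close to the prevalence.
overall_mean <- prevalence * mean_present + (1 - prevalence) * mean_absent
round(overall_mean, 3)   # approximately 0.203, 0.200, 0.200
```

All three models satisfy both rules; they differ in how far apart the present and absent means are, which is what gives them differing predictive ability.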

Examples

# NOT RUN {
### EXAMPLE 1 ###
### a graph illustrating effect of shape parameters on beta distribution

set.seed(666)
shapes<-c(1,2,5,10,20)
par(mfrow=c(5,5),mar=c(2,2,2,2),oma=c(0,3,3,0))

for(i in 1:5){
  for(j in 1:5){
    # Note: by setting prevalence=1, all observed values will be 'present',
    #       therefore only one beta distribution will be simulated.
    SIMDATA<-presence.absence.simulation(n=1000,
                                         prevalence=1,
                                         N.models=1,
                                         shape1.absent=1,
                                         shape2.absent=1,
                                         shape1.present=shapes[i],
                                         shape2.present=shapes[j])
    hist(SIMDATA[,3],breaks=50,main="",xlab="",ylab="",xlim=c(0,1))
    if(i==1){mtext(paste("shape2 =",shapes[j]),side=3,line=2,cex=.8)}
    if(j==1){mtext(paste("shape1 =",shapes[i]),side=2,line=3,cex=.8)}
  }
}

### EXAMPLE 2 ###
### generate observed data along with 3 sets of model predictions 
### for models of varying predictive ability.
### Note: This is the code used to generate sample dataset SIM3DATA.

set.seed(666)
SIM3DATA<-presence.absence.simulation(n=1000,
                                      prevalence=.2,
                                      N.models=3,
                                      shape1.absent=c(1,1,1),
                                      shape2.absent=c(14,7,5),
                                      shape1.present=c(6,2,1),
                                      shape2.present=c(2,2,2))
# }
