cmx: Confusion Matrix

Description

cmx calculates the confusion matrix for a single model.

Usage

cmx(DATA, threshold = 0.5, which.model = 1, na.rm = FALSE)

Arguments

DATA

a matrix or data frame of observed and predicted values, where each row represents one plot and the columns are:

DATA[,1]	plot ID	text
DATA[,2]	observed values	zero-one values
DATA[,3]	predicted probabilities from first model	numeric (between 0 and 1)

threshold

a cutoff value used for translating predicted probabilities into 0/1 values. It must be a single number between zero and one; defaults to 0.5.

which.model

a number indicating which model from DATA should be used; defaults to 1

na.rm

a logical indicating whether rows with missing values should be removed; defaults to FALSE

Value

the confusion matrix is returned in the form of a table where:

columns

observed values

rows

predicted values

Details

cmx calculates the confusion matrix for a single model at a single threshold.

If DATA contains predictions from more than one model, which.model can be used to specify which model should be used. If which.model is not given, cmx uses predictions from the first model by default.
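
The column lookup implied by the DATA layout can be sketched in base R. This is an illustration only, not the package's code: it assumes that predictions from additional models sit in columns 4, 5, and so on, so that model k is found in column k + 2. The data frame and its column names are hypothetical.

```r
# Hypothetical data: plot ID, observed 0/1 values, then one column per model
DATA <- data.frame(
  plotID   = c("a", "b", "c"),
  Observed = c(1, 0, 1),
  Model1   = c(0.9, 0.2, 0.6),
  Model2   = c(0.7, 0.4, 0.3)
)

# Assumption: predictions for model k are stored in column k + 2
which.model <- 2
pred <- DATA[, which.model + 2]
print(pred)
```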

When calculating the confusion matrix, any plot with a predicted probability greater than threshold is considered to be predicted Present, while any plot with a predicted probability less than or equal to threshold is considered to be predicted Absent. The only exception is when threshold equals zero. In that case, all plots are considered to be predicted Present.
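
The thresholding rule can be sketched in base R as follows. The helper `classify_at_threshold` is hypothetical and is not part of the package; it only mirrors the rule as described here. The ordering of the factor levels in the resulting table (1 before 0) is likewise an assumption made for display.

```r
# Hypothetical helper illustrating the documented rule; not the package's code
classify_at_threshold <- function(pred, threshold = 0.5) {
  if (threshold == 0) {
    # Documented exception: at threshold zero every plot is predicted Present
    return(rep(1L, length(pred)))
  }
  # Strictly greater than threshold means Present (1), otherwise Absent (0)
  as.integer(pred > threshold)
}

pred <- c(0.10, 0.50, 0.51, 0.90)  # made-up predicted probabilities
obs  <- c(0, 1, 0, 1)              # made-up observed values

# Rows are predicted values and columns are observed values
predicted <- factor(classify_at_threshold(pred), levels = c(1, 0))
observed  <- factor(obs, levels = c(1, 0))
print(table(predicted, observed))
```

Note that a prediction exactly equal to the threshold (0.50 here) falls on the Absent side.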

Unlike other functions in this library, cmx does not accept a vector or an integer greater than one for threshold. Instead, threshold must be given as a single number between zero and one.

If na.rm equals FALSE and NAs are present in DATA, the function returns NA.

If na.rm equals TRUE and NAs are present in DATA, the function removes every row that contains an NA in any column. It also prints the number of rows that were removed.
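
The missing-value behaviour can be approximated in base R with `complete.cases`. The helper `handle_missing` below is hypothetical: it is a minimal sketch of the documented behaviour, and both the wording of its message and the choice to check every column are assumptions rather than details taken from the package.

```r
# Hypothetical helper sketching the documented na.rm behaviour
handle_missing <- function(DATA, na.rm = FALSE) {
  keep <- complete.cases(DATA)   # TRUE for rows without any NA
  if (all(keep)) return(DATA)    # nothing missing, nothing to do
  if (!na.rm) return(NA)         # NAs present and na.rm = FALSE
  # Report how many rows are dropped (message wording is an assumption)
  message(sum(!keep), " row(s) removed because of NA values")
  DATA[keep, , drop = FALSE]
}

# Made-up data with one missing prediction
d <- data.frame(
  plotID    = 1:3,
  Observed  = c(1, 0, 1),
  Predicted = c(0.8, NA, 0.6)
)

print(handle_missing(d))                # NA
print(handle_missing(d, na.rm = TRUE))  # the two complete rows
```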

Examples

# NOT RUN {
library(PresenceAbsence)

### EXAMPLE 1 ###
     ### generate simulated data ###
     set.seed(666)
     N <- 1000
     SIMDATA <- as.data.frame(matrix(0, N, 3))
     names(SIMDATA) <- c("plotID", "Observed", "Predicted")
     SIMDATA$plotID <- 1:N
     SIMDATA$Observed <- rbinom(n = N, size = 1, prob = .2)
     # predictions cluster near 0.8 for presences and near 0.2 for absences
     present <- SIMDATA$Observed == 1
     SIMDATA$Predicted[present] <- rnorm(n = sum(present), mean = .8, sd = .15)
     SIMDATA$Predicted[!present] <- rnorm(n = sum(!present), mean = .2, sd = .15)
     # rescale predictions to the interval [0, 1]
     SIMDATA$Predicted <- (SIMDATA$Predicted - min(SIMDATA$Predicted)) /
       (max(SIMDATA$Predicted) - min(SIMDATA$Predicted))

     ### plot simulated data
     hist(SIMDATA$Predicted,100)

     ### calculate confusion matrix ###
     cmx(SIMDATA)

### EXAMPLE 2 ###

     data(SIM3DATA)

     cmx(SIM3DATA)
     cmx(SIM3DATA,which.model=2)
     cmx(SIM3DATA,which.model=3,threshold=.2)

# }
