
ptools (version 2.0.0)

pai: Predictive Accuracy Index

Description

Given a set of predictions and observed counts, returns the PAI (predictive accuracy index), the PEI (predictive efficiency index), and the RRI (recovery rate index).

Usage

pai(dat, count, pred, area, other = c())

Value

A dataframe with the columns:

  • Order, the rank of each unit after sorting by predicted density

  • Count, the observed counts from the column named by count

  • Pred, the original predictions

  • Area, the area for the units of analysis

  • Cum*, the cumulative totals for Count/Pred/Area

  • PCum*, the proportion cumulative totals, e.g. CumCount/sum(Count)

  • PAI, the PAI stat

  • PEI, the PEI stat

  • RRI, the RRI stat (it is often better to analyze or graph log(RRI))

Any additional columns named in other are appended at the end of the dataframe.
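The Cum* and PCum* columns can be reproduced by hand on the fake data from the Examples section. This is a minimal base R sketch, not the package's own code. The data frame dat and its column names are chosen only for illustration. Ties in predicted density are broken by R's stable order(), which may not match how pai() breaks them.

```r
# Fake data from the Examples section, with a constant area of 1 per unit
dat <- data.frame(id   = 1:6,
                  obs  = c(6, 7, 3, 2, 1, 0),
                  pred = c(8, 4, 4, 2, 1, 0),
                  area = 1)
# Rank units by predicted density (pred per unit area), highest first
dat <- dat[order(-dat$pred / dat$area), ]
dat$Order     <- seq_len(nrow(dat))
# Cumulative totals down the ranking
dat$CumCount  <- cumsum(dat$obs)
dat$CumPred   <- cumsum(dat$pred)
dat$CumArea   <- cumsum(dat$area)
# Cumulative totals as a proportion of the grand total
dat$PCumCount <- dat$CumCount / sum(dat$obs)
dat$PCumPred  <- dat$CumPred / sum(dat$pred)
dat$PCumArea  <- dat$CumArea / sum(dat$area)
print(dat)
```

The last row of every PCum* column is 1, because by then all units have been included.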

Arguments

dat

data frame with the predictions, observed counts, and area sizes (the area column can simply be all ones)

count

character specifying the column name for the observed counts (e.g. the out-of-sample crime counts)

pred

character specifying the column name for the predicted counts (e.g. predictions based on a model)

area

character specifying the column name for the area sizes (could also be street segment distances, see Drawve & Wooditch, 2019)

other

character vector of any other column names you want to keep (e.g. an ID variable); defaults to an empty c()
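Because units are ranked by predicted density rather than raw predicted count (see Details), unequal areas such as street segment lengths can change the ranking. The sketch uses made-up segment data; the seg data frame and its len column are hypothetical and only illustrate the idea in base R.

```r
# Hypothetical street segments with unequal lengths (made-up values)
seg <- data.frame(id   = 1:4,
                  pred = c(6, 4, 3, 1),
                  len  = c(3, 1, 1, 0.4))
# Predicted count per unit length
seg$dens <- seg$pred / seg$len
# Ranking by raw predicted count vs. by predicted density
order_by_pred <- seg$id[order(-seg$pred)]
order_by_dens <- seg$id[order(-seg$dens)]
print(order_by_pred)
print(order_by_dens)
```

Segment 1 has the most predicted crime but spreads it over the longest length, so it drops from first to last once length is taken into account.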

Details

Given predictions over an entire sample, this returns a dataframe sorted in descending order of predicted density (predicted count per unit area), so each row gives the metrics for targeting that unit plus every unit ranked above it. PAI is defined as:

$$PAI = \frac{c_t/C}{a_t/A}$$

where $c_t$ is the cumulative observed count in the top $t$ ranked areas, $C$ is the total observed count, $a_t$ is the cumulative area of those $t$ areas, and $A$ is the total area. The numerator is thus the proportion of crimes captured and the denominator is the proportion of area covered. PEI is the observed PAI divided by the best possible PAI (what a perfect oracle would achieve), so it is scaled between 0 and 1. RRI is predicted divided by observed, so very bad predictions can return Inf (or NaN when both are zero). See Wheeler & Steenbeek (2021) for the definitions of the different metrics. Note that PEI may behave unexpectedly when areas differ in size.
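The three definitions can be checked from first principles on the fake data from the Examples section. This is a minimal sketch, not the package's implementation. The helper pai_sketch is hypothetical, and it assumes that the oracle ranks units by observed density and is compared at the same rank t, and that RRI uses the cumulative predicted and observed counts.

```r
# Hypothetical helper (not part of ptools): PAI, PEI and RRI from first principles
pai_sketch <- function(obs, pred, area) {
  n <- length(obs)
  # Rank units by predicted density, highest first
  o <- order(-pred / area)
  obs_s <- obs[o]; pred_s <- pred[o]; area_s <- area[o]
  cum_count <- cumsum(obs_s)
  cum_pred  <- cumsum(pred_s)
  cum_area  <- cumsum(area_s)
  # PAI = (c_t / C) / (a_t / A)
  pai <- (cum_count / sum(obs)) / (cum_area / sum(area))
  # Oracle (assumption): rank by observed density, compare at the same rank t
  b <- order(-obs / area)
  best_pai <- (cumsum(obs[b]) / sum(obs)) / (cumsum(area[b]) / sum(area))
  data.frame(Order = seq_len(n),
             Count = obs_s, Pred = pred_s, Area = area_s,
             PAI   = pai,
             PEI   = pai / best_pai,
             # Inf when cum_count is 0 but cum_pred > 0; NaN when both are 0
             RRI   = cum_pred / cum_count)
}

res <- pai_sketch(obs  = c(6, 7, 3, 2, 1, 0),
                  pred = c(8, 4, 4, 2, 1, 0),
                  area = rep(1, 6))
print(round(res, 3))
```

Here the oracle would flag the unit with 7 crimes first while the predictions flag the unit with 6, so PEI is 6/7 at t = 1 and 1 afterwards. With unequal areas, comparing at the same rank t means the oracle and the predictions can cover different total areas, which is one plausible source of the odd PEI behavior noted above.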

References

Drawve, G., & Wooditch, A. (2019). A research note on the methodological and theoretical considerations for assessing crime forecasting accuracy with the predictive accuracy index. Journal of Criminal Justice, 64, 101625.

Wheeler, A. P., & Steenbeek, W. (2021). Mapping the risk terrain for crime using machine learning. Journal of Quantitative Criminology, 37(2), 445-480.

See Also

pai_summary() for a summary table of metrics across multiple pai tables at fixed N thresholds

Examples


library(ptools)

# Making some very simple fake data
crime_dat <- data.frame(id = 1:6,
                        obs = c(6, 7, 3, 2, 1, 0),
                        pred = c(8, 4, 4, 2, 1, 0))
# Constant area of 1 for every unit
crime_dat$const <- 1
p1 <- pai(crime_dat, 'obs', 'pred', 'const')
print(p1)

# Combining multiple predictions into a nice table;
# the second "prediction" is a random shuffle of the observed counts
set.seed(10)
crime_dat$rand <- sample(crime_dat$obs, nrow(crime_dat), FALSE)
p2 <- pai(crime_dat, 'obs', 'rand', 'const')
pai_summary(list(p1, p2), c(1, 3, 5), c('one', 'two'))
