ptools (version 2.0.0)

check_pois: Checks the fit of a Poisson Distribution

Description

Provides a frequency table to check the fit of a Poisson distribution to empirical data.

Usage

check_pois(counts, min_val, max_val, pred, silent = FALSE)

Value

A dataframe with columns

  • Int, the integer value

  • Freq, the total observed count for that integer value

  • PoisF, the expected count according to a Poisson distribution with the mean given by pred

  • ResidF, the residual from Freq - PoisF

  • Prop, the observed proportion of that integer (0-100 scale)

  • PoisD, the expected proportion of that integer (0-100 scale)

  • ResidD, the residual from Prop - PoisD

Arguments

counts

vector of counts, e.g. c(0,5,1,3,4,6)

min_val

scalar minimum value used to generate the grid of results, e.g. 0

max_val

scalar maximum value used to generate the grid of results, e.g. max(counts)

pred

can be either a scalar, e.g. mean(counts), or a vector (e.g. predicted values from a Poisson regression)

silent

boolean; if TRUE, do not print the mean/variance messages. Only applies when a scalar is passed for pred (default FALSE)

Details

Given either a scalar mean to test the fit, or a set of predictions (e.g. varying means predicted from a model), checks whether the data fit a given Poisson distribution over a specified set of integers. That is, it builds a table of integer counts and compares the observed distribution with the one expected under the Poisson. Useful for spotting obvious deviations, such as too many zeroes or a longer tail than expected.
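
The table described above can be approximated in base R. The following is a minimal sketch under stated assumptions, not the ptools source: pois_table() is a hypothetical helper name, and it assumes that with a vector pred the expected frequency of each integer is the sum of the per-observation Poisson probabilities. With a scalar pred this reduces to n * dpois(i, pred).

```r
# Illustrative base-R sketch; pois_table() is a hypothetical helper,
# not the ptools implementation.
pois_table <- function(counts, min_val, max_val, pred) {
  ints <- min_val:max_val
  n <- length(counts)
  # Observed frequency of each integer on the grid
  freq <- as.vector(table(factor(counts, levels = ints)))
  # Expected frequency: sum of Poisson probabilities over observations
  # (assumption; a scalar pred is recycled to one mean per observation)
  mu <- rep(pred, length.out = n)
  pois_f <- vapply(ints, function(i) sum(dpois(i, mu)), numeric(1))
  data.frame(
    Int = ints,
    Freq = freq,
    PoisF = pois_f,
    ResidF = freq - pois_f,
    Prop = 100 * freq / n,          # observed proportion, 0-100 scale
    PoisD = 100 * pois_f / n,       # expected proportion, 0-100 scale
    ResidD = 100 * (freq - pois_f) / n
  )
}

set.seed(10)
x <- rpois(10000, 0.2)
tab <- pois_table(x, 0, max(x), mean(x))
print(tab)
```

Large values of ResidF or ResidD at a particular integer (for example at zero) point to where the Poisson assumption breaks down.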

Examples

# Example use for constant over the whole sample
set.seed(10)
lambda <- 0.2
x <- rpois(10000,lambda)
pfit <- check_pois(x,0,max(x),mean(x))
print(pfit)
# 82% zeroes is not zero inflated -- expected according to Poisson!

# Example use if you have varying predictions, e.g. after Poisson regression
n <- 10000
ru <- runif(n,0,10)
x <- rpois(n,lambda=ru)
check_pois(x, 0, 23, ru)

# If you really want to do a statistical test of fit
chi_stat <- sum((pfit$Freq - pfit$PoisF)^2/pfit$PoisF)
df <- length(pfit$Freq) - 2
stats::pchisq(chi_stat, df, lower.tail = FALSE) # p-value (upper tail)
# I prefer evaluating specific integers though (e.g. zero-inflated, longer-tails, etc.)
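
For readers who want the goodness-of-fit test without the package, here is a self-contained base-R sketch. It assumes a constant mean estimated from the data, and it takes the p-value as the upper-tail chi-square probability from pchisq() rather than a density from dchisq(). The variable names are illustrative only. Cells with very small expected counts make the chi-square approximation rough, so treat the result as a guide.

```r
# Base-R sketch of a chi-square goodness-of-fit check for a Poisson fit.
set.seed(10)
x <- rpois(10000, 0.2)
ints <- 0:max(x)

# Observed and expected frequencies over the integer grid
obs <- as.vector(table(factor(x, levels = ints)))
expd <- length(x) * dpois(ints, mean(x))

chi_stat <- sum((obs - expd)^2 / expd)
# Degrees of freedom: cells minus 1, minus 1 more for the estimated mean
df <- length(ints) - 2
# p-value is the probability of a statistic at least this large
p_value <- pchisq(chi_stat, df, lower.tail = FALSE)
print(c(chi_stat = chi_stat, df = df, p_value = p_value))
```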