ptools (version 2.0.0)

small_samptest: Small Sample Exact Test for Counts in Bins

Description

Small-sample exact test of whether counts of N items in bins are consistent with specified bin probabilities.

Usage

small_samptest(d, p = rep(1/length(d), length(d)), type = "G", cdf = FALSE)

Value

A small_sampletest object with slots for:

  • CDF, a dataframe that contains the exact probabilities and test statistic values for every possible permutation

  • probabilities, the null probabilities you specified

  • data, the observed counts you specified

  • test, the type of test conducted (e.g. G, KS, Chi, etc.)

  • test_stat, the test statistic for the observed data

  • p_value, the p-value for the observed stat based on the exact null distribution

  • AggregateStatistics, a reduced-form aggregate table used for the CDF/p-value calculation

If you wish to save the object, you may want to drop the CDF part, since it can be quite large. It has choose(n+m-1,m-1) rows in total, where m is the number of bins and n is the total count. For example, 10 crimes over the 7 days of the week results in a dataframe with choose(10 + 7 - 1,7-1) rows, which is 8008. The CDF part is currently kept to make it easier to calculate power for a particular test.
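As a rough guide to how large the CDF table can get, the row count can be computed in base R before running the test. This is a minimal sketch; the helper name cdf_rows is hypothetical and is not part of ptools.

```r
# Number of distinct ways to place n counts into m bins (stars and bars).
# cdf_rows is an illustrative helper, not a ptools function.
cdf_rows <- function(n, m) choose(n + m - 1, m - 1)

cdf_rows(n = 10, m = 7)  # 10 crimes over 7 days of the week: 8008 rows
cdf_rows(n = 10, m = 9)  # 10 first digits over 9 Benford bins
cdf_rows(n = 20, m = 7)  # doubling the count grows the table quickly
```

The table grows quickly with both n and m, which is why dropping the CDF element before saving can be worthwhile.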

Arguments

d

vector of counts, e.g. c(0,2,1,3,1,4,0) for counts of crimes in days of the week

p

vector of baseline probabilities, defaults to equal probabilities in each bin

type

string specifying the test statistic: "G" for the likelihood ratio G statistic (the default), "V" for Kuiper's test (for circular data), "KS" for the Kolmogorov-Smirnov test, and "Chi" for the Chi-square test

cdf

if FALSE (the default), generates a new permutation vector (using exactProb); otherwise, pass it a final probability dataset created previously, such as the CDF element of an earlier result
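To make the four type options more concrete, the statistics can be computed by hand for a single vector of counts. This is a sketch using the textbook definitions in base R; it assumes the statistics are defined this way and is not taken from the ptools source, so values may differ from the package in details such as how Kuiper's V is defined for binned data.

```r
d <- c(3, 1, 1, 0, 0, 1, 1)          # observed counts, as in the package example
p <- rep(1 / length(d), length(d))   # equal null probabilities
n <- sum(d)
e <- n * p                           # expected counts under the null

# G: likelihood ratio statistic, treating 0 * log(0) as 0
pos <- d > 0
G <- 2 * sum(d[pos] * log(d[pos] / e[pos]))

# Chi: Pearson chi-square statistic
Chi <- sum((d - e)^2 / e)

# KS: largest absolute gap between observed and null cumulative proportions
diff_cdf <- cumsum(d) / n - cumsum(p)
KS <- max(abs(diff_cdf))

# V: Kuiper's statistic, largest positive gap plus largest negative gap
V <- max(diff_cdf) + max(-diff_cdf)

c(G = G, Chi = Chi, KS = KS, V = V)
```

With small counts these statistics take only a limited set of values, which is why the p-value comes from the exact null distribution rather than from an asymptotic approximation.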

Details

This constructs a null distribution for small sample statistics for N counts in M bins. Example use cases are to see if a repeat offender has a proclivity to commit crimes on a particular day of the week (see the referenced paper). It can also be used for Benford's analysis of leading/trailing digits for small samples. The referenced paper shows that the G test tends to have the most power, although with circular data you may consider Kuiper's test.
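The idea of an exact null distribution can be sketched in base R: enumerate every way to place N counts in M bins, weight each arrangement by its multinomial probability under the null, and sum the probabilities of arrangements whose statistic is at least as large as the observed one. The helpers compositions and exact_p_value below are hypothetical names for illustration, use only the G statistic, and are not necessarily how ptools builds its CDF table.

```r
# Enumerate every way to place n counts into m bins (one row per arrangement).
# Illustrative helper, not a ptools function.
compositions <- function(n, m) {
  if (m == 1) return(matrix(n, nrow = 1))
  do.call(rbind, lapply(0:n, function(k) cbind(k, compositions(n - k, m - 1))))
}

# Exact p-value for the G statistic under null bin probabilities p.
exact_p_value <- function(d, p = rep(1 / length(d), length(d))) {
  n <- sum(d)
  e <- n * p
  g_stat <- function(x) {
    pos <- x > 0                      # treat 0 * log(0) as 0
    2 * sum(x[pos] * log(x[pos] / e[pos]))
  }
  arrangements <- compositions(n, length(d))
  prob <- apply(arrangements, 1, dmultinom, prob = p)  # exact null probability
  stat <- apply(arrangements, 1, g_stat)
  # Small tolerance guards against floating point ties with the observed value
  sum(prob[stat >= g_stat(d) - 1e-10])
}

exact_p_value(c(3, 1, 1, 0, 0, 1, 1))
```

Full enumeration is only practical for small N and M, since the number of arrangements grows combinatorially.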

References

Nigrini, M. J. (2012). Benford's Law: Applications for forensic accounting, auditing, and fraud detection. John Wiley & Sons.

Wheeler, A. P. (2016). Testing Serial Crime Events for Randomness in Day-of-Week Patterns with Small Samples. Journal of Investigative Psychology and Offender Profiling, 13(2), 148-165.

See Also

powalt() for calculating the power of a test under an alternative

Examples

# Counts for different days of the week
d <- c(3,1,1,0,0,1,1) #format N observations in M bins
res <- small_samptest(d=d,type="G")
print(res)

# Example for Benford's analysis
f <- 1:9
p_fd <- log10(1 + (1/f)) #first digit probabilities
#check data from Nigrini page 84
checks <- c(1927.48,27902.31,86241.90,72117.46,81321.75,97473.96,
           93249.11,89658.17,87776.89,92105.83,79949.16,87602.93,
           96879.27,91806.47,84991.67,90831.83,93766.67,88338.72,
           94639.49,83709.28,96412.21,88432.86,71552.16)
# To make example run a bit faster
c1 <- checks[1:10]
#extracting the first digits
fd <- substr(format(c1,trim=TRUE),1,1)
tot <- table(factor(fd, levels=paste(f)))
resG <- small_samptest(d=tot,p=p_fd,type="Chi")
resG

#Can reuse the cdf table if you have the same number of observations
c2 <- checks[11:20]
fd2 <- substr(format(c2,trim=TRUE),1,1)
t2 <- table(factor(fd2, levels=paste(f)))
resG2 <- small_samptest(d=t2,p=p_fd,type="Chi",cdf=resG$CDF)
