benford.analysis (version 0.1.5)

benford.analysis: Benford Analysis for data validation and forensic analytics

Description

The Benford Analysis package provides tools that make it easier to validate data using Benford's Law. The main purpose of the package is to identify suspicious data that need further verification.
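For readers new to the topic: Benford's Law says that a leading digit (or leading digit group) d occurs with probability log10(1 + 1/d). The following is a minimal base-R sketch of those expected frequencies. It does not use the package, and the variable names are illustrative only.

```r
# Expected Benford probabilities for the first digit (1 to 9)
first_digit <- 1:9
p_first <- log10(1 + 1 / first_digit)

# Expected Benford probabilities for the first two digits (10 to 99)
first_two <- 10:99
p_two <- log10(1 + 1 / first_two)

# Tabulate the first-digit probabilities next to their digits
data.frame(digit = first_digit, probability = round(p_first, 3))

# Each set of probabilities sums to 1 (up to floating-point error)
sum(p_first)
sum(p_two)
```

Under this law a leading 1 is expected in roughly 30.1% of values, which is why data that deviate strongly from these frequencies may deserve a closer look.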

Details

More information can be found in the package's help documentation.

The main function is benford. It generates a Benford S3 object.

The package defines S3 methods for plotting and printing Benford type objects.

After running benford, you can extract the "suspicious" data with the functions suspectsTable, getSuspects, duplicatesTable and getDuplicates. See the help documentation and examples for further details.

The package also includes 6 real datasets for illustration purposes.

References

Alexander, J. (2009). Remarks on the use of Benford's Law. Working Paper, Case Western Reserve University, Department of Mathematics and Cognitive Science.

Berger, A. and Hill, T. (2011). A basic theory of Benford's Law. Probability Surveys, 8, 1-126.

Hill, T. (1995). A statistical derivation of the significant-digit law. Statistical Science, 10(4), 354-363.

Nigrini, M. J. (2012). Benford's Law: Application for Forensic Accounting, Auditing and Fraud Detection. Wiley and Sons: New Jersey.

Nigrini, M. J. (2011). Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations. Wiley and Sons: New Jersey.

Examples

# NOT RUN {
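library(benford.analysis) #loads the package
data(corporate.payment) #gets data
#generates benford object from the first 2 digits of positive and negative amounts
cp <- benford(corporate.payment$Amount, number.of.digits=2, sign="both")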
cp #prints 
plot(cp) #plots

head(suspectsTable(cp),10) #prints the digits by decreasing order of discrepancies

#gets observations of the 2 most suspicious groups
suspects <- getSuspects(cp, corporate.payment, how.many=2) 

duplicatesTable(cp) #prints the duplicates by decreasing order

#gets the observations of the 2 values with most duplicates
duplicates <- getDuplicates(cp, corporate.payment,how.many=2) 

MAD(cp) #gets the Mean Absolute Deviation

chisq(cp) #gets the Chi-squared test

#gets observations starting with 50 or 99
digits_50_and_99 <- getDigits(cp, corporate.payment, digits=c(50, 99)) 

# }
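To give a sense of what statistics such as MAD(cp) and chisq(cp) summarise, the sketch below computes a mean absolute deviation and a Pearson chi-squared statistic by hand in base R. It is an illustration under stated assumptions, not the package's implementation: the data are simulated (log-uniform over three decades, which should follow Benford's Law closely), and all variable names are hypothetical.

```r
set.seed(1)

# Simulated amounts, log-uniform on [1, 1000); an assumption for illustration
x <- 10^runif(5000, min = 0, max = 3)

# First two significant digits of each value (10 to 99)
first_two <- floor(x / 10^(floor(log10(x)) - 1))

# Observed proportion of each two-digit group, including groups with no hits
observed <- as.numeric(table(factor(first_two, levels = 10:99))) / length(x)

# Expected Benford proportion of each two-digit group
expected <- log10(1 + 1 / (10:99))

# Mean absolute deviation between observed and expected proportions
mad_by_hand <- mean(abs(observed - expected))

# Pearson chi-squared statistic on the same 90 groups
chisq_by_hand <- length(x) * sum((observed - expected)^2 / expected)

mad_by_hand
chisq_by_hand
```

Small values of both statistics indicate close agreement with Benford's Law; how small is small enough depends on the sample size and on the thresholds you choose to apply.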