proc_freq: View and return the frequency distribution of a variable.
Description
For continuous variables, the user can optionally discretize the
variable into a fixed number of equal-width bins, or into custom bins of the
user's choice. This is useful for larger datasets with many unique
observed values.
Usage
proc_freq(dat, var, bins = 0)
Arguments
dat
a tbl
var
character string giving the name of the desired variable, or
a single number giving the position of the desired variable
bins
if 0, no discretization is performed. If a positive integer, then
var is binned into that many equal-width ranges, and the
frequency distribution of those ranges is computed. If a numeric vector
of length > 1, then var is binned into ranges with cutpoints
defined by the unique entries of bins.
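The two binning modes can be pictured with base R's cut(). This is only a sketch: whether proc_freq calls cut() internally is an assumption, and the data and object names below are made up for illustration.

```r
# Illustrative data (not from the package)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# A single positive integer: cut() splits the range of x into that many
# equal-width intervals
equal_width <- cut(x, breaks = 5)
table(equal_width)

# A numeric vector of length > 1: its unique, sorted entries act as cutpoints
cutpoints <- sort(unique(c(0, 5, 5, 10)))
custom <- cut(x, breaks = cutpoints)
table(custom)
```

With cut(), values falling outside the supplied cutpoints become NA, so custom cutpoints should normally span the observed range of the variable.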
Value
a tbl containing 3 columns: level gives the unique values or
bins, count gives the number of observations in each level, and
percent gives the percentage of total observations in each
level. proc_freq also automatically sends the frequency
distribution to the viewer, using utils::View.
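To make the shape of the returned tbl concrete, here is a minimal dplyr sketch that builds the same three columns. freq_sketch is a hypothetical helper, not the package's source. It handles only a column name (not a position), skips binning and the View step, and assumes percent is on a 0 to 100 scale.

```r
library(dplyr)

# Hypothetical helper for illustration only; not the proc_freq implementation
freq_sketch <- function(dat, var) {
  dat %>%
    # one row per unique value of the chosen column, with its count
    count(level = .data[[var]], name = "count") %>%
    # share of all observations, assumed here to be on a 0-100 scale
    mutate(percent = 100 * count / sum(count))
}

# Made-up example data
dat <- tibble(g = c("a", "b", "a", "c", "a"))
res <- freq_sketch(dat, "g")
res
```

The result has one row per level, and the percent column sums to 100.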
Details
R has many one-line solutions for getting the frequency distribution of a
variable. This function provides a unified approach that makes use of the
efficient data types and computation provided by the dplyr package. As a
bonus, it makes it easy to explore the distribution of a continuous
variable with many unique observations by automating discretization. The name
mirrors SAS's PROC FREQ and is intended to make the function more
approachable for SAS users who are not comfortable outside their native
habitat.
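For comparison, the base R one-liners alluded to above typically look like the following. The data is made up, and this is just one common idiom among several.

```r
# Made-up example data
x <- c("a", "b", "a", "c", "a")

# Counts per unique value
counts <- table(x)

# Percentages of the total, on a 0-100 scale
percents <- 100 * prop.table(counts)

# Counts as a data frame with columns x and Freq
as.data.frame(counts)
```

These work well for quick checks, but counts and percentages arrive as separate objects, and continuous variables must be binned by hand first.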