Create a frequency table of a vector or a data.frame. It supports tidyverse's quasiquotation and markdown output for reports. The easiest way to use it with the tidyverse is: data %>% freq(var)
top_freq can be used to get the top/bottom n items of a frequency table, with counts as names. It respects ties, so it may return more than n items.
freq(x, ...)
# S3 method for default
freq(x, sort.count = TRUE,
  nmax = getOption("max.print.freq"), na.rm = TRUE, row.names = TRUE,
  markdown = !interactive(), digits = 2, quote = NULL,
  header = TRUE, title = NULL, na = "", sep = " ",
  decimal.mark = getOption("OutDec"), big.mark = "", ...)
# S3 method for factor
freq(x, ..., droplevels = FALSE)
# S3 method for matrix
freq(x, ..., quote = FALSE)
# S3 method for table
freq(x, ..., sep = " ")
# S3 method for numeric
freq(x, ..., digits = 2)
# S3 method for Date
freq(x, ..., format = "yyyy-mm-dd")
# S3 method for hms
freq(x, ..., format = "HH:MM:SS")
is.freq(f)
top_freq(f, n)
header(f, property = NULL)
# S3 method for freq
print(x, nmax = getOption("max.print.freq", default = 10),
  markdown = !interactive(), header = TRUE,
  decimal.mark = getOption("OutDec"),
  big.mark = ifelse(decimal.mark != ",", ",", "."), ...)
x: vector of any class, or a data.frame or table
...: up to nine different columns of x, when x is a data.frame or tibble, to calculate frequencies from (see Examples). Also supports quasiquotation.
sort.count: sort on count, i.e. frequencies. This defaults to TRUE for everything except when grouping variables are used.
nmax: number of rows to print. The default is taken from getOption("max.print.freq"); the print method falls back to 10 when that option is not set. Use nmax = 0, nmax = Inf, nmax = NULL or nmax = NA to print all rows.
na.rm: a logical value indicating whether NA values should be removed from the frequency table. The header (if set) always shows the number of NAs.
row.names: a logical value indicating whether row indices should be printed as 1:nrow(x)
markdown: a logical value indicating whether the frequency table should be printed in markdown format. This prints all rows (unless nmax is set) and is the default in non-interactive R sessions (such as when knitting R Markdown files).
digits: how many significant digits to use for numeric values in the header (not for the items themselves, which depend on getOption("digits"))
quote: a logical value indicating whether strings should be printed with surrounding quotes. By default, quotes are printed only around character values that are actually numeric values.
header: a logical value indicating whether an informative header should be printed
title: text to show above the frequency table; by default, it is derived from the variables passed to x
na: a character string used to show empty (NA) values (only useful when na.rm = FALSE)
sep: a character string to separate the terms when selecting multiple columns
decimal.mark: used for prettying (longish) numerical and complex sequences. Passed to prettyNum: that help page explains the details.
big.mark: used for prettying (longish) numerical and complex sequences. Passed to prettyNum: that help page explains the details.
droplevels: a logical value indicating whether empty levels should be dropped from factors
format: a character string defining the printing format (it supports format_datetime, which transforms e.g. "d mmmm yyyy" into "%e %B %Y")
f: a frequency table
n: number of top n items to return; use -n for the bottom n items. More than n rows are returned if there are ties.
property: a header property whose value should be returned directly
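The nmax rule can be sketched as a small helper. rows_to_print() is a hypothetical name, not part of clean; it only mirrors the documented behaviour that 0, Inf, NULL and NA all mean "print every row", and it assumes the print method's fallback of 10 when the option is unset.

```r
# Hypothetical helper (not part of clean): how many of nrows rows to print.
rows_to_print <- function(nrows, nmax = getOption("max.print.freq", default = 10)) {
  # NULL, NA, 0 and Inf are documented to mean "print all rows"
  if (is.null(nmax) || is.na(nmax) || nmax == 0 || is.infinite(nmax)) {
    return(nrows)
  }
  min(nrows, nmax)
}

rows_to_print(25, 10)   # 10
rows_to_print(25, NA)   # 25
```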
A data.frame (with an additional class "freq") with five columns: item, count, percent, cum_count and cum_percent.
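To illustrate how those five columns relate to each other, here is a base-R sketch. freq_sketch() is hypothetical and is not clean's implementation; it stores percent and cum_percent as fractions between 0 and 1, which is an assumption about representation, not a statement about how clean stores or prints them.

```r
# Hypothetical base-R sketch of a five-column frequency table.
freq_sketch <- function(x, sort.count = TRUE, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]
  counts <- table(x, useNA = if (na.rm) "no" else "ifany")
  if (sort.count) counts <- sort(counts, decreasing = TRUE)
  count <- as.integer(counts)
  data.frame(
    item        = names(counts),
    count       = count,
    percent     = count / sum(count),          # share of all non-removed values
    cum_count   = cumsum(count),
    cum_percent = cumsum(count) / sum(count),  # ends at 1
    stringsAsFactors = FALSE
  )
}

f <- freq_sketch(c("b", "a", "b", "c", "b", "a", NA))
f
```

Because cum_count is the running sum of count, its last value always equals the total number of values left after NA removal.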
Interested in extending the freq() function with your own class? Add a method like the one below to your package, and optionally define some header information by passing a list to the .add_header parameter, as in the example below for class difftime. This example assumes that you use the roxygen2 package for package development.
#' @exportMethod freq.difftime
#' @importFrom clean freq.default
#' @export
#' @noRd
freq.difftime <- function(x, ...) {
  freq.default(x = x, ..., .add_header = list(units = attributes(x)$units))
}
Be sure to call freq.default in your function and not just freq. Also, add clean to the Imports: field of your DESCRIPTION file to make sure that it will be installed with your package, e.g.:
Imports: clean
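The S3 dispatch behind this pattern can be tried without installing clean. In the self-contained sketch below, freq_demo() is a hypothetical stand-in for the freq() generic and only the .add_header parameter name is taken from the text above; the default method simply attaches the extra header information as an attribute.

```r
# Hypothetical stand-ins; none of this is clean's own code.
freq_demo <- function(x, ...) UseMethod("freq_demo")

freq_demo.default <- function(x, ..., .add_header = NULL) {
  # as.vector() drops class attributes so any atomic input can be tabulated
  counts <- table(as.vector(x))
  # keep extra header info as an attribute so a print method could show it
  attr(counts, "header") <- c(list(length = length(x)), .add_header)
  counts
}

# a class-specific method adds its own header info and forwards to the default
freq_demo.difftime <- function(x, ...) {
  freq_demo.default(x = x, ..., .add_header = list(units = attributes(x)$units))
}

d <- as.difftime(c(5, 10, 10), units = "mins")
attr(freq_demo(d), "header")$units   # "mins"
```

Calling the default method directly, rather than the generic, is what prevents the difftime method from dispatching back to itself.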
Frequency tables (or frequency distributions) are summaries of the distribution of values in a sample. With the `freq` function, you can create univariate frequency tables. Multiple variables are pasted into one variable, which forces a univariate distribution. This package also has a vignette that explains this function in more detail; run browseVignettes("clean") to read it.
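Pasting several variables into one can be sketched in base R. This is an assumption about the mechanism rather than clean's source code: the selected columns are joined with sep (here a single space, the documented default) and the resulting single character vector is tabulated.

```r
# Hypothetical example data with two variables
df <- data.frame(sex   = c("F", "M", "F", "F"),
                 group = c("A", "A", "B", "A"),
                 stringsAsFactors = FALSE)

# join the columns row by row, then count the combined values
combined <- do.call(paste, c(df[, c("sex", "group")], sep = " "))
combined          # "F A" "M A" "F B" "F A"
table(combined)
```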
For numeric values of any class, these additional values are all calculated with na.rm = TRUE and shown in the header:
Mean, using mean
Standard Deviation, using sd
Coefficient of Variation (CV), the standard deviation divided by the mean
Median Absolute Deviation (MAD), using mad
Tukey Five-Number Summaries (minimum, Q1, median, Q3, maximum), see NOTE below
Interquartile Range (IQR), calculated as Q3 - Q1, see NOTE below
Coefficient of Quartile Variation (CQV, sometimes called coefficient of dispersion), calculated as (Q3 - Q1) / (Q3 + Q1), see NOTE below
Outliers (total count and percentage), using boxplot.stats
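The header values listed above can be approximated with base R. header_stats() is hypothetical, and clean's labels, rounding and exact definitions may differ; the quartiles use quantile(type = 6), following the NOTE below.

```r
# Hypothetical base-R sketch of the numeric header values (NAs removed).
header_stats <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), type = 6, names = FALSE)
  outliers <- boxplot.stats(x)$out
  list(
    mean     = mean(x),
    sd       = sd(x),
    cv       = sd(x) / mean(x),                 # standard deviation / mean
    mad      = mad(x),                          # median absolute deviation
    fivenum  = q,                               # min, Q1, median, Q3, max
    iqr      = q[4] - q[2],                     # Q3 - Q1
    cqv      = (q[4] - q[2]) / (q[4] + q[2]),   # (Q3 - Q1) / (Q3 + Q1)
    outliers = c(n = length(outliers), pct = length(outliers) / length(x))
  )
}

s <- header_stats(c(2, 4, 4, 5, 7, 9, 30, NA))
s$fivenum   # 2 4 5 9 30
```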
NOTE: These values are calculated using the same algorithm as Minitab and SPSS: p[k] = E[F(x[k])] = k / (n + 1). See Type 6 on the quantile help page.
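The Type 6 definition can be checked numerically: if the k-th order statistic of a sample of size n sits at p[k] = k / (n + 1), then asking quantile() with type = 6 for exactly those probabilities should return the sorted sample. This is a sanity check of the stats::quantile definition, not of clean itself.

```r
x <- c(3, 1, 4, 1, 5, 9, 2, 6)
n <- length(x)

# plotting positions of the order statistics under Type 6
p <- seq_len(n) / (n + 1)

# requesting exactly those probabilities returns the sorted sample
q6 <- quantile(x, probs = p, type = 6, names = FALSE)
q6   # 1 1 2 3 4 5 6 9
```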
For dates and times of any class, these additional values are calculated with na.rm = TRUE and shown in the header:
In factors, all factor levels that do not occur in the input data are dropped by default.
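To see what an empty factor level is, here is a base-R illustration using table() and droplevels(); it does not call clean.

```r
# "Other" is a declared level that never occurs in the data
g <- factor(c("Male", "Female", "Male"), levels = c("Female", "Male", "Other"))

table(g)               # keeps the empty level, with a count of 0
table(droplevels(g))   # drops it
```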
The function top_freq will include more than n rows if there are ties. Use a negative number for n (like n = -3) to select the bottom n values.
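A minimal sketch of tie-aware top/bottom selection, assuming a named vector of counts as input. top_n_ties() is hypothetical, not clean's top_freq(); like the description of top_freq, it returns the items with their counts as names.

```r
# Hypothetical sketch: top (n > 0) or bottom (n < 0) items, keeping ties.
top_n_ties <- function(counts, n) {
  counts <- sort(counts, decreasing = n > 0)
  # the count of the n-th item is the cut-off; everything tied with it stays
  cutoff <- counts[[min(abs(n), length(counts))]]
  keep <- if (n > 0) counts[counts >= cutoff] else counts[counts <= cutoff]
  # return the items, named by their counts
  stats::setNames(names(keep), keep)
}

counts <- c(a = 5, b = 3, c = 3, d = 1)
top_n_ties(counts, 2)    # a, b and c: b and c tie for second place
top_n_ties(counts, -1)   # d
```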
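library(clean)
library(dplyr)   # provides the %>% pipe

# hypothetical example data
df <- data.frame(variable = c("a", "b", "b", "c", "c", "c"),
                 stringsAsFactors = FALSE)

# these all give the same results:
freq(df$variable)
freq(df[, "variable"])
df$variable %>% freq()
df[, "variable"] %>% freq()
df %>% freq("variable")
df %>% freq(variable) # <- tidyverse way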
# NOT RUN {
clean_gender <- clean_factor(unclean$gender,
                             levels = c("^m" = "Male",
                                        "^f" = "Female"))
freq(unclean$gender)
freq(clean_gender)
# }