tbl_svysummary: Create a table of summary statistics from a survey object

Description

The tbl_svysummary function calculates descriptive statistics for continuous, categorical, and dichotomous variables taking into account survey weights and design. It is similar to tbl_summary().

Usage

tbl_svysummary(
  data,
  by = NULL,
  label = NULL,
  statistic = NULL,
  digits = NULL,
  type = NULL,
  value = NULL,
  missing = NULL,
  missing_text = NULL,
  sort = NULL,
  percent = NULL,
  include = everything()
)

Value

A tbl_svysummary object

statistic argument

The statistic argument specifies the statistics presented in the table. The input is a list of formulas that specify the statistics to report. For example, statistic = list(age ~ "{mean} ({sd})") would report the mean and standard deviation for age; statistic = list(all_continuous() ~ "{mean} ({sd})") would report the mean and standard deviation for all continuous variables. A statistic name that appears between curly brackets will be replaced with the numeric statistic (see glue::glue).

For categorical variables the following statistics are available to display.

{n} frequency
{N} denominator, or cohort size
{p} percentage
{p.std.error} standard error of the sample proportion computed with survey::svymean()
{n_unweighted} unweighted frequency
{N_unweighted} unweighted denominator
{p_unweighted} unweighted formatted percentage

For continuous variables the following statistics are available to display.

{median} median
{mean} mean
{sd} standard deviation
{var} variance
{min} minimum
{max} maximum
{p##} any integer percentile, where ## is an integer from 0 to 100
{sum} sum

Unlike tbl_summary(), it is not possible to pass a custom function.

For both categorical and continuous variables, statistics on the number of missing and non-missing observations and their proportions are available to display.

{N_obs} total number of observations
{N_miss} number of missing observations
{N_nonmiss} number of non-missing observations
{p_miss} percentage of observations missing
{p_nonmiss} percentage of observations not missing
{N_obs_unweighted} unweighted total number of observations
{N_miss_unweighted} unweighted number of missing observations
{N_nonmiss_unweighted} unweighted number of non-missing observations
{p_miss_unweighted} unweighted percentage of observations missing
{p_nonmiss_unweighted} unweighted percentage of observations not missing

Note that for categorical variables, {N_obs}, {N_miss} and {N_nonmiss} refer to the total number, number missing and number non missing observations in the denominator, not at each level of the categorical variable.

Example Output

Example 1

Example 2

type argument

The tbl_summary() function has four summary types:

"continuous" summaries are shown on a single row. Most numeric variables default to summary type continuous.
"continuous2" summaries are shown on 2 or more rows
"categorical" multi-line summaries of nominal data. Character variables, factor variables, and numeric variables with fewer than 10 unique levels default to type categorical. To change a numeric variable to continuous that defaulted to categorical, use type = list(varname ~ "continuous")
"dichotomous" categorical variables that are displayed on a single row, rather than one row per level of the variable. Variables coded as TRUE/FALSE, 0/1, or yes/no are assumed to be dichotomous, and the TRUE, 1, and yes rows are displayed. Otherwise, the value to display must be specified in the value argument, e.g. value = list(varname ~ "level to show")

select helpers

Select helpers from the \tidyselect\ package and \gtsummary\ package are available to modify default behavior for groups of variables. For example, by default continuous variables are reported with the median and IQR. To change all continuous variables to mean and standard deviation use statistic = list(all_continuous() ~ "{mean} ({sd})").

All columns with class logical are displayed as dichotomous variables showing the proportion of events that are TRUE on a single row. To show both rows (i.e. a row for TRUE and a row for FALSE) use type = list(where(is.logical) ~ "categorical").

The select helpers are available for use in any argument that accepts a list of formulas (e.g. statistic, type, digits, value, sort, etc.)

Read more on the syntax used through the package.

Examples

Run this code

# NOT RUN {
# A simple weighted dataset
tbl_svysummary_ex1 <-
  survey::svydesign(~1, data = as.data.frame(Titanic), weights = ~Freq) %>%
  tbl_svysummary(by = Survived, percent = "row", include = c(Class, Age))

# Example 2 ----------------------------------
# A dataset with a complex design
data(api, package = "survey")
tbl_svysummary_ex2 <-
  survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc) %>%
  tbl_svysummary(by = "both", include = c(api00, stype))
# }

Run the code above in your browser using DataLab