tables: Functions for custom table construction

Description

Table construction consists of at least three functions chained with the magrittr pipe operator %>%. First, we specify the variables for which statistics will be computed with tab_cells. Second, we calculate statistics with one of the tab_stat_* functions. Finally, we finalize table creation with tab_pivot: dataset %>% tab_cells(variable) %>% tab_stat_cases() %>% tab_pivot(). After that we can optionally sort the table with tab_sort_asc, drop empty rows/columns with drop_rc, and transpose it with tab_transpose. The resulting table is just a data.frame, so we can apply arbitrary operations to it. Each statistic is always calculated with the most recently specified cell, column and row variables, weight, missing values, and subgroup. To define new cell/column/row variables, call the appropriate function again. tab_pivot defines how different statistics are combined and where statistic labels appear: inside or outside rows/columns. See examples. For significance testing see significance.
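
As a minimal sketch of the chain described above (assuming the expss package is installed and using the built-in mtcars data; the object names tbl, sorted and flipped are illustrative only):

```r
library(expss)
data(mtcars)

# cells -> statistic -> pivot: the three required steps
tbl = mtcars %>%
    tab_cells(cyl) %>%        # variable to tabulate
    tab_cols(total(), am) %>% # optional column breaks
    tab_stat_cases() %>%      # counts
    tab_pivot()               # finalize the table

# optional post-processing on the finished table
sorted = tab_sort_asc(tbl)
flipped = tab_transpose(tbl)

# the result can be treated as an ordinary data.frame
is.data.frame(tbl)
```

Sorting and transposing are applied here after tab_pivot, so they operate on the finished table rather than on an intermediate step.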

Usage

tab_cols(data, ...)
tab_cells(data, ...)
tab_rows(data, ...)
tab_weight(data, weight = NULL)
tab_mis_val(data, ...)
tab_total_label(data, ...)
tab_total_statistic(data, ...)
tab_total_row_position(data, total_row_position = c("below", "above", "none"))
tab_subgroup(data, subgroup = NULL)
tab_row_label(data, ..., label = NULL)
tab_stat_fun(data, ..., label = NULL, unsafe = FALSE)
tab_stat_mean_sd_n(data, weighted_valid_n = FALSE, labels = c("Mean",
  "Std. dev.", ifelse(weighted_valid_n, "Valid N", "Unw. valid N")),
  label = NULL)
tab_stat_mean(data, label = "Mean")
tab_stat_median(data, label = "Median")
tab_stat_se(data, label = "S. E.")
tab_stat_sum(data, label = "Sum")
tab_stat_min(data, label = "Min.")
tab_stat_max(data, label = "Max.")
tab_stat_sd(data, label = "Std. dev.")
tab_stat_valid_n(data, label = "Valid N")
tab_stat_unweighted_valid_n(data, label = "Unw. valid N")
tab_stat_fun_df(data, ..., label = NULL, unsafe = FALSE)
tab_stat_cases(data, total_label = NULL, total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"), label = NULL)
tab_stat_cpct(data, total_label = NULL, total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"), label = NULL)
tab_stat_cpct_responses(data, total_label = NULL,
  total_statistic = "u_responses", total_row_position = c("below", "above",
  "none"), label = NULL)
tab_stat_tpct(data, total_label = NULL, total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"), label = NULL)
tab_stat_rpct(data, total_label = NULL, total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"), label = NULL)
tab_last_vstack(data, stat_position = c("outside_rows", "inside_rows"),
  stat_label = c("inside", "outside"), label = NULL)
tab_last_hstack(data, stat_position = c("outside_columns", "inside_columns"),
  stat_label = c("inside", "outside"), label = NULL)
tab_pivot(data, stat_position = c("outside_rows", "inside_rows",
  "outside_columns", "inside_columns"), stat_label = c("inside", "outside"))
tab_transpose(data)

Arguments

data

data.frame/intermediate_table

...

vector/data.frame/list. Variables for tables. Use mrset/mdset for multiple-response variables.

weight

numeric vector in tab_weight. Cases with NA's, negative and zero weights are removed before calculations.

total_row_position

Position of total row in the resulting table. Can be one of "below", "above", "none".

subgroup

logical vector in tab_subgroup. You can specify subgroup on which table will be computed.

label

character. Label for the statistic in the tab_stat_*.

unsafe

logical. If TRUE, fun will be evaluated as is. This can significantly improve performance, but there are some limitations. For tab_stat_fun it means that your function fun should return a vector of length one, and no attempt will be made to create labels for the statistic. For tab_stat_fun_df your function should return a vector of length one or a list/data.frame (optionally with a 'row_labels' element containing statistic labels). If unsafe is TRUE, further arguments (...) for fun will be ignored.

weighted_valid_n

logical. Should we show weighted valid N in tab_stat_mean_sd_n? By default it is FALSE.

labels

character vector of length 3. Labels for mean, standard deviation and valid N in tab_stat_mean_sd_n.

total_label

By default "#Total". You can provide several names, one for each total statistic.

total_statistic

By default it is "u_cases" (unweighted cases). Possible values are "u_cases", "u_responses", "u_cpct", "u_rpct", "u_tpct", "w_cases", "w_responses", "w_cpct", "w_rpct", "w_tpct". "u_" means unweighted statistics and "w_" means weighted statistics.

stat_position

character. One of "outside_rows", "inside_rows", "outside_columns" or "inside_columns". It defines how statistics are combined in the table.

stat_label

character. One of "inside" or "outside". It defines where statistic labels are placed relative to column names/row labels. See examples.
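
The weight and subgroup arguments described above can be illustrated with a short sketch. It assumes expss is installed; using wt as a weight and cyl > 4 as a subgroup is an arbitrary choice made only for illustration, not a meaningful analysis.

```r
library(expss)
data(mtcars)

res = mtcars %>%
    tab_cells(mpg) %>%
    tab_cols(total(), am) %>%
    tab_weight(wt) %>%                  # numeric weight (illustrative choice)
    tab_subgroup(cyl > 4) %>%           # logical expression evaluated in the data
    tab_stat_mean() %>%                 # weighted mean
    tab_stat_valid_n() %>%              # weighted number of valid cases
    tab_stat_unweighted_valid_n() %>%   # unweighted number of valid cases
    tab_pivot()

res
```

Comparing the "Valid N" and "Unw. valid N" rows shows the effect of the weight, while both are restricted to the subgroup.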

Value

All of these functions return an object of class intermediate_table, except tab_pivot, which returns the final result: an object of class etable. Basically it's a data.frame, but the class is needed for custom methods.
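
The return classes can be checked directly. This is a small sketch assuming expss is installed; the names step and final are hypothetical.

```r
library(expss)
data(mtcars)

# any chain that has not reached tab_pivot is still an intermediate object
step = mtcars %>%
    tab_cells(cyl) %>%
    tab_stat_cases()

# tab_pivot produces the final table
final = step %>% tab_pivot()

inherits(step, "intermediate_table")
inherits(final, "etable")
is.data.frame(final)
```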

Details

tab_cells: variables on which percentage/cases/summary functions will be computed. Use mrset/mdset for multiple-response variables.
tab_cols: optional variables which break the table by columns. Use mrset/mdset for multiple-response variables.
tab_rows: optional variables which break the table by rows. Use mrset/mdset for multiple-response variables.
tab_weight: optional weight for the statistic.
tab_mis_val: optional missing values for the statistic. They are applied to the variables specified by tab_cells. It works in the same manner as na_if.
tab_subgroup: optional logical vector/expression which specifies a subset of the data for the table.
tab_row_label: adds an empty row with the specified row labels to the table. It is useful for making section headings and the like.
tab_total_row_position: default value for the total_row_position argument in tab_stat_cases and similar functions. Can be one of "below", "above", "none".
tab_total_label: default value for the total_label argument in tab_stat_cases and similar functions. You can provide several names, one for each total statistic.
tab_total_statistic: default value for the total_statistic argument in tab_stat_cases and similar functions. You can provide several values. Possible values are "u_cases", "u_responses", "u_cpct", "u_rpct", "u_tpct", "w_cases", "w_responses", "w_cpct", "w_rpct", "w_tpct". "u_" means unweighted statistics and "w_" means weighted statistics.
tab_stat_fun, tab_stat_fun_df: tab_stat_fun applies the function to each variable in cells separately; tab_stat_fun_df passes each data.frame in cells to the function as a whole data.table with all names converted to variable labels (if labels exist). So it is not recommended to rely on the original variable names in your fun. For details see cro_fun. You can provide several functions as arguments. They will be combined as with combine_functions, so you can use the method argument. For details see the documentation for combine_functions.
tab_stat_cases: calculates counts.
tab_stat_cpct, tab_stat_cpct_responses: calculate column percent. These functions give different results only for multiple-response variables. For tab_stat_cpct the base of the percent is the number of valid cases. A case is considered valid if it has at least one non-NA value. So for multiple-response variables the sum of percent may be greater than 100. For tab_stat_cpct_responses the base of the percent is the number of valid responses. Multiple-response variables can have several responses for a single case. The sum of percent of tab_stat_cpct_responses always equals 100%.
tab_stat_rpct: calculates row percent. The base for the percent is the number of valid cases.
tab_stat_tpct: calculates table percent. The base for the percent is the number of valid cases.
tab_stat_mean, tab_stat_median, tab_stat_se, tab_stat_sum, tab_stat_min, tab_stat_max, tab_stat_sd, tab_stat_valid_n, tab_stat_unweighted_valid_n: different summary statistics. NA's are always omitted.
tab_pivot: finalizes table creation and defines how different tab_stat_* results will be combined.
tab_transpose: transposes the final table after tab_pivot or the last statistic.
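
The difference between tab_stat_cpct and tab_stat_cpct_responses can be sketched on a tiny hand-made multiple-response variable. This assumes expss is installed; the data frame df and its columns q1_1 and q1_2 are hypothetical and exist only for this illustration.

```r
library(expss)

# two columns hold up to two responses per case;
# the last case has no responses at all and is not a valid case
df = data.frame(
    q1_1 = c(1, 1, 2, NA),
    q1_2 = c(2, NA, 3, NA)
)

# base of percent: number of valid cases
by_cases = df %>%
    tab_cells(mrset(q1_1, q1_2)) %>%
    tab_stat_cpct() %>%
    tab_pivot()

# base of percent: number of valid responses
by_resp = df %>%
    tab_cells(mrset(q1_1, q1_2)) %>%
    tab_stat_cpct_responses() %>%
    tab_pivot()

by_cases
by_resp
```

Here three cases give five responses in total, so the percentages based on cases add up to more than 100, while those based on responses add up to 100.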

Examples

# NOT RUN {
library(expss)
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)
# some examples from 'cro'
# simple example - generally with 'cro' it can be made with less typing
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(vs) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# split rows
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(vs) %>% 
    tab_rows(am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# multiple banners
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs, am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# nested banners
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs %nest% am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# summary statistics
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(am) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) %>%
    tab_pivot()

# summary statistics - labels in columns
mtcars %>% 
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(am) %>% 
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n, method = list) %>%
    tab_pivot()

# subgroup with dropping empty columns
mtcars %>% 
    tab_subgroup(am == 0) %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs %nest% am) %>% 
    tab_stat_cpct() %>% 
    tab_pivot() %>% 
    drop_empty_columns()

# total position at the top of the table
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), vs) %>% 
    tab_rows(am) %>% 
    tab_stat_cpct(total_row_position = "above",
                  total_label = c("number of cases", "row %"),
                  total_statistic = c("u_cases", "u_rpct")) %>% 
    tab_pivot()

# this example cannot be made easily with 'cro'             
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>% 
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_rows")

# statistic labels inside columns             
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>% 
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_columns")

# stacked statistics
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_mean() %>%
    tab_stat_se() %>% 
    tab_stat_valid_n() %>% 
    tab_stat_cpct() %>% 
    tab_pivot()
    
# stacked statistics with section headings
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_row_label("#Summary statistics") %>% 
    tab_stat_mean() %>%
    tab_stat_se() %>% 
    tab_stat_valid_n() %>% 
    tab_row_label("#Column percent") %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# stacked statistics with different variables
mtcars %>% 
    tab_cols(total(), am) %>% 
    tab_cells(mpg, hp, qsec) %>% 
    tab_stat_mean() %>%
    tab_cells(cyl, carb) %>% 
    tab_stat_cpct() %>% 
    tab_pivot()

# stacked statistics - label position outside row labels
mtcars %>% 
    tab_cells(cyl) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_mean() %>%
    tab_stat_se() %>% 
    tab_stat_valid_n() %>% 
    tab_stat_cpct(label = "Col %") %>% 
    tab_pivot(stat_label = "outside")
    
# example from 'cro_fun_df' - linear regression by groups with sorting 
mtcars %>% 
    tab_cells(dtfrm(mpg, disp, hp, wt, qsec)) %>% 
    tab_cols(total(), am) %>% 
    tab_stat_fun_df(
        function(x){
            frm = reformulate(".", response = names(x)[1])
            model = lm(frm, data = x)
            dtfrm('Coef. estimate' = coef(model), 
                  confint(model)
            )
        }    
    ) %>% 
    tab_pivot() %>% 
    tab_sort_desc()

# multiple-response variables and weight
data(product_test)
codeframe_likes = num_lab("
                          1 Liked everything
                          2 Disliked everything
                          3 Chocolate
                          4 Appearance
                          5 Taste
                          6 Stuffing
                          7 Nuts
                          8 Consistency
                          98 Other
                          99 Hard to answer
                          ")

set.seed(1)
product_test = compute(product_test, {
    # recode age by groups
    age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2)
    
    var_lab(age_cat) = "Age"
    val_lab(age_cat) = c("18 - 25" = 1, "26 - 35" = 2)
    
    var_lab(a1_1) = "Likes. VSX123"
    var_lab(b1_1) = "Likes. SDF456"
    val_lab(a1_1) = codeframe_likes
    val_lab(b1_1) = codeframe_likes
    
    wgt = runif(.N, 0.25, 4)
    wgt = wgt/sum(wgt)*.N
})

product_test %>% 
    tab_cells(mrset(a1_1 %to% a1_6), mrset(b1_1 %to% b1_6)) %>% 
    tab_cols(total(), age_cat) %>% 
    tab_weight(wgt) %>% 
    tab_stat_cpct() %>% 
    tab_sort_desc() %>% 
    tab_pivot()
    
# trick to place cell variables labels inside columns
# useful to compare two variables
# '|' is needed to prevent automatic labels creation from argument
# alternatively we can use list(...) to avoid this
product_test %>% 
    tab_cols(total(), age_cat) %>% 
    tab_weight(wgt) %>% 
    tab_cells("|" = unvr(mrset(a1_1 %to% a1_6))) %>% 
    tab_stat_cpct(label = var_lab(a1_1)) %>% 
    tab_cells("|" = unvr(mrset(b1_1 %to% b1_6))) %>% 
    tab_stat_cpct(label = var_lab(b1_1)) %>% 
    tab_pivot(stat_position = "inside_columns")

# if you need standard evaluation, use 'vars'
tables = mtcars %>%
      tab_cols(total(), am %nest% vs)

for(each in c("mpg", "disp", "hp", "qsec")){
    tables = tables %>% tab_cells(vars(each)) %>%
        tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) 
}
tables %>% tab_pivot()
# }
