Table construction consists of at least three functions chained with the
magrittr pipe operator. First, we specify the variables for which statistics
will be computed with tab_cells. Second, we calculate statistics with one of
the tab_stat_* functions. Finally, we finalize table creation with tab_pivot:
dataset %>% tab_cells(variable) %>% tab_stat_cases() %>% tab_pivot().
After that we can optionally sort the table with tab_sort_asc, drop empty
rows/columns with drop_rc and transpose it with tab_transpose. The resulting
table is just a data.frame, so we can apply arbitrary operations to it.
A statistic is always calculated with the most recently defined cell,
column/row variables, weight, missing values and subgroup. To define new
cell/column/row variables, call the appropriate function one more time.
tab_pivot defines how different statistics are combined and where statistic
labels appear: inside or outside rows/columns. See examples.
For significance testing see significance.
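As a minimal sketch of the three-step chain described above, the following uses the built-in mtcars data. It assumes the expss package is installed and that attaching it makes the %>% operator available; the object name tbl is only illustrative.

```r
library(expss)

data(mtcars)

# 1. choose cell variables, 2. compute a statistic, 3. finalize the table
tbl = mtcars %>%
    tab_cells(cyl) %>%
    tab_stat_cases() %>%
    tab_pivot()

# optional post-processing: drop empty rows/columns (none are expected here)
tbl = drop_rc(tbl)
tbl
```

Because the result is a data.frame, base functions such as nrow() or names() can be applied to tbl directly.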
tab_cols(data, ...)
tab_cells(data, ...)
tab_rows(data, ...)
tab_weight(data, weight = NULL)
tab_mis_val(data, ...)
tab_total_label(data, ...)
tab_total_statistic(data, ...)
tab_total_row_position(data, total_row_position = c("below", "above", "none"))
tab_subgroup(data, subgroup = NULL)
tab_row_label(data, ..., label = NULL)
tab_stat_fun(data, ..., label = NULL, unsafe = FALSE)
tab_stat_mean_sd_n(
  data,
  weighted_valid_n = FALSE,
  labels = c("Mean", "Std. dev.", ifelse(weighted_valid_n, "Valid N", "Unw. valid N")),
  label = NULL
)
tab_stat_mean(data, label = "Mean")
tab_stat_median(data, label = "Median")
tab_stat_se(data, label = "S. E.")
tab_stat_sum(data, label = "Sum")
tab_stat_min(data, label = "Min.")
tab_stat_max(data, label = "Max.")
tab_stat_sd(data, label = "Std. dev.")
tab_stat_valid_n(data, label = "Valid N")
tab_stat_unweighted_valid_n(data, label = "Unw. valid N")
tab_stat_fun_df(data, ..., label = NULL, unsafe = FALSE)
tab_stat_cases(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)
tab_stat_cpct(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)
tab_stat_cpct_responses(
  data,
  total_label = NULL,
  total_statistic = "u_responses",
  total_row_position = c("below", "above", "none"),
  label = NULL
)
tab_stat_tpct(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)
tab_stat_rpct(
  data,
  total_label = NULL,
  total_statistic = "u_cases",
  total_row_position = c("below", "above", "none"),
  label = NULL
)
tab_last_vstack(
  data,
  stat_position = c("outside_rows", "inside_rows"),
  stat_label = c("inside", "outside"),
  label = NULL
)
tab_last_hstack(
  data,
  stat_position = c("outside_columns", "inside_columns"),
  stat_label = c("inside", "outside"),
  label = NULL
)
tab_pivot(
  data,
  stat_position = c("outside_rows", "inside_rows", "outside_columns", "inside_columns"),
  stat_label = c("inside", "outside")
)
tab_transpose(data)
tab_caption(data, ...)
All of these functions return an object of class intermediate_table, except
tab_pivot, which returns the final result: an object of class etable.
Basically it is a data.frame, but the class is needed for custom methods.
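The difference between the two return classes can be checked directly. This is a small sketch under the assumption that expss is installed; step and res are illustrative names.

```r
library(expss)

data(mtcars)

# before tab_pivot(): an intermediate object that still collects statistics
step = mtcars %>%
    tab_cells(cyl) %>%
    tab_stat_cases()
class(step)

# after tab_pivot(): the final table, which is also a data.frame
res = tab_pivot(step)
class(res)
inherits(res, "data.frame")
```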
data
data.frame/intermediate_table
...
vector/data.frame/list. Variables for tables. Use mrset/mdset for multiple-response variables.
weight
numeric vector in tab_weight. Cases with NA's, negative and zero weights are removed before calculations.
total_row_position
Position of the total row in the resulting table. Can be one of "below", "above", "none".
subgroup
logical vector in tab_subgroup. You can specify a subgroup on which the table will be computed.
label
character. Label for the statistic in the tab_stat_* functions.
unsafe
logical. If TRUE then fun will be evaluated as is. This can significantly
increase performance, but there are some limitations. For tab_stat_fun it
means that your function fun should return a vector of length one. Also, no
attempt will be made to create labels for the statistic. For tab_stat_fun_df
your function should return a vector of length one or a list/data.frame
(optionally with a 'row_labels' element holding the statistic labels). If
unsafe is TRUE then further arguments (...) for fun will be ignored.
weighted_valid_n
logical. Should we show weighted valid N in tab_stat_mean_sd_n? By default it is FALSE.
labels
character vector of length 3. Labels for mean, standard deviation and valid N in tab_stat_mean_sd_n.
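The labels argument can be illustrated with a short sketch. It assumes expss is installed; the label texts "Avg.", "SD" and "N" are arbitrary choices for this example, not package defaults.

```r
library(expss)

data(mtcars)

res = mtcars %>%
    tab_cells(mpg, hp) %>%
    tab_cols(total(), am) %>%
    # custom labels in the order: mean, standard deviation, valid N
    tab_stat_mean_sd_n(labels = c("Avg.", "SD", "N")) %>%
    tab_pivot()
res
```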
total_label
By default "#Total". You can provide several names, one for each total statistic.
total_statistic
By default it is "u_cases" (unweighted cases). Possible values are "u_cases", "u_responses", "u_cpct", "u_rpct", "u_tpct", "w_cases", "w_responses", "w_cpct", "w_rpct", "w_tpct". "u_" means unweighted statistics and "w_" means weighted statistics.
stat_position
character. One of the values "outside_rows", "inside_rows", "outside_columns" or "inside_columns". It defines how statistics are combined in the table.
stat_label
character. One of the values "inside" or "outside". It defines where the labels for the statistics are placed relative to column names/row labels. See examples.
tab_cells
variables on which percentage/cases/summary functions will be computed.
Use mrset/mdset for multiple-response variables.
tab_cols
optional variables which break the table by columns. Use mrset/mdset for
multiple-response variables.
tab_rows
optional variables which break the table by rows. Use mrset/mdset for
multiple-response variables.
tab_weight
optional weight for the statistic.
tab_mis_val
optional missing values for the statistic. They will be applied to the
variables specified by tab_cells. It works in the same manner as na_if.
tab_subgroup
optional logical vector/expression which specifies a subset of the data for
the table.
tab_row_label
adds an empty row with the specified row labels to the table. It is useful
for making section headings, etc.
tab_total_row_position
default value for the total_row_position argument in tab_stat_cases, etc.
Can be one of "below", "above", "none".
tab_total_label
default value for the total_label argument in tab_stat_cases, etc. You can
provide several names, one for each total statistic.
tab_total_statistic
default value for the total_statistic argument in tab_stat_cases, etc. You
can provide several values. Possible values are "u_cases", "u_responses",
"u_cpct", "u_rpct", "u_tpct", "w_cases", "w_responses", "w_cpct", "w_rpct",
"w_tpct". "u_" means unweighted statistics and "w_" means weighted statistics.
tab_stat_fun, tab_stat_fun_df
tab_stat_fun applies a function to each variable in cells separately;
tab_stat_fun_df passes each data.frame in cells to the function as a whole
data.table, with all names converted to variable labels (if labels exist).
So it is not recommended to rely on the original variable names in your fun.
For details see cross_fun. You can provide several functions as arguments.
They will be combined as with combine_functions, so you can use the method
argument. For details see the documentation for combine_functions.
tab_stat_cases
calculates counts.
tab_stat_cpct, tab_stat_cpct_responses
calculate column percent. These functions give different results only for
multiple-response variables. For tab_stat_cpct the base of the percent is the
number of valid cases. A case is considered valid if it has at least one
non-NA value. So for multiple-response variables the sum of percents may be
greater than 100. For tab_stat_cpct_responses the base of the percent is the
number of valid responses. Multiple-response variables can have several
responses for a single case. The sum of percents for tab_stat_cpct_responses
always equals 100%.
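The two percentage bases can be worked through by hand. The sketch below uses plain base R arithmetic on a made-up two-item question (q1_1 and q1_2 are hypothetical names); it illustrates the definitions above and is not how expss computes the tables internally.

```r
# two items of one multiple-response question, four respondents
q1 = data.frame(
    q1_1 = c(1, 1, 2, NA),
    q1_2 = c(2, NA, 3, NA)
)

# a case is valid if it has at least one non-NA value
valid_cases = sum(rowSums(!is.na(q1)) > 0)
# every non-NA value is one response
valid_responses = sum(!is.na(q1))

# counts of each response code across both items
counts = table(unlist(q1))

pct_cases = 100 * counts / valid_cases
pct_responses = 100 * counts / valid_responses

sum(pct_cases)      # exceeds 100: some cases gave two responses
sum(pct_responses)  # sums to 100
```

Here three cases are valid and together they give five responses, so the case-based percentages add up to more than 100 while the response-based ones add up to exactly 100.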
tab_stat_rpct
calculates row percent. The base of the percent is the number of valid cases.
tab_stat_tpct
calculates table percent. The base of the percent is the number of valid cases.
tab_stat_mean, tab_stat_median, tab_stat_se, tab_stat_sum, tab_stat_min,
tab_stat_max, tab_stat_sd, tab_stat_valid_n, tab_stat_unweighted_valid_n
different summary statistics. NA's are always omitted.
tab_pivot
finalizes table creation and defines how the different tab_stat_* results
will be combined.
tab_caption
sets a caption on the table. Should be used after tab_pivot.
tab_transpose
transposes the final table after tab_pivot or the last statistic.
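The finishing steps described above can be sketched as follows, assuming expss is installed. The caption text is arbitrary, and both steps are applied after tab_pivot, as the description of tab_caption requires.

```r
library(expss)

data(mtcars)

res = mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), am) %>%
    tab_stat_cpct() %>%
    tab_pivot() %>%
    # transpose the finished table, then attach a caption to it
    tab_transpose() %>%
    tab_caption("Number of cylinders by transmission")
res
```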
fre, cross_cases, cross_fun, tab_sort_asc, drop_empty_rows, significance.
if (FALSE) {
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (1000 lbs)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual" = 1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

# some examples from 'cro'
# simple example - generally with 'cro' it can be made with less typing
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(vs) %>%
    tab_stat_cpct() %>%
    tab_pivot()

# split rows
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(vs) %>%
    tab_rows(am) %>%
    tab_stat_cpct() %>%
    tab_pivot()

# multiple banners
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), vs, am) %>%
    tab_stat_cpct() %>%
    tab_pivot()

# nested banners
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), vs %nest% am) %>%
    tab_stat_cpct() %>%
    tab_pivot()

# summary statistics
mtcars %>%
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(am) %>%
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n) %>%
    tab_pivot()

# summary statistics - labels in columns
mtcars %>%
    tab_cells(mpg, disp, hp, wt, qsec) %>%
    tab_cols(am) %>%
    tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n, method = list) %>%
    tab_pivot()

# subgroup with dropping empty columns
mtcars %>%
    tab_subgroup(am == 0) %>%
    tab_cells(cyl) %>%
    tab_cols(total(), vs %nest% am) %>%
    tab_stat_cpct() %>%
    tab_pivot() %>%
    drop_empty_columns()

# total position at the top of the table
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), vs) %>%
    tab_rows(am) %>%
    tab_stat_cpct(total_row_position = "above",
                  total_label = c("number of cases", "row %"),
                  total_statistic = c("u_cases", "u_rpct")) %>%
    tab_pivot()

# this example cannot be made easily with 'cro'
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>%
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_rows")

# statistic labels inside columns
mtcars %>%
    tab_cells(am) %>%
    tab_cols(total(), vs) %>%
    tab_total_row_position("none") %>%
    tab_stat_cpct(label = "col %") %>%
    tab_stat_rpct(label = "row %") %>%
    tab_stat_tpct(label = "table %") %>%
    tab_pivot(stat_position = "inside_columns")

# stacked statistics
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), am) %>%
    tab_stat_mean() %>%
    tab_stat_se() %>%
    tab_stat_valid_n() %>%
    tab_stat_cpct() %>%
    tab_pivot()

# stacked statistics with section headings
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), am) %>%
    tab_row_label("#Summary statistics") %>%
    tab_stat_mean() %>%
    tab_stat_se() %>%
    tab_stat_valid_n() %>%
    tab_row_label("#Column percent") %>%
    tab_stat_cpct() %>%
    tab_pivot()

# stacked statistics with different variables
mtcars %>%
    tab_cols(total(), am) %>%
    tab_cells(mpg, hp, qsec) %>%
    tab_stat_mean() %>%
    tab_cells(cyl, carb) %>%
    tab_stat_cpct() %>%
    tab_pivot()

# stacked statistics - label position outside row labels
mtcars %>%
    tab_cells(cyl) %>%
    tab_cols(total(), am) %>%
    tab_stat_mean() %>%
    tab_stat_se() %>%
    tab_stat_valid_n() %>%
    tab_stat_cpct(label = "Col %") %>%
    tab_pivot(stat_label = "outside")

# example from 'cross_fun_df' - linear regression by groups with sorting
mtcars %>%
    tab_cells(sheet(mpg, disp, hp, wt, qsec)) %>%
    tab_cols(total(), am) %>%
    tab_stat_fun_df(
        function(x){
            # regress the first column on all remaining columns
            frm = reformulate(".", response = as.name(names(x)[1]))
            model = lm(frm, data = x)
            sheet('Coef.' = coef(model),
                  confint(model)
            )
        }
    ) %>%
    tab_pivot() %>%
    tab_sort_desc()
# multiple-response variables and weight
data(product_test)
codeframe_likes = num_lab("
1 Liked everything
2 Disliked everything
3 Chocolate
4 Appearance
5 Taste
6 Stuffing
7 Nuts
8 Consistency
98 Other
99 Hard to answer
")
set.seed(1)
product_test = product_test %>%
    let(
        # recode age by groups
        age_cat = recode(s2a, lo %thru% 25 ~ 1, lo %thru% hi ~ 2),
        wgt = runif(.N, 0.25, 4),
        wgt = wgt/sum(wgt)*.N
    ) %>%
    apply_labels(
        age_cat = "Age",
        age_cat = c("18 - 25" = 1, "26 - 35" = 2),
        a1_1 = "Likes. VSX123",
        b1_1 = "Likes. SDF456",
        a1_1 = codeframe_likes,
        b1_1 = codeframe_likes
    )

product_test %>%
    tab_cells(mrset(a1_1 %to% a1_6), mrset(b1_1 %to% b1_6)) %>%
    tab_cols(total(), age_cat) %>%
    tab_weight(wgt) %>%
    tab_stat_cpct() %>%
    tab_sort_desc() %>%
    tab_pivot()

# trick to place cell variables labels inside columns
# useful to compare two variables
# '|' is needed to prevent automatic labels creation from argument
# alternatively we can use list(...) to avoid this
product_test %>%
    tab_cols(total(), age_cat) %>%
    tab_weight(wgt) %>%
    tab_cells("|" = unvr(mrset(a1_1 %to% a1_6))) %>%
    tab_stat_cpct(label = var_lab(a1_1)) %>%
    tab_cells("|" = unvr(mrset(b1_1 %to% b1_6))) %>%
    tab_stat_cpct(label = var_lab(b1_1)) %>%
    tab_pivot(stat_position = "inside_columns")

# if you need standard evaluation, use 'vars'
tables = mtcars %>%
    tab_cols(total(), am %nest% vs)

for(each in c("mpg", "disp", "hp", "qsec")){
    tables = tables %>% tab_cells(vars(each)) %>%
        tab_stat_fun(Mean = w_mean, "Std. dev." = w_sd, "Valid N" = w_n)
}
tables %>% tab_pivot()
}