tab_significance_options: Mark significant differences between columns in the table

Description

significance_cpct conducts z-tests between column percentages in the result of cross_cpct. Results are calculated with the same formula as in prop.test without continuity correction.
significance_means conducts t-tests between column means in the result of cross_mean_sd_n. Results are calculated with the same formula as in t.test.
significance_cases conducts chi-squared tests on each subtable of a table with counts in the result of cross_cases. Results are calculated with the same formula as in chisq.test.
significance_cell_chisq computes cell chi-square tests on a table with column percentages. The cell chi-square test looks at each table cell and tests whether it differs significantly from its expected value in the overall table. For example, if variations in political opinions are thought to depend on the respondent's age, this test can be used to detect which cells contribute significantly to that dependence. Unlike the chi-square test (significance_cases), which is carried out on a whole set of rows and columns, the cell chi-square test is carried out independently on each table cell. Although the significance level of the cell chi-square test is accurate for any given cell, the cell tests cannot be used in place of the chi-square test on the overall table. Their purpose is simply to point to the parts of the table where dependencies between row and column categories may exist.
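The z-test used by significance_cpct can be reproduced in base R. In the sketch below, z_test_cpct is a hypothetical helper and the counts are made up; it uses the pooled standard error and checks that, for two columns, the squared z statistic and its p-value agree with prop.test(correct = FALSE).

```r
# Pooled two-sample z-test for one row compared across two columns.
# x1, x2: cell counts; n1, n2: column bases.
z_test_cpct = function(x1, n1, x2, n2) {
  p1 = x1 / n1
  p2 = x2 / n2
  p_pool = (x1 + x2) / (n1 + n2)          # pooled proportion
  se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z = (p1 - p2) / se
  list(z = z, p_value = 2 * pnorm(-abs(z)))  # two-sided p-value
}

# Made-up example: 45 of 120 respondents in one column, 30 of 110 in another
res = z_test_cpct(45, 120, 30, 110)
ref = prop.test(x = c(45, 30), n = c(120, 110), correct = FALSE)

# For two groups, the prop.test chi-squared statistic equals z^2
stopifnot(isTRUE(all.equal(unname(ref$statistic), res$z^2)),
          isTRUE(all.equal(ref$p.value, res$p_value)))
```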

For significance_cpct and significance_means there are three types of comparisons, which can be conducted simultaneously (argument compare_type):

subtable provides comparisons between all columns inside each subtable.
previous_column compares each column of the subtable with the previous column. It is useful if columns are periods or survey waves.
first_column compares the first column of the table with all other columns in the table. adjusted_first_column is also a comparison with the first column, but with an adjustment for the common base. It is useful if the first column is a total column and the other columns are subgroups of this total. Adjustments are made according to the algorithm in IBM SPSS Statistics Algorithms v20, p. 263. Note that with these adjustments, t-tests between means are made with equal variance assumed (as with var_equal = TRUE).
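Because significance_means works from the means, standard deviations and valid N reported by cross_mean_sd_n, the t statistic can be reproduced from those summary values alone. The base-R sketch below (t_test_summary is a hypothetical helper, not part of expss) computes both the Welch version (var_equal = FALSE) and the pooled-variance version (var_equal = TRUE) and checks them against t.test on the built-in mtcars data. Note that t.test spells the argument var.equal.

```r
# t-test between two column means from summary statistics only
t_test_summary = function(m1, s1, n1, m2, s2, n2, var_equal = FALSE) {
  if (var_equal) {
    # pooled variance, as with var_equal = TRUE
    sp2 = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
  } else {
    # Welch-Satterthwaite approximation, the t.test default
    v1 = s1^2 / n1
    v2 = s2^2 / n2
    se = sqrt(v1 + v2)
    df = (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  }
  t = (m1 - m2) / se
  list(t = t, df = df, p_value = 2 * pt(-abs(t), df))
}

# Compare with t.test on the raw data: mpg by transmission type
x = mtcars$mpg[mtcars$am == 1]
y = mtcars$mpg[mtcars$am == 0]
for (ve in c(FALSE, TRUE)) {
  s = t_test_summary(mean(x), sd(x), length(x),
                     mean(y), sd(y), length(y), var_equal = ve)
  ref = t.test(x, y, var.equal = ve)
  stopifnot(isTRUE(all.equal(unname(ref$statistic), s$t)),
            isTRUE(all.equal(ref$p.value, s$p_value)))
}
```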

Currently there are no adjustments for multiple-response variables (results of mrset) in the table columns, so significance tests are only approximate in such cases. There are also functions for significance testing within the sequence of custom table calculations (see tables):

tab_last_sig_cpct, tab_last_sig_means and tab_last_sig_cases make the same tests as their analogs mentioned above, and tab_last_sig_cell_chisq is the analog of significance_cell_chisq. It is recommended to use them after the appropriate statistic function: tab_stat_cpct, tab_stat_mean_sd_n and tab_stat_cases, respectively.
tab_significance_options sets significance options for the entire custom table creation sequence.
tab_last_add_sig_labels applies add_sig_labels to the last calculated table: it adds labels (letters by default) for significance to the column header. It may be useful if you want to combine a table with significance marks with a table without them.
tab_last_round rounds numeric columns in the last calculated table to the specified number of digits. It is sometimes needed if you want to combine a table with significance marks with a table without them.
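As an illustration, the following R sketch chains the recommended statistic/test pairs on the built-in mtcars data. It assumes the expss package is installed and R >= 4.1 for the native pipe; the choice of variables is arbitrary.

```r
# Assumes expss is installed; |> requires R >= 4.1
library(expss)

res = mtcars |>
  tab_cols(total(), am) |>        # banner: total column plus the am subtable
  tab_cells(cyl) |>
  tab_stat_cpct() |>              # column percentages with bases
  tab_last_sig_cpct() |>          # z-tests on the statistic just calculated
  tab_cells(mpg) |>
  tab_stat_mean_sd_n() |>         # means, standard deviations and valid N
  tab_last_sig_means() |>         # t-tests on the statistic just calculated
  tab_pivot()                     # intermediate_table -> etable
res
```

With the default mode = "replace", each tab_last_sig_* call replaces the preceding statistic with its marked version rather than adding a second copy.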

Usage

tab_significance_options(
  data,
  sig_level = 0.05,
  min_base = 2,
  delta_cpct = 0,
  delta_means = 0,
  correct = TRUE,
  compare_type = "subtable",
  bonferroni = FALSE,
  subtable_marks = "greater",
  inequality_sign = "both" %in% subtable_marks,
  sig_labels = LETTERS,
  sig_labels_previous_column = c("v", "^"),
  sig_labels_first_column = c("-", "+"),
  sig_labels_chisq = c("<", ">"),
  keep = c("percent", "cases", "means", "sd", "bases"),
  row_margin = c("auto", "sum_row", "first_column"),
  total_marker = "#",
  total_row = 1,
  digits = get_expss_digits(),
  na_as_zero = FALSE,
  var_equal = FALSE,
  mode = c("replace", "append"),
  as_spss = FALSE
)
tab_last_sig_cpct(
  data,
  sig_level = 0.05,
  delta_cpct = 0,
  min_base = 2,
  compare_type = "subtable",
  bonferroni = FALSE,
  subtable_marks = c("greater", "both", "less"),
  inequality_sign = "both" %in% subtable_marks,
  sig_labels = LETTERS,
  sig_labels_previous_column = c("v", "^"),
  sig_labels_first_column = c("-", "+"),
  keep = c("percent", "bases"),
  na_as_zero = FALSE,
  total_marker = "#",
  total_row = 1,
  digits = get_expss_digits(),
  as_spss = FALSE,
  mode = c("replace", "append"),
  label = NULL
)
tab_last_sig_means(
  data,
  sig_level = 0.05,
  delta_means = 0,
  min_base = 2,
  compare_type = "subtable",
  bonferroni = FALSE,
  subtable_marks = c("greater", "both", "less"),
  inequality_sign = "both" %in% subtable_marks,
  sig_labels = LETTERS,
  sig_labels_previous_column = c("v", "^"),
  sig_labels_first_column = c("-", "+"),
  keep = c("means", "sd", "bases"),
  var_equal = FALSE,
  digits = get_expss_digits(),
  mode = c("replace", "append"),
  label = NULL
)
tab_last_sig_cases(
  data,
  sig_level = 0.05,
  min_base = 2,
  correct = TRUE,
  keep = c("cases", "bases"),
  total_marker = "#",
  total_row = 1,
  digits = get_expss_digits(),
  mode = c("replace", "append"),
  label = NULL
)
tab_last_sig_cell_chisq(
  data,
  sig_level = 0.05,
  min_base = 2,
  subtable_marks = c("both", "greater", "less"),
  sig_labels_chisq = c("<", ">"),
  correct = TRUE,
  keep = c("percent", "bases", "none"),
  row_margin = c("auto", "sum_row", "first_column"),
  total_marker = "#",
  total_row = 1,
  total_column_marker = "#",
  digits = get_expss_digits(),
  mode = c("replace", "append"),
  label = NULL
)
tab_last_round(data, digits = get_expss_digits())
tab_last_add_sig_labels(data, sig_labels = LETTERS)
significance_cases(
  x,
  sig_level = 0.05,
  min_base = 2,
  correct = TRUE,
  keep = c("cases", "bases"),
  total_marker = "#",
  total_row = 1,
  digits = get_expss_digits()
)
significance_cell_chisq(
  x,
  sig_level = 0.05,
  min_base = 2,
  subtable_marks = c("both", "greater", "less"),
  sig_labels_chisq = c("<", ">"),
  correct = TRUE,
  keep = c("percent", "bases", "none"),
  row_margin = c("auto", "sum_row", "first_column"),
  total_marker = "#",
  total_row = 1,
  total_column_marker = "#",
  digits = get_expss_digits()
)
cell_chisq(cases_matrix, row_base, col_base, total_base, correct)
significance_cpct(
  x,
  sig_level = 0.05,
  delta_cpct = 0,
  min_base = 2,
  compare_type = "subtable",
  bonferroni = FALSE,
  subtable_marks = c("greater", "both", "less"),
  inequality_sign = "both" %in% subtable_marks,
  sig_labels = LETTERS,
  sig_labels_previous_column = c("v", "^"),
  sig_labels_first_column = c("-", "+"),
  keep = c("percent", "bases"),
  na_as_zero = FALSE,
  total_marker = "#",
  total_row = 1,
  digits = get_expss_digits(),
  as_spss = FALSE
)
add_sig_labels(x, sig_labels = LETTERS)
significance_means(
  x,
  sig_level = 0.05,
  delta_means = 0,
  min_base = 2,
  compare_type = "subtable",
  bonferroni = FALSE,
  subtable_marks = c("greater", "both", "less"),
  inequality_sign = "both" %in% subtable_marks,
  sig_labels = LETTERS,
  sig_labels_previous_column = c("v", "^"),
  sig_labels_first_column = c("-", "+"),
  keep = c("means", "sd", "bases"),
  var_equal = FALSE,
  digits = get_expss_digits()
)
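To illustrate how sig_level, min_base, delta_cpct, bonferroni and sig_labels could interact when marking one row of a subtable, here is a base-R sketch. It is not the expss implementation: pooled_z_p and mark_subtable_row are hypothetical helpers, the Bonferroni divisor is assumed to be the number of column pairs, and the strict ">" comparison against delta_cpct is an assumption. It follows the "greater" convention for subtable_marks: a column receives the label of every other column it significantly exceeds.

```r
# Hypothetical sketch, not the expss implementation.

# Two-sided p-value of the pooled two-sample z-test for proportions
pooled_z_p = function(p1, n1, p2, n2) {
  p = (p1 * n1 + p2 * n2) / (n1 + n2)
  se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
  if (se == 0) return(1)  # pooled proportion is exactly 0 or 1
  2 * pnorm(-abs((p1 - p2) / se))
}

# cpct: column percentages (0-100) in one row; base: column bases
mark_subtable_row = function(cpct, base, sig_level = 0.05, min_base = 2,
                             delta_cpct = 0, bonferroni = FALSE,
                             sig_labels = LETTERS) {
  k = length(cpct)
  # Assumed Bonferroni rule: divide sig_level by the number of column pairs
  if (bonferroni) sig_level = sig_level / choose(k, 2)
  marks = character(k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      # skip self-comparison and columns whose base is below min_base
      if (i == j || base[i] < min_base || base[j] < min_base) next
      # delta_cpct is in percentage points, like cpct itself
      if (cpct[i] - cpct[j] > delta_cpct &&
          pooled_z_p(cpct[i] / 100, base[i], cpct[j] / 100, base[j]) < sig_level) {
        marks[i] = paste0(marks[i], sig_labels[j])
      }
    }
  }
  marks
}

mark_subtable_row(c(30, 50, 45), base = c(100, 100, 100))
# returns c("", "A", "A")
mark_subtable_row(c(30, 50, 45), base = c(100, 100, 100), bonferroni = TRUE)
# returns c("", "A", "")
```

With three columns of 100 cases each, column B (50%) differs from A (30%) with p of about 0.004 and column C (45%) differs from A with p of about 0.03. The Bonferroni-adjusted level 0.05 / 3 therefore keeps only the B-versus-A mark.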

Value

tab_last_* functions return objects of class intermediate_table. Use tab_pivot to get the final result, an etable object. Other functions return an etable object with significant differences marked.
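For the non-tab_* functions, the result is already an etable. A minimal sketch, assuming the expss package is installed, that marks a cross_cpct table on the built-in mtcars data against both the first (total) column and within each subtable:

```r
# Assumes expss is installed
library(expss)

tbl = cross_cpct(mtcars, cell_vars = cyl, col_vars = list(total(), am, vs))
sig = significance_cpct(tbl, compare_type = c("first_column", "subtable"))
sig   # an etable object with significance marks added
```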

Arguments

data: data.frame/intermediate_table for tab_* functions.
sig_level: numeric. Significance level; 0.05 by default.
min_base: numeric. A significance test is conducted only if both columns have bases greater than or equal to min_base. 2 by default.
delta_cpct: numeric. Minimal difference between percentages (in percentage points) for which significant differences are marked; zero by default. Note that, for example, for a minimal difference of 5 percentage points, delta_cpct should be 5, not 0.05.
delta_means: numeric. Minimal difference between means for which significant differences are marked; zero by default.
correct: logical indicating whether to apply continuity correction when computing the test statistic for 2 by 2 tables. Only for significance_cases and significance_cell_chisq. For details see chisq.test. TRUE by default.
compare_type: type of comparison between columns. By default it is "subtable": comparisons are conducted between the columns of each subtable. Other possible values are "first_column", "adjusted_first_column" and "previous_column". Several types can be combined to conduct tests simultaneously.
bonferroni: logical. FALSE by default. Should the Bonferroni adjustment by the number of comparisons in each row be applied?
subtable_marks: character. One of "greater", "both" or "less". By default, only values which are significantly greater than some other columns are marked; for significance_cell_chisq the default is "both". This behavior can be changed by setting the argument to "less" or "both".
inequality_sign: logical. Should > or < be shown before the significance marks of subtable comparisons? By default it is TRUE only if subtable_marks is "both", and FALSE if it is "less" or "greater".
sig_labels: character vector. Labels for marking differences between columns of subtable.
sig_labels_previous_column: a character vector with two elements. Labels for marking a difference with the previous column. The first mark means 'lower' (v by default) and the second means 'greater' (^).
sig_labels_first_column: a character vector with two elements. Labels for marking a difference with the first column of the table. First mark means 'lower' (by default it is -) and the second means 'greater' (+).
sig_labels_chisq: a character vector with two labels for marking a difference with row margin of the table. First mark means 'lower' (by default it is <) and the second means 'greater' (>). Only for significance_cell_chisq.
keep: character. One or more from "percent", "cases", "means", "bases", "sd" or "none". This argument determines which statistics will remain in the table after significance marking.
row_margin: character. One of "auto" (default), "sum_row" or "first_column". With "auto", we try to find the total column in the subtable by total_column_marker; if the search fails, the sum of each row is used as the row total. With "sum_row", each row is always summed to get the margin; note that in this case results for multiple-response variables in banners may be incorrect. With "first_column", the first column of the table is used as the row margin for all subtables; in this case results for subtables with incomplete bases may be incorrect. Only for significance_cell_chisq.
total_marker: character. Marker of total rows in the table; "#" by default.
total_row: integer/character. When there are several totals per subtable, the number or name of the total row used for significance calculation.
digits: an integer indicating how many digits after the decimal separator are kept when rounding.
na_as_zero: logical. FALSE by default. Should we treat NA's as zero cases?
var_equal: a logical variable indicating whether to treat the two variances as being equal. For details see t.test.
mode: character. One of "replace" (default) or "append". In the first case, the previous result in the sequence of table calculations is replaced with the result of significance testing. In the second case, the result of significance testing is appended to the sequence of table calculations.
as_spss: logical. FALSE by default. If TRUE, proportions equal to zero or one are ignored, as are categories with bases less than 2.
label: character. Label for the statistic in the tab_* functions. Ignored if mode is "replace".
total_column_marker: character. Mark for total columns in the subtables. "#" by default.
x: table (class etable): result of cross_cpct with proportions and bases for significance_cpct, result of cross_mean_sd_n with means, standard deviations and valid N for significance_means, and result of cross_cases with counts and bases for significance_cases.
cases_matrix: numeric matrix with counts size R*C
row_base: numeric vector with row bases, length R
col_base: numeric vector with col bases, length C
total_base: numeric single value, total base
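The arguments of cell_chisq (cases_matrix, row_base, col_base, total_base, correct) suggest a per-cell computation. One common way to test a single cell, assumed here and not confirmed as the expss implementation, is to collapse the table to a 2x2 table (the cell, the rest of its row, the rest of its column, and everything else) and run a 1-df chi-square test on it. cell_chisq_sketch is a hypothetical name. The sketch reproduces chisq.test on each collapsed table, including Yates' correction when correct = TRUE, and returns the direction of each deviation, which is what the "<" and ">" marks of sig_labels_chisq would reflect.

```r
# Hypothetical per-cell chi-square sketch (assumed 2x2 collapse), base R only
cell_chisq_sketch = function(cases_matrix, row_base, col_base, total_base,
                             correct = TRUE) {
  R = nrow(cases_matrix)
  C = ncol(cases_matrix)
  rb = matrix(row_base, R, C)                # rb[i, j] = row_base[i]
  cb = matrix(col_base, R, C, byrow = TRUE)  # cb[i, j] = col_base[j]
  n = total_base
  # Expected counts of the four cells of each collapsed 2x2 table
  e_cell = rb * cb / n
  e_row_rest = rb * (n - cb) / n
  e_col_rest = (n - rb) * cb / n
  e_other = (n - rb) * (n - cb) / n
  # In a 2x2 table |observed - expected| is the same for all four cells
  dev = abs(cases_matrix - e_cell)
  yates = if (correct) pmin(0.5, dev) else 0   # same rule as chisq.test
  stat = (dev - yates)^2 *
    (1 / e_cell + 1 / e_row_rest + 1 / e_col_rest + 1 / e_other)
  list(
    statistic = stat,
    p_value = pchisq(stat, df = 1, lower.tail = FALSE),
    direction = sign(cases_matrix - e_cell)    # +1 above expected, -1 below
  )
}

# Made-up 2 x 3 table of counts
m = matrix(c(20, 10, 15, 30, 25, 40), nrow = 2)
res = cell_chisq_sketch(m, rowSums(m), colSums(m), sum(m))

# Cell [1, 1] collapsed: cell, rest of column / rest of row, everything else
x11 = matrix(c(20, 10, 40, 70), nrow = 2)
stopifnot(isTRUE(all.equal(res$p_value[1, 1], chisq.test(x11)$p.value)))
```

The sketch does not apply min_base or row_margin; cells whose row or column base equals total_base would produce zero expected counts and are not handled.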

See Also