chi_squared_test: Chi-Squared test

Description

This function performs a \(\chi^2\) test for contingency tables or a test for given probabilities. The returned effect sizes are Cramer's V for tables with more than two rows or columns, Phi (\(\phi\)) for 2x2 tables, and Fei (פ) for tests against given probabilities (see Ben-Shachar et al. 2023).

Usage

chi_squared_test(
  data,
  select = NULL,
  by = NULL,
  probabilities = NULL,
  weights = NULL,
  paired = FALSE,
  ...
)

Value

A data frame with test results. The returned effect sizes are Cramer's V for tables with more than two rows or columns, Phi (\(\phi\)) for 2x2 tables, and Fei (פ) for tests against given probabilities.
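As a rough illustration of the three effect sizes named above, the following base R sketch computes them by hand from the \(\chi^2\) statistic, using the unadjusted textbook formulas (Fei as defined in Ben-Shachar et al. 2023). The counts are made up for illustration, and the values returned by chi_squared_test() may differ, for example if a bias correction is applied.

```r
# Phi for a 2x2 table: sqrt(chi2 / n), without continuity correction
tab <- matrix(c(30, 20, 25, 35), nrow = 2)
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
phi <- sqrt(chi2 / sum(tab))

# Cramer's V for a larger table: sqrt(chi2 / (n * (min(rows, cols) - 1)))
tab3 <- matrix(c(20, 15, 10, 25, 30, 20), nrow = 2)
chi2_3 <- unname(chisq.test(tab3)$statistic)
cramers_v <- sqrt(chi2_3 / (sum(tab3) * (min(dim(tab3)) - 1)))

# Fei for a test against given probabilities:
# sqrt(chi2 / (n * (1 / min(p) - 1)))
observed <- c(24, 76)
p <- c(0.3, 0.7)
chi2_p <- unname(chisq.test(observed, p = p)$statistic)
fei <- sqrt(chi2_p / (sum(observed) * (1 / min(p) - 1)))

c(phi = phi, cramers_v = cramers_v, fei = fei)
```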

Arguments

data

A data frame.

select

Name(s) of the continuous variable(s) (as character vector) to be used as samples for the test. select can be one of the following:

select can be used in combination with by, in which case select is the name of the continuous variable (and by indicates a grouping factor).
select can also be a character vector of length two or more (more than two names only apply to kruskal_wallis_test()), in which case the continuous variables are treated as samples to be compared. by must be NULL in this case.
If select is of length two and paired = TRUE, the two samples are considered as dependent and a paired test is carried out.
If select specifies one variable and by = NULL, a one-sample test is carried out (only applicable for t_test() and wilcoxon_test()).
For chi_squared_test(), if select specifies one variable and both by and probabilities are NULL, a one-sample test against given probabilities is automatically conducted, with equal probabilities for each level of select.

by

Name of the variable indicating the groups. Required if select specifies only one variable that contains all samples to be compared in the test. If by is not a factor, it will be coerced to a factor. For chi_squared_test(), if probabilities is provided, by must be NULL.

probabilities

A numeric vector of probabilities for each cell in the contingency table. The length of the vector must match the number of cells in the table, i.e. the number of unique levels of the variable specified in select. If probabilities is provided, a chi-squared test for given probabilities is conducted. Furthermore, if probabilities is given, by must be NULL. The probabilities must sum to 1.
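For orientation, here is a minimal base R sketch of a chi-squared test for given probabilities, calling chisq.test() directly rather than chi_squared_test(). The data are hypothetical. The vector passed to p has one entry per level, in the order of the factor levels.

```r
# Hypothetical data: 100 observations of a two-level variable
x <- factor(
  c(rep("male", 24), rep("female", 76)),
  levels = c("male", "female")
)
tab <- table(x)

# Test against given probabilities (one value per level, in level order)
res_given <- chisq.test(tab, p = c(0.3, 0.7))

# Without `p`, chisq.test() assumes equal probabilities for all levels
res_equal <- chisq.test(tab)

res_given$statistic
res_given$p.value
```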

weights

Name of an (optional) weighting variable to be used for the test.

paired

Logical, if TRUE, a McNemar test is conducted for 2x2 tables. Note that paired only works for 2x2 tables.
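The kind of paired 2x2 table this applies to can be sketched with base R's mcnemar.test(). The before/after data below are invented for illustration.

```r
# Hypothetical paired data: the same 50 subjects measured twice
before <- factor(
  c(rep("yes", 20), rep("no", 30)),
  levels = c("yes", "no")
)
after <- factor(
  c(rep("yes", 12), rep("no", 8), rep("yes", 15), rep("no", 15)),
  levels = c("yes", "no")
)

# Rows are the first measurement, columns the second
tab <- table(before, after)

# McNemar's test only uses the discordant cells (yes/no and no/yes)
res <- mcnemar.test(tab)
res$statistic
res$p.value
```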

...

Additional arguments passed down to chisq.test().

Which test to use

The following table provides an overview of which test to use for different types of data. The choice of test depends on the scale of the outcome variable and the number of samples to compare.

Samples	Scale of Outcome	Significance Test
1	binary / nominal	`chi_squared_test()`
1	continuous, not normal	`wilcoxon_test()`
1	continuous, normal	`t_test()`
2, independent	binary / nominal	`chi_squared_test()`
2, independent	continuous, not normal	`mann_whitney_test()`
2, independent	continuous, normal	`t_test()`
2, dependent	binary (only 2x2)	`chi_squared_test(paired=TRUE)`
2, dependent	continuous, not normal	`wilcoxon_test()`
2, dependent	continuous, normal	`t_test(paired=TRUE)`
>2, independent	continuous, not normal	`kruskal_wallis_test()`
>2, independent	continuous, normal	`datawizard::means_by_group()`
>2, dependent	continuous, not normal	not yet implemented (1)
>2, dependent	continuous, normal	not yet implemented (2)

(1) More than two dependent samples are considered as repeated measurements. For ordinal or not-normally distributed outcomes, these samples are usually tested using a friedman.test(), which requires the samples in one variable, the groups to compare in another variable, and a third variable indicating the repeated measurements (subject IDs).

(2) More than two dependent samples are considered as repeated measurements. For normally distributed outcomes, these samples are usually tested using an ANOVA for repeated measurements. A more sophisticated approach would be using a linear mixed model.
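The data layout described in note (1) can be sketched as follows, using friedman.test() from base R. The variable names (id, time, score) and the values are hypothetical.

```r
# Long format: one row per subject and measurement occasion
long <- data.frame(
  id    = factor(rep(1:6, times = 3)),                  # subject IDs
  time  = factor(rep(c("t1", "t2", "t3"), each = 6)),   # groups to compare
  score = c(3, 4, 2, 5, 3, 4,   # t1
            4, 5, 3, 6, 5, 5,   # t2
            6, 6, 5, 8, 6, 7)   # t3
)

# outcome ~ groups | blocks (subjects)
res <- friedman.test(score ~ time | id, data = long)
res$statistic
res$p.value
```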

Details

The function is a wrapper around chisq.test() and fisher.test() (for small expected values) for contingency tables, and chisq.test() for given probabilities. When probabilities are provided, these are rescaled to sum to 1 (i.e. rescale.p = TRUE). When fisher.test() is called, simulated p-values are returned (i.e. simulate.p.value = TRUE, see ?fisher.test). If paired = TRUE and a 2x2 table is provided, a McNemar test (see mcnemar.test()) is conducted.
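The dispatch described above could look roughly like the sketch below. run_table_test() is a hypothetical helper, not the package's code, and the cut-off of 5 for "small" expected values is an assumption made for this example.

```r
# Hypothetical helper illustrating the choice of test (not package code)
run_table_test <- function(tab, paired = FALSE, min_expected = 5) {
  if (paired) {
    # McNemar test is only defined here for 2x2 tables
    stopifnot(identical(dim(tab), c(2L, 2L)))
    return(mcnemar.test(tab))
  }
  # Expected counts under independence of rows and columns
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < min_expected)) {
    # Assumed threshold: fall back to Fisher's test with simulated p-values
    fisher.test(tab, simulate.p.value = TRUE)
  } else {
    chisq.test(tab)
  }
}

small <- matrix(c(3, 1, 2, 6, 1, 4), nrow = 2)   # some expected counts < 5
large <- matrix(c(30, 20, 25, 35), nrow = 2)     # all expected counts >= 5
run_table_test(small)$method
run_table_test(large)$method

# Given probabilities: rescale.p = TRUE rescales `p` to sum to 1
rescaled <- chisq.test(c(24, 76), p = c(3, 7), rescale.p = TRUE)
exact <- chisq.test(c(24, 76), p = c(0.3, 0.7))
```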

The weighted version of the chi-squared test is based on a weighted table, using xtabs() as input for chisq.test().
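A minimal sketch of that idea in base R, with made-up data and a hypothetical weight column w: xtabs() sums the weights within each cell, and the resulting table is passed to chisq.test().

```r
# Hypothetical data with a weighting variable `w`
d <- data.frame(
  sex   = rep(c("f", "m"), times = c(60, 40)),
  group = rep(c("a", "b", "a", "b"), times = c(35, 25, 15, 25)),
  w     = rep(c(0.8, 1.2), times = 50)
)

# Unweighted table: plain cell counts
unweighted_tab <- table(d$sex, d$group)

# Weighted table: each cell holds the sum of the weights in that cell
weighted_tab <- xtabs(w ~ sex + group, data = d)

res_w <- chisq.test(weighted_tab)
weighted_tab
res_w$p.value
```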

Interpretation of effect sizes is based on rules described in effectsize::interpret_phi(), effectsize::interpret_cramers_v(), and effectsize::interpret_fei(). Use these functions directly to get other interpretations, by providing the returned effect size as argument, e.g. interpret_phi(0.35, rules = "gignac2016").

References

Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi‑Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982
Bender, R., Lange, S., Ziegler, A. (2007). Wichtige Signifikanztests. Dtsch Med Wochenschr, 132, e24–e25.
du Prel, J.B., Röhrig, B., Hommel, G., Blettner, M. (2010). Auswahl statistischer Testverfahren. Dtsch Arztebl Int, 107(19), 343–8.

Examples


# Run only if the suggested packages are installed
if (requireNamespace("effectsize", quietly = TRUE) && requireNamespace("MASS", quietly = TRUE)) {
data(efc)
efc$weight <- abs(rnorm(nrow(efc), 1, 0.3))

# Chi-squared test
chi_squared_test(efc, "c161sex", by = "e16sex")

# weighted Chi-squared test
chi_squared_test(efc, "c161sex", by = "e16sex", weights = "weight")

# Chi-squared test for given probabilities
chi_squared_test(efc, "c161sex", probabilities = c(0.3, 0.7))
}
