phi: Effect size for contingency tables

Description

Compute Cramer's V, phi (\(\phi\)), Cohen's w (an alias of phi), Odds ratios, Risk ratios, Cohen's h and Cohen's g for contingency tables or goodness-of-fit. See details.

Usage

phi(x, y = NULL, ci = 0.95, adjust = FALSE, CI, ...)
cohens_w(x, y = NULL, ci = 0.95, adjust = FALSE, CI, ...)
cramers_v(x, y = NULL, ci = 0.95, adjust = FALSE, CI, ...)
oddsratio(x, y = NULL, ci = 0.95, log = FALSE, ...)
riskratio(x, y = NULL, ci = 0.95, log = FALSE, ...)
cohens_h(x, y = NULL, ci = 0.95, ...)
cohens_g(x, y = NULL, ci = 0.95, ...)

Arguments

x

A numeric vector or matrix. x and y can also both be factors.

y

A numeric vector; ignored if x is a matrix. If x is a factor, y should be a factor of the same length.

ci

Confidence Interval (CI) level

adjust

Should the effect size be bias-corrected? Defaults to FALSE.

CI

Deprecated in favor of ci.

...

Arguments passed to stats::chisq.test(), such as p. Ignored for cohens_g().

log

Take in or output the log of the ratio (such as in logistic models).

Value

A data frame with the effect size (Cramers_v, phi (possibly with the suffix _adjusted), Odds_ratio, Risk_ratio (possibly with the prefix log_), Cohens_h, or Cohens_g) and its CIs (CI_low and CI_high).

Confidence Intervals for Cohen's g, OR, RR and Cohen's h

For Cohen's g, confidence intervals are based on the confidence intervals for the proportion (\(P = g + 0.5\)) returned by stats::prop.test() (minus 0.5), which give a close approximation.
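
The following is a minimal base-R sketch of that idea, not the package's own code. It assumes a 2-by-2 paired table, takes P as the larger share of the discordant pairs, and calls stats::prop.test() without continuity correction (an assumption). The names n12, n21, n_disc, ci_P and ci_g are illustrative only.

```r
# Paired 2-by-2 table from the Examples section
Performance <- rbind(c(794, 86), c(150, 570))

# Only the discordant (off-diagonal) cells carry information about asymmetry
n12 <- Performance[1, 2]
n21 <- Performance[2, 1]
n_disc <- n12 + n21

# P is the larger share of discordant pairs; g is its distance from 0.5
P <- max(n12, n21) / n_disc
g <- P - 0.5

# CI for P from prop.test(), shifted by 0.5 to the g scale
# (assumption: no continuity correction)
ci_P <- prop.test(max(n12, n21), n_disc, conf.level = 0.95, correct = FALSE)$conf.int
ci_g <- as.numeric(ci_P) - 0.5

c(Cohens_g = g, CI_low = ci_g[1], CI_high = ci_g[2])
```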

For Odds ratios, Risk ratios and Cohen's h, confidence intervals are estimated using the standard normal parametric method (see Katz et al., 1978; Szumilas, 2010).
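
As a rough illustration of the normal (Wald) method, the sketch below computes an odds ratio and a risk ratio with intervals built on the log scale and then exponentiated. It uses the textbook standard errors and assumes columns are the groups and the first row is the event; it is not taken from the package source, and the variable names are illustrative.

```r
# 2-by-2 table from the Examples section: rows = outcome, columns = group
RCT <- matrix(c(71, 30, 50, 100), nrow = 2, byrow = TRUE)
n_event_trt <- RCT[1, 1]; n_event_ctl <- RCT[1, 2]
n_none_trt  <- RCT[2, 1]; n_none_ctl  <- RCT[2, 2]
n_trt <- n_event_trt + n_none_trt
n_ctl <- n_event_ctl + n_none_ctl

z <- qnorm(1 - 0.05 / 2)  # two-sided 95% interval

# Odds ratio (treatment / control) and its log-scale standard error
or <- (n_event_trt / n_none_trt) / (n_event_ctl / n_none_ctl)
se_log_or <- sqrt(1 / n_event_trt + 1 / n_none_trt + 1 / n_event_ctl + 1 / n_none_ctl)
or_ci <- exp(log(or) + c(-1, 1) * z * se_log_or)

# Risk ratio (treatment / control) and its log-scale standard error
rr <- (n_event_trt / n_trt) / (n_event_ctl / n_ctl)
se_log_rr <- sqrt(1 / n_event_trt - 1 / n_trt + 1 / n_event_ctl - 1 / n_ctl)
rr_ci <- exp(log(rr) + c(-1, 1) * z * se_log_rr)

rbind(Odds_ratio = c(or, or_ci), Risk_ratio = c(rr, rr_ci))
```

Working on the log scale keeps both bounds positive and makes the interval symmetric around the log of the ratio.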

See Confidence Intervals and CI Contains Zero sections for phi, Cohen's w and Cramer's V.

Confidence Intervals

Unless stated otherwise, confidence intervals are estimated using the noncentrality parameter method: this method searches for the noncentrality parameters (ncps) of the noncentral t, F, or Chi-squared distribution that match the desired tail probabilities, and then converts these ncps to the corresponding effect sizes. (See full effectsize-CIs for more.)
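
For a Chi-squared statistic, the search can be sketched in base R with stats::uniroot() and stats::pchisq(). This is an illustration under assumptions rather than the package's implementation: ncp_for_prob() is a hypothetical helper, the ncps are converted to phi with sqrt(ncp / n), and a two-sided 95% interval is assumed.

```r
# 3-by-4 table from the Examples section
M <- rbind(c(150, 130, 35, 55), c(100, 50, 10, 40), c(165, 65, 2, 25))
test <- chisq.test(M)
chi2 <- unname(test$statistic)
df <- unname(test$parameter)
n <- sum(M)

# Hypothetical helper: find the ncp at which the observed statistic sits
# at cumulative probability `prob` of the noncentral Chi-squared distribution
ncp_for_prob <- function(prob, stat, df) {
  # pchisq() decreases as ncp grows; if ncp = 0 is already below prob, the bound is 0
  if (pchisq(stat, df, ncp = 0) < prob) return(0)
  uniroot(function(ncp) pchisq(stat, df, ncp = ncp) - prob,
          interval = c(0, 10 * stat + 100), tol = 1e-8)$root
}

alpha <- 0.05
ncp_low  <- ncp_for_prob(1 - alpha / 2, chi2, df)
ncp_high <- ncp_for_prob(alpha / 2, chi2, df)

# Convert the statistic and the ncps to the phi scale
phi_hat <- sqrt(chi2 / n)
phi_ci <- sqrt(c(ncp_low, ncp_high) / n)
c(phi = phi_hat, CI_low = phi_ci[1], CI_high = phi_ci[2])
```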

CI Contains Zero

Keep in mind that ncp confidence intervals are inverted significance tests, and only inform us about which values are not significantly different from our sample estimate. (They do not inform us about which values are plausible, likely or compatible with our data.) Thus, when CIs contain the value 0, this should not be taken to mean that a null effect size is supported by the data; instead this merely reflects a non-significant test statistic, i.e. the p-value is greater than alpha (Morey et al., 2016).

For positive-only effect sizes (Eta squared, Cramer's V, etc.; effect sizes associated with Chi-squared and F distributions), this also applies to cases where the lower bound of the CI is equal to 0. Even more care should be taken when the upper bound is equal to 0: this occurs when the p-value is greater than 1-alpha/2, so the upper bound cannot be estimated and is arbitrarily set to 0 (Steiger, 2004). For example:

eta_squared(aov(mpg ~ factor(gear) + factor(cyl), mtcars[1:7, ]))

## # Effect Size for ANOVA (Type I)
## 
## Parameter    | Eta2 (partial) |       90% CI
## --------------------------------------------
## factor(gear) |           0.58 | [0.00, 0.84]
## factor(cyl)  |           0.46 | [0.00, 0.78]

Details

Cramer's V and phi (\(\phi\)) are effect sizes for tests of independence in 2D contingency tables, or for goodness-of-fit in 1D tables. For 2-by-2 tables, they are identical, and are equal to the absolute value of the simple correlation between two dichotomous variables, ranging between 0 (no dependence) and 1 (perfect dependence). For larger tables, Cramer's V should be used, as it is bounded between 0 and 1, whereas phi can be larger than 1.
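
These relationships can be checked by hand in base R. The sketch below uses the textbook definitions (phi as the square root of the uncorrected Pearson Chi-squared over n, and V as phi divided by the square root of the smaller table dimension minus 1); it assumes these definitions rather than quoting the package's internals, and all variable names are illustrative.

```r
# Larger table: V rescales phi by the smaller table dimension
M <- rbind(c(150, 130, 35, 55), c(100, 50, 10, 40), c(165, 65, 2, 25))
chi2_M <- unname(chisq.test(M)$statistic)
phi_M <- sqrt(chi2_M / sum(M))
v_M <- phi_M / sqrt(min(dim(M)) - 1)

# 2-by-2 table: phi equals the absolute correlation of the two binary variables
RCT <- matrix(c(71, 30, 50, 100), nrow = 2, byrow = TRUE)
chi2_rct <- unname(chisq.test(RCT, correct = FALSE)$statistic)
phi_rct <- sqrt(chi2_rct / sum(RCT))

# Expand the table to one value per observation (1 = Sick, 1 = Treatment)
cell_counts <- c(71, 50, 30, 100)
diagnosis <- rep(c(1, 0, 1, 0), times = cell_counts)
group <- rep(c(1, 1, 0, 0), times = cell_counts)

c(phi = phi_rct, abs_cor = abs(cor(diagnosis, group)))
```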

For 2-by-2 contingency tables, Odds ratios, Risk ratios and Cohen's h can also be estimated. Note that these are computed with each column representing a different group: the first column is the treatment group and the second column is the baseline (or control) group. Effects are given as treatment / control. If you wish to use rows as groups, you must pass a transposed table or switch the x and y arguments.
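
To make the column orientation concrete, here is a small base-R sketch of Cohen's h, using the usual arcsine-transformed difference of two proportions. cohens_h_sketch() is a hypothetical helper written for this illustration (it is not a package function), and it assumes the first row holds the outcome of interest.

```r
RCT <- matrix(c(71, 30, 50, 100), nrow = 2, byrow = TRUE,
  dimnames = list(
    Diagnosis = c("Sick", "Recovered"),
    Group = c("Treatment", "Control")
  )
)

# Hypothetical helper: Cohen's h from a 2-by-2 table whose COLUMNS are the groups
cohens_h_sketch <- function(tab) {
  # Proportion of the first-row outcome within each column
  p <- prop.table(tab, margin = 2)[1, ]
  unname(2 * asin(sqrt(p[1])) - 2 * asin(sqrt(p[2])))
}

h_cols <- cohens_h_sketch(RCT)     # groups are Treatment vs. Control
h_rows <- cohens_h_sketch(t(RCT))  # after transposing, groups are Sick vs. Recovered

c(columns_as_groups = h_cols, rows_as_groups = h_rows)
```

The two values differ because transposing changes which variable defines the groups, which is why the orientation of the table matters.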

Cohen's g is an effect size for dependent (paired) contingency tables ranging between 0 (perfect symmetry) and 0.5 (perfect asymmetry) (see stats::mcnemar.test()).

References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Katz, D. J. S. M., Baptista, J., Azen, S. P., & Pike, M. C. (1978). Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics, 469-474.
Szumilas, M. (2010). Explaining odds ratios. Journal of the Canadian Academy of Child and Adolescent Psychiatry, 19(3), 227.

Examples

# NOT RUN {
M <- rbind(
  c(150, 130, 35, 55),
  c(100, 50, 10, 40),
  c(165, 65, 2, 25)
)
dimnames(M) <- list(
  Study = c("Psych", "Econ", "Law"),
  Music = c("Pop", "Rock", "Jazz", "Classic")
)
M

phi(M)

cramers_v(M)



## 2-by-2 tables
## -------------
(RCT <- matrix(
  c(
    71, 30,
    50, 100
  ),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Diagnosis = c("Sick", "Recovered"),
    Group = c("Treatment", "Control")
  )
)) # note groups are COLUMNS

oddsratio(RCT)

riskratio(RCT)

cohens_h(RCT)



## Dependent (Paired) Contingency Tables
## -------------------------------------
Performance <- rbind(
  c(794, 86),
  c(150, 570)
)
dimnames(Performance) <- list(
  "1st Survey" = c("Approve", "Disapprove"),
  "2nd Survey" = c("Approve", "Disapprove")
)
Performance

cohens_g(Performance)
# }