prop_compare: Proportion Comparisons for a Single Variable across 3+ Independent Groups (Chi-square Test of Independence)

Description

prop_compare tests for differences in proportions across 3+ independent groups with a chi-square test of independence. The function also calculates the descriptive statistics for each group and Cramer's V (with its confidence interval) as a standardized effect size, and can optionally return the X by 2 contingency tables. prop_compare is simply a wrapper for prop.test plus some extra calculations.
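To illustrate the "wrapper for prop.test" idea, the base R sketch below builds the X by 2 table of am (a 0/1 dummy) by cyl (3 groups) from the built-in mtcars data. It then passes the per-group counts of 1s and the group sizes to prop.test. This is not the package's source code, and the variable names are illustrative.

```r
# Sketch (not the package source): chi-square test of independence for a
# dummy variable across 3+ groups via stats::prop.test, base R only.
x <- mtcars$am               # dummy variable: 0 = automatic, 1 = manual
nom <- factor(mtcars$cyl)    # nominal variable with 3 groups: "4", "6", "8"

tab <- table(nom = nom, x = x)          # X by 2 contingency table of counts
successes <- as.vector(tab[, "1"])      # number of 1s in each group
totals <- as.vector(rowSums(tab))       # sample size of each group

# With 3+ groups, prop.test tests H0: all group proportions are equal,
# which is the Pearson chi-square test of independence on the X by 2 table.
# suppressWarnings(): some expected counts are below 5 in mtcars.
res <- suppressWarnings(prop.test(x = successes, n = totals))
res$estimate    # proportion of 1s in each group
res$statistic   # chi-square value
res$parameter   # degrees of freedom = number of groups - 1
res$p.value     # p-value
```

The chi-square value equals chisq.test(tab)$statistic for this table, because both functions compute the Pearson statistic from the same counts.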

Usage

prop_compare(
  x,
  nom,
  lvl = levels(as.factor(nom)),
  yates = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)

Value

list of numeric vectors (and, if rtn.table = TRUE, numeric matrices) containing statistical information about the proportion comparisons, where X is the number of groups: 1) nhst = chi-square test of independence stat info in a numeric vector, 2) desc = descriptive statistics stat info in a numeric vector, 3) std = standardized effect size and its confidence interval in a numeric vector, 4) count = numeric matrix with dim = [X+1, 3] of the X by 2 contingency table of counts with an additional row and column for totals (if rtn.table = TRUE), 5) percent = numeric matrix with dim = [X+1, 3] of the X by 2 contingency table of overall percentages with an additional row and column for totals (if rtn.table = TRUE).

1) nhst = chi-square test of independence stat info in a numeric vector

est: average absolute proportion difference (i.e., |group j - group i| averaged across pairs of groups)

NA (to remind the user there is no standard error for the test)

chi-square value

degrees of freedom (the number of groups minus 1)

two-sided p-value
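The five nhst elements can be approximated in base R as follows. This is a hedged sketch with two assumptions. First, est is taken to be the mean of the absolute pairwise differences between the group proportions. Second, the element names other than est (se, X2, df, p) are hypothetical labels chosen for illustration.

```r
x <- mtcars$am
nom <- factor(mtcars$cyl)

counts <- as.vector(tapply(x, nom, sum))     # number of 1s per group
ns <- as.vector(tapply(x, nom, length))      # group sample sizes
props <- counts / ns                         # proportion of 1s per group

# Assumption: est = mean of |p_j - p_i| over all pairs of groups
pairs <- combn(props, 2)                     # 2 x choose(X, 2) matrix of proportion pairs
est <- mean(abs(pairs[1, ] - pairs[2, ]))

res <- suppressWarnings(prop.test(x = counts, n = ns))

# Hypothetical element names (only "est" is named in the documentation)
nhst <- c(est = est, se = NA, X2 = unname(res$statistic),
          df = unname(res$parameter), p = res$p.value)
nhst
```

Here the group proportions are 8/11, 3/7, and 2/14, so the three pairwise differences are 23/77, 45/77, and 22/77, and est works out to 30/77 (about 0.39).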

2) desc = descriptive statistics stat info in a numeric vector (note there could be more than 3 groups - groups i, j, and k are just provided as an example):

prop_`lvl[k]`: proportion of group k

prop_`lvl[j]`: proportion of group j

prop_`lvl[i]`: proportion of group i

sd_`lvl[k]`: standard deviation of group k

sd_`lvl[j]`: standard deviation of group j

sd_`lvl[i]`: standard deviation of group i

n_`lvl[k]`: sample size of group k

n_`lvl[j]`: sample size of group j

n_`lvl[i]`: sample size of group i
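The descriptive statistics can be sketched with tapply after dropping cases with a missing value on x or nom. This sketch makes two assumptions. The standard deviation uses sd(), which has an n - 1 denominator, so it may differ from the package's formula. The elements are also ordered by lvl here, rather than in the k, j, i order listed above.

```r
x <- mtcars$am
nom <- factor(mtcars$cyl)

# Keep complete cases only (listwise deletion of missing values)
ok <- !is.na(x) & !is.na(nom)
x <- x[ok]
nom <- droplevels(nom[ok])
lvl <- levels(nom)   # "4" "6" "8"

props <- tapply(x, nom, mean)[lvl]     # proportion of 1s per group
sds <- tapply(x, nom, sd)[lvl]         # assumption: sample SD (n - 1 denominator)
ns <- tapply(x, nom, length)[lvl]      # sample size per group

desc <- c(setNames(as.vector(props), paste0("prop_", lvl)),
          setNames(as.vector(sds), paste0("sd_", lvl)),
          setNames(as.vector(ns), paste0("n_", lvl)))
desc
```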

3) std = standardized effect size and its confidence interval in a numeric vector

cramer: Cramer's V estimate

lwr: lower bound of Cramer's V confidence interval

upr: upper bound of Cramer's V confidence interval
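For an r by c table, Cramer's V is sqrt(X2 / (n * (min(r, c) - 1))). For an X by 2 table, min(r, c) - 1 = 1, so V reduces to sqrt(X2 / n). The minimal base R sketch below computes the point estimate for the am by cyl example in mtcars. The variable names are illustrative.

```r
tab <- table(nom = factor(mtcars$cyl), x = mtcars$am)

# Pearson chi-square without continuity correction (correct = FALSE has no
# effect on a 3 x 2 table anyway; it is set here to make that explicit)
X2 <- unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
n <- sum(tab)
k <- min(dim(tab)) - 1                  # = 1 for any X by 2 table
cramer <- sqrt(X2 / (n * k))
cramer
```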

4) count = numeric matrix with dim = [X+1, 3] of the X by 2 contingency table of counts with an additional row and column for totals (if rtn.table = TRUE).

The 3+ unique observed values of nom (plus the total) are the rows, and the two unique observed values of x (i.e., 0 and 1, plus the total) are the columns. The names of the dimnames are "nom" for the rows and "x" for the columns. The rownames are 1. `lvl[i]`, 2. `lvl[j]`, 3. `lvl[k]`, 4. "total", and the colnames are 1. "0", 2. "1", 3. "total".
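A matrix with this layout can be built in base R with table() and addmargins(). The sketch below relabels addmargins' default "Sum" margins as "total" to match the description above. It uses mtcars with cyl recoded to illustrative labels, and it is not the package's own code.

```r
x <- mtcars$am
nom <- factor(mtcars$cyl, levels = c(4, 6, 8), labels = c("four", "six", "eight"))

tab <- table(nom = nom, x = x)      # X by 2 table of counts
count <- unclass(addmargins(tab))   # add row/column totals; drop the "table" class
dimnames(count) <- list(nom = c(levels(nom), "total"),
                        x = c("0", "1", "total"))
count
```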

5) percent = numeric matrix with dim = [X+1, 3] of the X by 2 contingency table of overall percentages (i.e., each cell count divided by the total sample size, times 100) with an additional row and column for totals (if rtn.table = TRUE).
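Overall percentages divide every cell by the grand total, as opposed to row or column percentages. In base R this is prop.table() with no margin argument. The sketch below uses mtcars for illustration and relabels the margins as "total".

```r
nom <- factor(mtcars$cyl)
tab <- table(nom = nom, x = mtcars$am)

# Overall percentages: each cell divided by the grand total, times 100
pct <- prop.table(tab) * 100
percent <- unclass(addmargins(pct))
dimnames(percent) <- list(nom = c(levels(nom), "total"),
                          x = c("0", "1", "total"))
round(percent, 1)
```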

Arguments

x: numeric vector that only has values of 0 or 1 (or missing values), otherwise known as a dummy variable.
nom: atomic vector that takes on three or more unordered values (or missing values), otherwise known as a nominal variable.
lvl: character vector with length equal to the number of groups (3+) specifying the unique values of nom for the groups. If nom is a factor, then lvl should be the factor levels rather than the underlying integer codes. This argument allows you to specify the order of the proportions in the return object.
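yates: logical vector of length 1 specifying whether Yates' continuity correction should be applied for small samples. See chisq.test for details. Note that both prop.test and chisq.test apply the correction only when there are two groups (i.e., a 2 by 2 table), so with 3+ groups the chi-square value is unaffected by this argument.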
ci.level: numeric vector of length 1 specifying the confidence level. ci.level must range from 0 to 1.
rtn.table: logical vector of length 1 specifying whether the return object should include the X by 2 contingency table of counts with totals and the X by 2 table of overall percentages with totals. If TRUE, then the last two elements of the return object are "count" containing a matrix of counts and "percent" containing a matrix of overall percentages.
check: logical vector of length 1 specifying whether the input arguments should be checked for errors (for example, if nom has a different length than x). This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).
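To see why the yates argument matters only for two groups, the base R check below compares prop.test with and without the correction. It assumes, without confirmation from the source, that yates is forwarded to prop.test's correct argument. With the three cyl groups in mtcars, the two statistics are identical. With only two groups (4 vs 8 cylinders), the correction lowers the statistic.

```r
x <- mtcars$am
nom <- factor(mtcars$cyl)
counts <- as.vector(tapply(x, nom, sum))     # number of 1s per group
ns <- as.vector(tapply(x, nom, length))      # group sizes

# 3 groups: prop.test ignores correct = TRUE, so both statistics match
with_yates <- suppressWarnings(prop.test(counts, ns, correct = TRUE))$statistic
no_yates <- suppressWarnings(prop.test(counts, ns, correct = FALSE))$statistic

# 2 groups (4 vs 8 cylinders): the correction now changes the statistic
two <- c(1, 3)
with_yates2 <- suppressWarnings(prop.test(counts[two], ns[two], correct = TRUE))$statistic
no_yates2 <- suppressWarnings(prop.test(counts[two], ns[two], correct = FALSE))$statistic

c(with_yates, no_yates, with_yates2, no_yates2)
```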

Details

The confidence interval for Cramer's V is calculated with Fisher's r-to-z transformation, since Cramer's V is a kind of multiple correlation coefficient. Cramer's V is transformed to Fisher's z units, a symmetric confidence interval for Fisher's z is calculated, and the lower and upper bounds are then back-transformed to Cramer's V units.
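The three steps above can be sketched as a small helper. The function name cramer_ci is hypothetical. The sketch also assumes a standard error of 1 / sqrt(n - 3) on the z scale, as for a Pearson correlation. The documentation does not state which standard error the package uses, so results may differ from prop_compare's output.

```r
# Hypothetical helper: CI for Cramer's V via Fisher's r-to-z transformation.
# Assumption: SE of z is 1 / sqrt(n - 3), as for a Pearson correlation.
cramer_ci <- function(v, n, ci.level = 0.95) {
  z <- atanh(v)                          # r-to-z: 0.5 * log((1 + v) / (1 - v))
  se <- 1 / sqrt(n - 3)
  crit <- qnorm(1 - (1 - ci.level) / 2)  # e.g., about 1.96 for 95%
  # symmetric interval in z units, back-transformed with tanh()
  c(cramer = v, lwr = tanh(z - crit * se), upr = tanh(z + crit * se))
}

# Example: Cramer's V for am by cyl in mtcars
tab <- table(factor(mtcars$cyl), mtcars$am)
X2 <- unname(suppressWarnings(chisq.test(tab))$statistic)
n <- sum(tab)
out <- cramer_ci(v = sqrt(X2 / n), n = n)
out
```

The interval is symmetric in z units but not in V units. With this standard error, the back-transformed lower bound can also fall below 0 for small V, even though V itself cannot be negative.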

Examples



tmp <- replicate(n = 10, expr = mtcars, simplify = FALSE)
mtcars2 <- str2str::ld2d(tmp)
mtcars2$"cyl_fct" <- car::recode(mtcars2$"cyl",
   recodes = "4='four'; 6='six'; 8='eight'", as.factor = TRUE)
prop_compare(x = mtcars2$"am", nom = mtcars2$"cyl_fct")
prop_compare(x = mtcars2$"am", nom = mtcars2$"cyl_fct",
   lvl = c("four","six","eight")) # specify order of levels in return object

# more than 3 groups
prop_compare(x = ifelse(airquality$"Wind" >= 10, yes = 1, no = 0), nom = airquality$"Month")
prop_compare(x = ifelse(airquality$"Wind" >= 10, yes = 1, no = 0), nom = airquality$"Month",
   rtn.table = FALSE) # no contingency tables