props_diff: Proportion Difference of Multiple Variables Across Two Independent Groups (Chi-square Tests of Independence)

Description

props_diff tests the proportion difference of multiple variables across two independent groups with chi-square tests of independence. The function also calculates the descriptive statistics for each group, various standardized effect sizes (e.g., Cramer's V), and can provide the 2x2 contingency tables. props_diff is simply a wrapper for prop.test plus some extra calculations.

Usage

props_diff(
  data,
  vrb.nm,
  bin.nm,
  lvl = levels(as.factor(data[[bin.nm]])),
  yates = TRUE,
  zero.cell = 0.05,
  smooth = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)

Value

list of data.frames containing statistical information about the prop differences (the rownames of each data.frame are vrb.nm): 1) chisqtest = chi-square tests of independence stat info in a data.frame, 2) describes = descriptive statistics stat info in a data.frame, 3) effects = various standardized effect sizes in a data.frame, 4) count = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of counts with additional rows and columns for totals (if rtn.table = TRUE), 5) percent = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of overall percentages with additional rows and columns for totals (if rtn.table = TRUE).

1) chisqtest = chi-square tests of independence stat info in a data.frame

est: proportion difference estimate (i.e., group 2 - group 1)

NA (to remind the user there is no standard error for the test)

chi-square value

degrees of freedom (will always be 1)

two-sided p-value

lwr: lower bound of the confidence interval

upr: upper bound of the confidence interval
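
Because props_diff is described as a wrapper for prop.test, the quantities listed for chisqtest can be illustrated with a direct prop.test call on one variable. This is a minimal sketch in base R on the built-in mtcars data. How props_diff assembles its output internally is an assumption here, and the object names (grp, x, succ, fit) are illustrative only.

```r
# Sketch only: base R, built-in mtcars data; not the props_diff source code.
grp <- ifelse(mtcars$vs == 1, "yes", "no")  # two independent groups
x <- mtcars$am                              # dummy-coded (0/1) variable
lvl <- c("no", "yes")                       # lvl[1] = group 1, lvl[2] = group 2

# Successes and sample sizes, ordered group 2 then group 1 so that
# prop.test reports the difference as group 2 - group 1.
succ <- c(sum(x[grp == lvl[2]]), sum(x[grp == lvl[1]]))
n <- c(sum(grp == lvl[2]), sum(grp == lvl[1]))

fit <- prop.test(x = succ, n = n, correct = TRUE, conf.level = 0.95)

est <- unname(fit$estimate[1] - fit$estimate[2])  # proportion difference
chisq <- unname(fit$statistic)                    # chi-square value
df <- unname(fit$parameter)                       # always 1 for a 2x2 table
p <- fit$p.value                                  # two-sided p-value
ci <- as.numeric(fit$conf.int)                    # lower and upper bounds

round(c(est = est, chisq = chisq, df = df, p = p, lwr = ci[1], upr = ci[2]), 3)
```

Here correct = TRUE corresponds to the yates argument and conf.level to ci.level, under the assumption that both are passed through unchanged.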

2) describes = descriptive statistics stat info in a data.frame

prop_`lvl[2]`: proportion of group 2

prop_`lvl[1]`: proportion of group 1

sd_`lvl[2]`: standard deviation of group 2

sd_`lvl[1]`: standard deviation of group 1

n_`lvl[2]`: sample size of group 2

n_`lvl[1]`: sample size of group 1
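
The descriptive statistics above can be reproduced by hand for a single variable. The sketch below uses base R and the built-in mtcars data. describe_group is a hypothetical helper, and using the sample standard deviation from sd() is an assumption, since the documentation does not state which formula props_diff applies.

```r
# Sketch only: per-group proportion, standard deviation, and sample size.
grp <- ifelse(mtcars$vs == 1, "yes", "no")
x <- mtcars$am
lvl <- c("no", "yes")  # lvl[1] = group 1, lvl[2] = group 2

# Hypothetical helper: describe one group, dropping missing values.
describe_group <- function(g) {
  xg <- x[!is.na(x) & !is.na(grp) & grp == g]
  c(prop = mean(xg),  # proportion of 1s equals the mean of a 0/1 variable
    sd = sd(xg),      # assumption: sample SD (n - 1 denominator)
    n = length(xg))
}

# Columns are ordered group 2 then group 1, matching the list above.
desc <- sapply(rev(lvl), describe_group)
desc
```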

3) effects = various standardized effect sizes in a data.frame

cramer: Cramer's V estimate

Cohen's h estimate

phi: Phi coefficient estimate

yule: Yule coefficient estimate

tetra: Tetrachoric correlation estimate

odds ratio estimate

risk ratio estimate (i.e., group 2 / group 1). Note this value will often differ when recoding variables (as it should).
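
Most of these effect sizes follow from the four cell counts of the 2x2 table. The sketch below computes them in base R with the textbook formulas. It is an assumption that props_diff uses the same formulas, in particular that the Yule coefficient is Yule's Q and that Cramer's V is based on the uncorrected chi-square. The tetrachoric correlation is left out because it needs an iterative estimate.

```r
# Sketch only: textbook effect sizes from the cells of one 2x2 table.
grp <- ifelse(mtcars$vs == 1, "yes", "no")
x <- mtcars$am
lvl <- c("no", "yes")  # lvl[1] = group 1, lvl[2] = group 2

# Successes (x == 1) and failures (x == 0) in each group
s2 <- sum(x == 1 & grp == lvl[2]); f2 <- sum(x == 0 & grp == lvl[2])
s1 <- sum(x == 1 & grp == lvl[1]); f1 <- sum(x == 0 & grp == lvl[1])
p2 <- s2 / (s2 + f2)
p1 <- s1 / (s1 + f1)

# Phi coefficient: signed, positive when group 2 has the larger proportion
phi <- (s2 * f1 - f2 * s1) /
  sqrt((s2 + f2) * (s1 + f1) * (s2 + s1) * (f2 + f1))

# Cramer's V from the uncorrected chi-square; equals |phi| for a 2x2 table
x2 <- chisq.test(matrix(c(s1, f1, s2, f2), nrow = 2), correct = FALSE)$statistic
cramer_v <- sqrt(unname(x2) / (s1 + f1 + s2 + f2))

cohen_h <- 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1))  # Cohen's h
yule_q <- (s2 * f1 - f2 * s1) / (s2 * f1 + f2 * s1) # Yule's Q (assumed)
odds_ratio <- (s2 / f2) / (s1 / f1)                 # odds group 2 / odds group 1
risk_ratio <- p2 / p1                               # group 2 / group 1

c(cramer = cramer_v, h = cohen_h, phi = phi, yule = yule_q,
  odds_ratio = odds_ratio, risk_ratio = risk_ratio)
```

The risk ratio changes when the variable is recoded (swapping 0 and 1) because it is then a ratio of the complementary proportions, whereas the odds ratio simply inverts.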

4) count = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of counts with additional rows and columns for totals (if rtn.table = TRUE).

The two unique observed values of data[vrb.nm] (i.e., 0 and 1) - plus the total - are the rows and the two unique observed values of data[[bin.nm]] - plus the total - are the columns. The variables themselves are the layers (i.e., 3rd dimension of the array). The dimlabels are "bin" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. `lvl[1]`, 2. `lvl[2]`, 3. "total". The laynames are vrb.nm.

5) percent = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of overall percentages with additional rows and columns for totals (if rtn.table = TRUE).

The two unique observed values of data[vrb.nm] (i.e., 0 and 1) - plus the total - are the rows and the two unique observed values of data[[bin.nm]] - plus the total - are the columns. The variables themselves are the layers (i.e., 3rd dimension of the array). The dimlabels are "bin" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. `lvl[1]`, 2. `lvl[2]`, 3. "total". The laynames are vrb.nm.
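
The layout of the count and percent arrays can be sketched with base R. The example builds a [3, 3, length(vrb.nm)] array of counts with totals, then divides each layer by its grand total. Scaling to 0-100 rather than 0-1 is an assumption, the dimlabels are omitted, and the object names (dat, vrb_nm, count, percent) are illustrative.

```r
# Sketch only: 2x2 tables with totals, stacked into a 3D array by variable.
dat <- mtcars
dat$vs_bin <- ifelse(dat$vs == 1, "yes", "no")
dat$gear_dum <- ifelse(dat$gear > 3, 1L, 0L)
vrb_nm <- c("am", "gear_dum")
lvl <- c("no", "yes")

count <- array(NA_real_, dim = c(3, 3, length(vrb_nm)),
               dimnames = list(c("0", "1", "total"), c(lvl, "total"), vrb_nm))

for (nm in vrb_nm) {
  # Fix the levels so that every table is 2x2 even if a cell is empty
  tab <- table(factor(dat[[nm]], levels = c(0, 1)),
               factor(dat$vs_bin, levels = lvl))
  count[, , nm] <- addmargins(tab)  # appends the row and column totals
}

# Overall percentages: each layer divided by its own grand total
percent <- sweep(count, 3, count["total", "total", ], "/") * 100

count[, , "am"]
round(percent[, , "am"], 1)
```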

Arguments

data: data.frame of data.
vrb.nm: character vector specifying the colnames in data for the variables. Since we are testing proportions, the variables must be dummy codes such that they only have values of 0 or 1 (or missing values).
bin.nm: character vector of length 1 specifying the colname in data for the binary variable that only takes on two values (or missing values), specifying the two independent groups.
lvl: character vector with length 2 specifying the unique values for the two groups. If data[[bin.nm]] is a factor, then lvl should be the factor levels rather than the underlying integer codes. This argument allows you to specify the direction of the proportion difference. prop_diff calculates the proportion differences as x[ bin == lvl[2] ] - x[ bin == lvl[1] ] such that it is group 2 - group 1. By changing which group is group 1 vs. group 2, the direction of the proportion differences can be changed. See details of prop_diff.
yates: logical vector of length 1 specifying whether Yates' continuity correction should be applied for small samples. See chisq.test for details.
zero.cell: numeric vector of length 1 specifying what value to impute for zero cell counts in the 2x2 contingency table when computing the tetrachoric correlations. See tetrachoric for details.
smooth: logical vector of length 1 specifying whether a smoothing algorithm should be applied when estimating the tetrachoric correlations. See tetrachoric for details.
ci.level: numeric vector of length 1 specifying the confidence level. ci.level must range from 0 to 1.
rtn.table: logical vector of length 1 specifying whether the return object should include the 2x2 contingency table of counts with totals and the 2x2 overall percentages table. If TRUE, then the last two elements of the return object are "count" containing a 3D array of counts and "percent" containing a 3D array of overall percentages.
check: logical vector of length 1 specifying whether the input arguments should be checked for errors, for example whether data[[bin.nm]] has more than 2 unique values (other than missing values). This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).
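
The effect of lvl on the direction of the difference can be seen with a small base R sketch. prop_difference is a hypothetical helper that mirrors the group 2 - group 1 definition given for lvl. It is not part of the package.

```r
# Sketch only: reversing lvl flips the sign of the proportion difference.
x <- mtcars$am
bin <- factor(ifelse(mtcars$vs == 1, "yes", "no"))

# Hypothetical helper: proportion in group lvl[2] minus proportion in lvl[1].
# Comparing a factor with a character value matches on the factor levels,
# which is why lvl should hold the levels and not the integer codes.
prop_difference <- function(x, bin, lvl) {
  mean(x[bin == lvl[2]], na.rm = TRUE) - mean(x[bin == lvl[1]], na.rm = TRUE)
}

d_default <- prop_difference(x, bin, lvl = levels(bin))        # "yes" - "no"
d_reversed <- prop_difference(x, bin, lvl = rev(levels(bin)))  # "no" - "yes"

c(default = d_default, reversed = d_reversed)
```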

Examples

# rtn.table = TRUE (default)

# multiple variables
mtcars2 <- mtcars
mtcars2$"vs_bin" <- ifelse(mtcars$"vs" == 1, yes = "yes", no = "no")
mtcars2$"gear_dum" <- ifelse(mtcars2$"gear" > 3, yes = 1L, no = 0L)
mtcars2$"carb_dum" <- ifelse(mtcars2$"carb" > 3, yes = 1L, no = 0L)
vrb_nm <- c("am","gear_dum","carb_dum") # dummy variables
lapply(X = vrb_nm, FUN = function(nm) {
   tmp <- c("vs_bin", nm)
   table(mtcars2[tmp])
})
props_diff(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), bin.nm = "vs_bin")

# single variable
props_diff(mtcars2, vrb.nm = "am", bin.nm = "vs_bin")

# rtn.table = FALSE (no "count" or "percent" list elements)

# multiple variables
props_diff(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), bin.nm = "vs",
   rtn.table = FALSE)

# single variable
props_diff(mtcars, vrb.nm = "am", bin.nm = "vs",
   rtn.table = FALSE)
