props_diff tests the proportion differences of multiple variables across two independent groups with chi-square tests of independence. The function also calculates descriptive statistics for each group and various standardized effect sizes (e.g., Cramer's V), and it can return the 2x2 contingency tables. props_diff is simply a wrapper for prop.test plus some extra calculations.
props_diff(
  data,
  vrb.nm,
  bin.nm,
  lvl = levels(as.factor(data[[bin.nm]])),
  yates = TRUE,
  zero.cell = 0.05,
  smooth = TRUE,
  ci.level = 0.95,
  rtn.table = TRUE,
  check = TRUE
)
A list of data.frames (and, optionally, arrays) containing statistical information about the proportion differences (the rownames of each data.frame are vrb.nm): 1) chisqtest = chi-square tests of independence stat info in a data.frame, 2) describes = descriptive statistics stat info in a data.frame, 3) effects = various standardized effect sizes in a data.frame, 4) count = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of counts with additional rows and columns for totals (if rtn.table = TRUE), 5) percent = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of overall percentages with additional rows and columns for totals (if rtn.table = TRUE).
1) chisqtest = chi-square tests of independence stat info in a data.frame
proportion difference estimate (i.e., group 2 - group 1)
NA (to remind the user there is no standard error for the test)
chi-square value
degrees of freedom (will always be 1)
two-sided p-value
lower bound of the confidence interval
upper bound of the confidence interval
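The chisqtest row for one variable can be sketched with base R's prop.test, passing group 2 first so that the estimate and confidence interval are for group 2 - group 1. This is a minimal sketch under stated assumptions, not the package's implementation: the helper chisq_row() and the column names est, se, X2, df, p, lwr, and upr are hypothetical, and rows missing either the variable or the group are assumed to be dropped.

```r
# Minimal sketch (assumption): one chisqtest-style row per dummy variable.
# chisq_row() and its column names are hypothetical, not the package's API.
chisq_row <- function(x, bin, lvl, yates = TRUE, ci.level = 0.95) {
  keep <- !is.na(x) & !is.na(bin)   # drop rows missing x or the group
  x <- x[keep]
  bin <- bin[keep]
  g2 <- x[bin == lvl[2]]            # group 2 values (0/1)
  g1 <- x[bin == lvl[1]]            # group 1 values (0/1)
  # group 2 is listed first, so the estimate and CI are group 2 - group 1
  pt <- prop.test(x = c(sum(g2), sum(g1)), n = c(length(g2), length(g1)),
                  correct = yates, conf.level = ci.level)
  data.frame(est = unname(pt$estimate[1] - pt$estimate[2]),
             se = NA_real_,              # no standard error for this test
             X2 = unname(pt$statistic),
             df = unname(pt$parameter),  # 1 for two groups
             p = pt$p.value,
             lwr = pt$conf.int[1],
             upr = pt$conf.int[2])
}

d <- mtcars
d$vs_bin <- ifelse(d$vs == 1, "yes", "no")
d$gear_dum <- as.integer(d$gear > 3)
d$carb_dum <- as.integer(d$carb > 3)
vrb_nm <- c("am", "gear_dum", "carb_dum")
lvl <- levels(as.factor(d$vs_bin))  # "no" = group 1, "yes" = group 2

# one row per variable, with the variable names as rownames
chisqtest <- do.call(rbind, lapply(vrb_nm, function(nm) chisq_row(d[[nm]], d$vs_bin, lvl)))
rownames(chisqtest) <- vrb_nm
chisqtest
```

prop.test may warn that the chi-square approximation is unreliable when expected cell counts are small.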
2) describes = descriptive statistics stat info in a data.frame
proportion of group 2
proportion of group 1
standard deviation of group 2
standard deviation of group 1
sample size of group 2
sample size of group 1
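A sketch of the describes statistics for one variable, assuming rows with a missing value are dropped. The proportion of a 0/1 variable is simply its mean. The helper describe_row() and its column names are hypothetical, and sd() uses the n - 1 denominator, which may not match the formula props_diff uses (e.g., sqrt(p * (1 - p))).

```r
# Minimal sketch (assumption): describes-style statistics for one variable.
# describe_row() and its column names are hypothetical.
describe_row <- function(x, bin, lvl) {
  keep <- !is.na(x) & !is.na(bin)  # drop rows missing x or the group
  x <- x[keep]
  bin <- bin[keep]
  g2 <- x[bin == lvl[2]]           # group 2 values (0/1)
  g1 <- x[bin == lvl[1]]           # group 1 values (0/1)
  data.frame(prop_2 = mean(g2), prop_1 = mean(g1),  # mean of a 0/1 variable = proportion
             sd_2 = sd(g2), sd_1 = sd(g1),          # n - 1 denominator
             n_2 = length(g2), n_1 = length(g1))
}

vs_bin <- ifelse(mtcars$vs == 1, "yes", "no")
res <- describe_row(mtcars$am, vs_bin, lvl = c("no", "yes"))
res
```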
3) effects = various standardized effect sizes in a data.frame
Cramer's V estimate
Cohen's h estimate
Phi coefficient estimate
Yule coefficient estimate
Tetrachoric correlation estimate
odds ratio estimate
risk ratio estimate calculated as group 2 / group 1. Note that this value will often change when the variables are recoded (e.g., 0 and 1 swapped), as it should.
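Most of these effect sizes have closed forms in the four cells of the 2x2 table. The sketch below uses a hypothetical helper, effects_row(), oriented as group 2 vs. group 1 with x = 1 as the outcome of interest; props_diff may orient or correct them differently. The tetrachoric correlation is omitted because it needs an iterative estimator (see tetrachoric). Cramer's V is computed as |phi|, which equals sqrt(X2 / N) for a 2x2 table when X2 is the uncorrected chi-square.

```r
# Minimal sketch (assumption): effect sizes from the 2x2 table of one variable.
# effects_row() is hypothetical; signs are oriented as group 2 vs. group 1
# with x = 1 as the outcome. The tetrachoric correlation is omitted.
effects_row <- function(x, bin, lvl) {
  keep <- !is.na(x) & !is.na(bin)
  x <- x[keep]
  bin <- bin[keep]
  # as.numeric() avoids integer overflow in the products below
  g2_1 <- as.numeric(sum(x[bin == lvl[2]] == 1))  # group 2, x = 1
  g2_0 <- as.numeric(sum(x[bin == lvl[2]] == 0))  # group 2, x = 0
  g1_1 <- as.numeric(sum(x[bin == lvl[1]] == 1))  # group 1, x = 1
  g1_0 <- as.numeric(sum(x[bin == lvl[1]] == 0))  # group 1, x = 0
  p2 <- g2_1 / (g2_1 + g2_0)
  p1 <- g1_1 / (g1_1 + g1_0)
  phi <- (g2_1 * g1_0 - g2_0 * g1_1) /
    sqrt((g2_1 + g2_0) * (g1_1 + g1_0) * (g2_1 + g1_1) * (g2_0 + g1_0))
  or <- (g2_1 * g1_0) / (g2_0 * g1_1)  # odds ratio, group 2 / group 1
  data.frame(cramer_v = abs(phi),                        # sqrt(X2 / N) for a 2x2 table
             cohen_h = 2 * asin(sqrt(p2)) - 2 * asin(sqrt(p1)),
             phi = phi,
             yule_q = (or - 1) / (or + 1),              # Yule's Q
             odds_ratio = or,
             risk_ratio = p2 / p1)                      # group 2 / group 1
}

vs_bin <- ifelse(mtcars$vs == 1, "yes", "no")
eff <- effects_row(mtcars$am, vs_bin, lvl = c("no", "yes"))
eff_recoded <- effects_row(1 - mtcars$am, vs_bin, lvl = c("no", "yes"))  # 0 and 1 swapped
print(eff)
print(eff_recoded)
```

Recoding am as 1 - am flips the sign of phi and inverts the odds ratio (2 becomes 0.5), but the risk ratio goes from 1.5 to 0.75 rather than to 1/1.5, which is the recoding sensitivity noted above.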
4) count = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of counts with additional rows and columns for totals (if rtn.table = TRUE). The two unique observed values of data[vrb.nm] (i.e., 0 and 1), plus the total, are the rows, and the two unique observed values of data[[bin.nm]], plus the total, are the columns. The variables themselves are the layers (i.e., the 3rd dimension of the array). The dimlabels are "bin" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. `lvl[1]`, 2. `lvl[2]`, 3. "total". The laynames are vrb.nm.
5) percent = numeric 3D array with dim = [3, 3, length(vrb.nm)] of the 2x2 contingency tables of overall percentages with additional rows and columns for totals (if rtn.table = TRUE). The two unique observed values of data[vrb.nm] (i.e., 0 and 1), plus the total, are the rows, and the two unique observed values of data[[bin.nm]], plus the total, are the columns. The variables themselves are the layers (i.e., the 3rd dimension of the array). The dimlabels are "bin" for the rows, "x" for the columns, and "vrb" for the layers. The rownames are 1. "0", 2. "1", 3. "total". The colnames are 1. `lvl[1]`, 2. `lvl[2]`, 3. "total". The laynames are vrb.nm.
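The count and percent arrays can be sketched in base R for the mtcars example. The helper count_table() is hypothetical; rows follow the x values (0, 1, total) and columns follow lvl (lvl[1], lvl[2], total) as described above, the dimlabels are left off, and "overall percentages" is assumed to mean each cell divided by the grand total of its layer, times 100.

```r
# Minimal sketch (assumption): build the 3 x 3 x length(vrb.nm) "count" and
# "percent" arrays with base R. count_table() is hypothetical.
count_table <- function(x, bin, lvl) {
  # factor() keeps both levels even when a cell is empty; table() drops NAs
  tab <- table(factor(x, levels = c(0, 1)), factor(bin, levels = lvl))
  cnt <- matrix(as.numeric(tab), nrow = 2, ncol = 2)
  cnt <- rbind(cnt, colSums(cnt))   # add the total row
  cnt <- cbind(cnt, rowSums(cnt))   # add the total column
  dimnames(cnt) <- list(c("0", "1", "total"), c(lvl, "total"))
  cnt
}

d <- mtcars
d$vs_bin <- ifelse(d$vs == 1, "yes", "no")
d$gear_dum <- as.integer(d$gear > 3)
d$carb_dum <- as.integer(d$carb > 3)
vrb_nm <- c("am", "gear_dum", "carb_dum")
lvl <- levels(as.factor(d$vs_bin))

# vapply() stacks the 3 x 3 matrices into a 3 x 3 x 3 array, one layer per variable
count <- vapply(vrb_nm, function(nm) count_table(d[[nm]], d$vs_bin, lvl),
                FUN.VALUE = matrix(0, nrow = 3, ncol = 3))
# overall percentages: each cell divided by its layer's grand total, times 100
percent <- sweep(count, MARGIN = 3, STATS = count["total", "total", ], FUN = "/") * 100
count[, , "am"]
percent[, , "am"]
```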
data: data.frame of data.
vrb.nm: character vector specifying the colnames in data for the variables. Since we are testing proportions, the variables must be dummy codes such that they only have values of 0 or 1 (or missing values).
bin.nm: character vector of length 1 specifying the colname in data for the binary variable that only takes on two values (or missing values), identifying the two independent groups.
lvl: character vector with length 2 specifying the unique values for the two groups. If data[[bin.nm]] is a factor, then lvl should be the factor levels rather than the underlying integer codes. This argument allows you to specify the direction of the proportion differences. Like prop_diff, props_diff calculates each proportion difference as x[ bin == lvl[2] ] - x[ bin == lvl[1] ] such that it is group 2 - group 1. By changing which group is group 1 vs. group 2, the direction of the proportion differences can be changed. See details of prop_diff.
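The effect of lvl on the sign can be sketched with a hypothetical helper, prop_diff_sketch(), that takes the difference in means of a 0/1 variable (i.e., the difference in proportions) as group 2 - group 1. It only illustrates the direction convention and is not the package's function.

```r
# Minimal sketch (assumption): how the order of lvl sets the sign of a
# proportion difference. prop_diff_sketch() is hypothetical.
prop_diff_sketch <- function(x, bin, lvl = levels(as.factor(bin))) {
  force(lvl)                        # evaluate the default before subsetting bin
  keep <- !is.na(x) & !is.na(bin)
  x <- x[keep]
  bin <- bin[keep]
  mean(x[bin == lvl[2]]) - mean(x[bin == lvl[1]])  # group 2 - group 1
}

vs_bin <- ifelse(mtcars$vs == 1, "yes", "no")
levels(as.factor(vs_bin))  # default lvl: "no" "yes"
d_default <- prop_diff_sketch(mtcars$am, vs_bin)                        # "yes" - "no"
d_flipped <- prop_diff_sketch(mtcars$am, vs_bin, lvl = c("yes", "no"))  # "no" - "yes"
# with a factor, lvl uses the factor levels, not the integer codes 1 and 2
vs_fac <- factor(mtcars$vs, levels = c(0, 1), labels = c("V-shaped", "straight"))
d_factor <- prop_diff_sketch(mtcars$am, vs_fac, lvl = c("V-shaped", "straight"))
c(default = d_default, flipped = d_flipped, factor = d_factor)
```

Swapping the two entries of lvl leaves the size of the difference unchanged and flips its sign.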
yates: logical vector of length 1 specifying whether Yates' continuity correction should be applied for small samples. See chisq.test for details.
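Since props_diff wraps prop.test, yates presumably sets its continuity correction. As a base R check that is independent of props_diff, the sketch below shows that for a 2x2 table prop.test and chisq.test return the same chi-square statistic when given the same correct setting, and that the correction shrinks the statistic.

```r
# Base R check (independent of props_diff): with the same continuity-correction
# setting, prop.test() and chisq.test() give the same statistic for a 2x2 table.
tab <- table(mtcars$vs, mtcars$am)  # 2x2 table: rows = vs, columns = am
x2 <- sapply(c(yates_on = TRUE, yates_off = FALSE), function(yates) {
  c(prop.test = unname(prop.test(tab, correct = yates)$statistic),
    chisq.test = unname(chisq.test(tab, correct = yates)$statistic))
})
x2  # rows: function used; columns: correction on/off
```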
zero.cell: numeric vector of length 1 specifying what value to impute for zero cell counts in the 2x2 contingency table when computing the tetrachoric correlations. See tetrachoric for details.
smooth: logical vector of length 1 specifying whether a smoothing algorithm should be applied when estimating the tetrachoric correlations. See tetrachoric for details.
ci.level: numeric vector of length 1 specifying the confidence level. ci.level must be between 0 and 1.
rtn.table: logical vector of length 1 specifying whether the return object should include the 2x2 contingency tables of counts with totals and the 2x2 tables of overall percentages. If TRUE, then the last two elements of the return object are "count", containing a 3D array of counts, and "percent", containing a 3D array of overall percentages.
check: logical vector of length 1 specifying whether the input arguments should be checked for errors, for example, whether data[[bin.nm]] has more than 2 unique values (other than missing values). This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).
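The kind of validation that check = TRUE enables can be sketched with a hypothetical helper, check_inputs(). The specific checks and messages below are assumptions for illustration; the package's actual checks and error messages may differ.

```r
# Minimal sketch (assumption): input checks of the kind check = TRUE enables.
# check_inputs() is hypothetical; the package's checks and messages may differ.
check_inputs <- function(data, vrb.nm, bin.nm, ci.level) {
  if (!is.data.frame(data)) stop("`data` must be a data.frame")
  missing_nm <- setdiff(c(vrb.nm, bin.nm), names(data))
  if (length(missing_nm) > 0)
    stop("not colnames in `data`: ", paste(missing_nm, collapse = ", "))
  # the grouping variable must have exactly two non-missing values
  n_grp <- length(unique(stats::na.omit(data[[bin.nm]])))
  if (n_grp != 2)
    stop("`data[[bin.nm]]` must have exactly 2 unique non-missing values, not ", n_grp)
  # every variable must be a dummy code (0/1 or missing)
  for (nm in vrb.nm) {
    vals <- stats::na.omit(data[[nm]])
    if (!all(vals %in% c(0, 1))) stop("`", nm, "` has values other than 0 and 1")
  }
  if (!is.numeric(ci.level) || length(ci.level) != 1 || ci.level <= 0 || ci.level >= 1)
    stop("`ci.level` must be a single number between 0 and 1")
  invisible(TRUE)
}

check_inputs(mtcars, vrb.nm = "am", bin.nm = "vs", ci.level = 0.95)        # passes
try(check_inputs(mtcars, vrb.nm = "am", bin.nm = "gear", ci.level = 0.95))  # gear has 3 values
```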
prop.test the workhorse for props_diff,
prop_diff for a single dummy variable,
phi for another phi coefficient function,
Yule for another Yule coefficient function,
tetrachoric for another tetrachoric coefficient function
library(quest) # provides props_diff()
# rtn.table = TRUE (default)
# multiple variables
mtcars2 <- mtcars
mtcars2$"vs_bin" <- ifelse(mtcars$"vs" == 1, yes = "yes", no = "no")
mtcars2$"gear_dum" <- ifelse(mtcars2$"gear" > 3, yes = 1L, no = 0L)
mtcars2$"carb_dum" <- ifelse(mtcars2$"carb" > 3, yes = 1L, no = 0L)
vrb_nm <- c("am","gear_dum","carb_dum") # dummy variables
# 2x2 table of each dummy variable by group
lapply(X = vrb_nm, FUN = function(nm) {
  tmp <- c("vs_bin", nm)
  table(mtcars2[tmp])
})
props_diff(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), bin.nm = "vs_bin")
# single variable
props_diff(mtcars2, vrb.nm = "am", bin.nm = "vs_bin")
# rtn.table = FALSE (no "count" or "percent" list elements)
# multiple variables
props_diff(data = mtcars2, vrb.nm = c("am","gear_dum","carb_dum"), bin.nm = "vs",
  rtn.table = FALSE)
# single variable
props_diff(mtcars, vrb.nm = "am", bin.nm = "vs",
  rtn.table = FALSE)