prop_compare
tests for proportion differences across 3+ independent
groups with a chi-square test of independence. The function also calculates
the descriptive statistics for each group, Cramer's V and its confidence
interval as a standardized effect size, and can provide the X by 2
contingency tables. prop_compare
is simply a wrapper for
prop.test
plus some extra calculations.
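To make the wrapper relationship concrete, the following is a minimal base-R sketch of the kind of prop.test call that sits underneath the chi-square test. The counts are the am-by-cyl counts from mtcars multiplied by 10 (mirroring the replicated data used in the examples), and the object names are illustrative rather than taken from the package source.

```r
# Successes (am == 1) and group sizes for three cylinder groups;
# these are the mtcars am-by-cyl counts multiplied by 10
successes <- c(four = 80, six = 30, eight = 20)
trials    <- c(four = 110, six = 70, eight = 140)

# Chi-square test of equal proportions across the three groups
res <- prop.test(x = successes, n = trials)

res$statistic  # chi-square value
res$parameter  # degrees of freedom (number of groups - 1)
res$p.value    # two-sided p-value
res$estimate   # proportion of successes in each group
```

The extra pieces prop_compare reports (descriptives, Cramer's V, and the contingency tables) are not part of the prop.test return object and would be computed separately.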
prop_compare(
x,
nom,
lvl = levels(as.factor(nom)),
yates = TRUE,
ci.level = 0.95,
rtn.table = TRUE,
check = TRUE
)
list of numeric vectors containing statistical information about the proportion comparisons:
1) nhst = chi-square test of independence stat info in a numeric vector
2) desc = descriptive statistics stat info in a numeric vector
3) std = standardized effect size and its confidence interval in a numeric vector
4) count = numeric matrix with dim = [X+1, 3] of the X by 2 contingency table of counts with an additional row and column for totals (if rtn.table = TRUE)
5) percent = numeric matrix with dim = [X+1, 3] of the X by 2 contingency table of overall percentages with an additional row and column for totals (if rtn.table = TRUE)
1) nhst = chi-square test of independence stat info in a numeric vector
average proportion difference absolute value (i.e., |group j - group i|)
NA (to remind the user there is no standard error for the test)
chi-square value
degrees of freedom (of the nominal variable)
two-sided p-value
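As a rough illustration of the first nhst element, the sketch below averages the absolute differences over every pair of groups. It assumes "average" means the mean across all unique pairs, which the documentation does not spell out; the proportions are made up for the example.

```r
# Hypothetical group proportions for three groups
p <- c(i = 0.20, j = 0.50, k = 0.80)

# Absolute difference |p_j - p_i| for every unique pair of groups
pair_diffs <- combn(p, m = 2, FUN = function(pr) abs(pr[2] - pr[1]))

# Average absolute difference across the three pairs
avg_abs_diff <- mean(pair_diffs)
avg_abs_diff
```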
2) desc = descriptive statistics stat info in a numeric vector (note there could be more than 3 groups - groups i, j, and k are just provided as an example):
proportion of group k
proportion of group j
proportion of group i
standard deviation of group k
standard deviation of group j
standard deviation of group i
sample size of group k
sample size of group j
sample size of group i
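The descriptive statistics can be approximated in base R with tapply, as in this sketch. It assumes the standard deviation is the ordinary sample standard deviation from sd(); the documentation does not say which formula the package uses.

```r
x   <- mtcars$am              # dummy variable (0/1)
nom <- as.factor(mtcars$cyl)  # nominal variable with three groups

prop <- tapply(x, nom, mean)    # proportion of 1s in each group
sdev <- tapply(x, nom, sd)      # sample standard deviation in each group (assumed formula)
n    <- tapply(x, nom, length)  # sample size of each group

rbind(prop, sdev, n)
```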
3) std = standardized effect size and its confidence interval in a numeric vector
Cramer's V estimate
lower bound of Cramer's V confidence interval
upper bound of Cramer's V confidence interval
4) count = numeric matrix with dim = [X+1, 3]
of the X by 2
contingency table of counts with an additional row and column for totals (if
rtn.table
= TRUE).
The 3+ unique observed values of nom
- plus the total - are the rows
and the two unique observed values of x
(i.e., 0 and 1) - plus the
total - are the columns. The dimlabels are "nom" for the rows and "x" for the
columns. The rownames are 1. `lvl[i]`, 2. `lvl[j]`, 3. `lvl[k]`, 4. "total".
The colnames are 1. "0", 2. "1", 3. "total".
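A table with the same layout can be built in base R as sketched below. This only illustrates the structure described above (rows for nom plus a total, columns for x plus a total); it is not necessarily how the package constructs the matrix.

```r
# X by 2 table of counts with dimlabels "nom" and "x"
tab <- table(nom = mtcars$cyl, x = mtcars$am)

# Append a row and a column of totals, then relabel them as "total"
count <- addmargins(tab)
rownames(count)[nrow(count)] <- "total"
colnames(count)[ncol(count)] <- "total"

count
```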
5) percent = numeric matrix with dim = [X+1, 3]
of the X by 2
contingency table of overall percentages with an additional row and column
for totals (if rtn.table
= TRUE).
The 3+ unique observed values of nom
- plus the total - are the rows
and the two unique observed values of x
(i.e., 0 and 1) - plus the
total - are the columns. The dimlabels are "nom" for the rows and "x" for the
columns. The rownames are 1. `lvl[i]`, 2. `lvl[j]`, 3. `lvl[k]`, 4. "total".
The colnames are 1. "0", 2. "1", 3. "total".
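"Overall percentages" are taken here to mean each cell divided by the grand total, so the bottom-right cell is 100. Under that assumption, a base-R sketch of the same layout looks like this:

```r
# X by 2 table of counts with dimlabels "nom" and "x"
tab <- table(nom = mtcars$cyl, x = mtcars$am)

# Cell proportions relative to the grand total, with totals appended,
# then rescaled to percentages
percent <- addmargins(prop.table(tab)) * 100
rownames(percent)[nrow(percent)] <- "total"
colnames(percent)[ncol(percent)] <- "total"

round(percent, 1)
```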
numeric vector that only has values of 0 or 1 (or missing values), otherwise known as a dummy variable.
atomic vector that takes on three or more unordered values (or missing values), otherwise known as a nominal variable.
character vector with length 3+ specifying the unique values for
the groups. If nom
is a factor, then lvl
should be the
factor levels rather than the underlying integer codes. This argument
allows you to specify the order of the proportions in the return object.
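The distinction between factor levels and integer codes can be seen in a short base-R sketch; the cyl_fct variable is built here with factor() purely for illustration.

```r
# A nominal variable stored as a factor with labelled levels
cyl_fct <- factor(mtcars$cyl, levels = c(4, 6, 8),
                  labels = c("four", "six", "eight"))

# The default for lvl: the factor levels as character strings
levels(as.factor(cyl_fct))

# The underlying integer codes, which should NOT be passed as lvl
unique(as.integer(cyl_fct))

# A custom ordering uses the same level labels in a different order
lvl <- c("eight", "six", "four")
```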
logical vector of length 1 specifying whether Yates'
continuity correction should be applied for small samples. See
chisq.test
for details.
numeric vector of length 1 specifying the confidence level.
ci.level
must range from 0 to 1.
logical vector of length 1 specifying whether the return object should include the X by 2 contingency table of counts with totals and the X by 2 overall percentages table. If TRUE, then the last two elements of the return object are "count" containing a matrix of counts and "percent" containing a matrix of overall percentages.
logical vector of length 1 specifying whether the input
arguments should be checked for errors. For example, checking whether nom
has the same length as x.
This is a tradeoff between
computational efficiency (FALSE) and more useful error messages (TRUE).
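The kind of validation this argument toggles might look like the sketch below. The helper check_inputs is hypothetical and written only for illustration; the package's actual checks and error messages are not shown in this documentation.

```r
# Hypothetical helper, not part of the package
check_inputs <- function(x, nom) {
  if (!is.numeric(x)) stop("x must be a numeric vector")
  if (!all(x %in% c(0, 1, NA))) stop("x must contain only 0, 1, or NA")
  if (length(x) != length(nom)) stop("x and nom must have the same length")
  if (length(unique(na.omit(nom))) < 3) stop("nom must have 3+ unique values")
  invisible(TRUE)
}

check_inputs(x = mtcars$am, nom = mtcars$cyl)  # passes silently
```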
The confidence interval for Cramer's V is calculated with Fisher's r to z transformation as Cramer's V is a kind of multiple correlation coefficient. Cramer's V is transformed to Fisher's z units, a symmetric confidence interval for Fisher's z is calculated, and then the lower and upper bounds are back-transformed to Cramer's V units.
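The transformation steps can be sketched in base R as follows. Two assumptions are made here that the documentation does not confirm: Cramer's V is computed as sqrt(chi-square / (N * min(rows - 1, cols - 1))), and the standard error of Fisher's z is the usual 1 / sqrt(N - 3). The counts are the mtcars am-by-cyl counts multiplied by 10.

```r
# X by 2 contingency table of counts
tab <- matrix(c(30, 40, 120, 80, 30, 20), nrow = 3,
              dimnames = list(nom = c("four", "six", "eight"), x = c("0", "1")))
ci.level <- 0.95

# Chi-square statistic from the test of independence
chisq <- unname(chisq.test(tab, correct = FALSE)$statistic)
N <- sum(tab)
k <- min(dim(tab)) - 1        # equals 1 for an X by 2 table

V  <- sqrt(chisq / (N * k))   # Cramer's V (assumed formula)
z  <- atanh(V)                # Fisher's r to z transformation
se <- 1 / sqrt(N - 3)         # assumed standard error of Fisher's z
crit <- qnorm(1 - (1 - ci.level) / 2)

# Symmetric interval in z units, back-transformed to Cramer's V units
ci <- tanh(z + c(-1, 1) * crit * se)
c(est = V, lwr = ci[1], upr = ci[2])
```

Because the interval is symmetric only in z units, the back-transformed bounds are generally not equidistant from the Cramer's V estimate.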
prop.test the workhorse for prop_compare,
props_compare for multiple dummy variables,
prop_diff for only 2 independent groups (aka binary variable)
tmp <- replicate(n = 10, expr = mtcars, simplify = FALSE)
mtcars2 <- str2str::ld2d(tmp)
mtcars2$"cyl_fct" <- car::recode(mtcars2$"cyl",
recodes = "4='four'; 6='six'; 8='eight'", as.factor = TRUE)
prop_compare(x = mtcars2$"am", nom = mtcars2$"cyl_fct")
prop_compare(x = mtcars2$"am", nom = mtcars2$"cyl_fct",
lvl = c("four","six","eight")) # specify order of levels in return object
# more than 3 groups
prop_compare(x = ifelse(airquality$"Wind" >= 10, yes = 1, no = 0), nom = airquality$"Month")
prop_compare(x = ifelse(airquality$"Wind" >= 10, yes = 1, no = 0), nom = airquality$"Month",
rtn.table = FALSE) # no contingency tables