ci.prop.diff: Confidence Interval for the Difference in Proportions

Description

This function computes a confidence interval for the difference in proportions in a two-sample and paired-sample design for one or more variables, optionally by a grouping and/or split variable.

Usage

ci.prop.diff(x, ...)
# S3 method for default
ci.prop.diff(x, y, method = c("wald", "newcombe"), paired = FALSE,
             alternative = c("two.sided", "less", "greater"), conf.level = 0.95,
             group = NULL, split = NULL, sort.var = FALSE, digits = 2,
             as.na = NULL, write = NULL, append = TRUE,
             check = TRUE, output = TRUE, ...)
# S3 method for formula
ci.prop.diff(formula, data, method = c("wald", "newcombe"),
             alternative = c("two.sided", "less", "greater"), conf.level = 0.95,
             group = NULL, split = NULL, sort.var = FALSE, na.omit = FALSE,
             digits = 2, as.na = NULL, write = NULL, append = TRUE,
             check = TRUE, output = TRUE, ...)

Value

Returns an object of class misty.object, which is a list with following entries:

call: function call
type: type of analysis
data: list with the input specified in x, group, and split
args: specification of function arguments
result: result table

Arguments

x: a numeric vector with 0 and 1 values.
...: further arguments to be passed to or from methods.
y: a numeric vector with 0 and 1 values.
method: a character string specifying the method for computing the confidence interval, must be one of "wald", or "newcombe" (default).
paired: logical: if TRUE, confidence interval for the difference of proportions in paired samples is computed.
alternative: a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".
conf.level: a numeric value between 0 and 1 indicating the confidence level of the interval.
group: a numeric vector, character vector or factor as grouping variable. Note that a grouping variable can only be used when computing confidence intervals with unknown population standard deviation and population variance.
split: a numeric vector, character vector or factor as split variable. Note that a split variable can only be used when computing confidence intervals with unknown population standard deviation and population variance.
sort.var: logical: if TRUE, output table is sorted by variables when specifying group.
digits: an integer value indicating the number of decimal places to be used.
as.na: a numeric vector indicating user-defined missing values, i.e. these values are converted to NA before conducting the analysis. Note that as.na() function is only applied to x, but not to group or split.
write: a character string naming a text file with file extension ".txt" (e.g., "Output.txt") for writing the output into a text file.
append: logical: if TRUE (default), output will be appended to an existing text file with extension .txt specified in write, if FALSE existing text file will be overwritten.
check: logical: if TRUE (default), argument specification is checked.
output: logical: if TRUE (default), output is shown on the console.
formula: a formula of the form y ~ group for one outcome variable or cbind(y1, y2, y3) ~ group for more than one outcome variable where y is a numeric variable with 0 and 1 values and group a numeric variable, character variable or factor with two values or factor levels giving the corresponding group.
data: a matrix or data frame containing the variables in the formula formula.
na.omit: logical: if TRUE, incomplete cases are removed before conducting the analysis (i.e., listwise deletion) when specifying more than one outcome variable.

Author

Takuya Yanagida takuya.yanagida@univie.ac.at

Details

The Wald confidence interval which is based on the normal approximation to the binomial distribution are computed by specifying method = "wald", while the Newcombe Hybrid Score interval (Newcombe, 1998a; Newcombe, 1998b) is requested by specifying method = "newcombe". By default, Newcombe Hybrid Score interval is computed which have been shown to be reliable in small samples (less than n = 30 in each sample) as well as moderate to larger samples(n > 30 in each sample) and with proportions close to 0 or 1, while the Wald confidence intervals does not perform well unless the sample size is large (Fagerland, Lydersen & Laake, 2011).

References

Fagerland, M. W., Lydersen S., & Laake, P. (2011) Recommended confidence intervals for two independent binomial proportions. Statistical Methods in Medical Research, 24, 224-254.

Newcombe, R. G. (1998a). Interval estimation for the difference between independent proportions: Comparison of eleven methods. Statistics in Medicine, 17, 873-890.

Newcombe, R. G. (1998b). Improved confidence intervals for the difference between binomial proportions based on paired data. Statistics in Medicine, 17, 2635-2650.

Rasch, D., Kubinger, K. D., & Yanagida, T. (2011). Statistics in psychology - Using R and SPSS. John Wiley & Sons.

Examples

Run this code

dat1 <- data.frame(group1 = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
                              1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
                   group2 = c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2,
                              1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2),
                   group3 = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
                              1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
                   x1 = c(0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, NA, 0, 0,
                          1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0),
                   x2 = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1,
                          1, 0, 1, 0, 1, 1, 1, NA, 1, 0, 0, 1, 1, 1),
                   x3 = c(1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0,
                          1, 0, 1, 1, 0, 1, 1, 1, 0, 1, NA, 1, 0, 1))

#-------------------------------------------------------------------------------
# Two-sample design

# Example 1: Two-Sided 95% CI for x1 by group1
# Newcombes Hybrid Score interval
ci.prop.diff(x1 ~ group1, data = dat1)

# Example 2: Two-Sided 95% CI for x1 by group1
# Wald CI
ci.prop.diff(x1 ~ group1, data = dat1, method = "wald")

# Example 3: One-Sided 95% CI for x1 by group1
# Newcombes Hybrid Score interval
ci.prop.diff(x1 ~ group1, data = dat1, alternative = "less")

# Example 4: Two-Sided 99% CI for x1 by group1
# Newcombes Hybrid Score interval
ci.prop.diff(x1 ~ group1, data = dat1, conf.level = 0.99)

# Example 5: Two-Sided 95% CI for y1 by group1
# Newcombes Hybrid Score interval, print results with 3 digits
ci.prop.diff(x1 ~ group1, data = dat1, digits = 3)

# Example 6: Two-Sided 95% CI for y1 by group1
# Newcombes Hybrid Score interval, convert value 0 to NA
ci.prop.diff(x1 ~ group1, data = dat1, as.na = 0)

# Example 7: Two-Sided 95% CI for y1, y2, and y3 by group1
# Newcombes Hybrid Score interval
ci.prop.diff(cbind(x1, x2, x3) ~ group1, data = dat1)

# Example 8: Two-Sided 95% CI for y1, y2, and y3 by group1
# Newcombes Hybrid Score interval, listwise deletion for missing data
ci.prop.diff(cbind(x1, x2, x3) ~ group1, data = dat1, na.omit = TRUE)

# Example 9: Two-Sided 95% CI for y1, y2, and y3 by group1
# Newcombes Hybrid Score interval, analysis by group2 separately
ci.prop.diff(cbind(x1, x2, x3) ~ group1, data = dat1, group = dat1$group2)

# Example 10: Two-Sided 95% CI for y1, y2, and y3 by group1
# Newcombes Hybrid Score interval, analysis by group2 separately, sort by variables
ci.prop.diff(cbind(x1, x2, x3) ~ group1, data = dat1, group = dat1$group2,
             sort.var = TRUE)

# Example 11: Two-Sided 95% CI for y1, y2, and y3 by group1
# split analysis by group2
ci.prop.diff(cbind(x1, x2, x3) ~ group1, data = dat1, split = dat1$group2)

# Example 12: Two-Sided 95% CI for y1, y2, and y3 by group1
# Newcombes Hybrid Score interval, analysis by group2 separately, split analysis by group3
ci.prop.diff(cbind(x1, x2, x3) ~ group1, data = dat1,
             group = dat1$group2, split = dat1$group3)

#-----------------

group1 <- c(0, 1, 1, 0, 0, 1, 0, 1)
group2 <- c(1, 1, 1, 0, 0)

# Example 13: Two-Sided 95% CI for the mean difference between group1 amd group2
# Newcombes Hybrid Score interval
ci.prop.diff(group1, group2)

#-------------------------------------------------------------------------------
# Paires-sample design

dat2 <- data.frame(pre = c(0, 1, 1, 0, 1),
                   post = c(1, 1, 0, 1, 1))

# Example 14: Two-Sided 95% CI for the mean difference in x1 and x2
# Newcombes Hybrid Score interval
ci.prop.diff(dat2$pre, dat2$post, paired = TRUE)

# Example 15: Two-Sided 95% CI for the mean difference in x1 and x2
# Wald CI
ci.prop.diff(dat2$pre, dat2$post, method = "wald", paired = TRUE)

# Example 16: One-Sided 95% CI for the mean difference in x1 and x2
# Newcombes Hybrid Score interval
ci.prop.diff(dat2$pre, dat2$post, alternative = "less", paired = TRUE)

# Example 17: Two-Sided 99% CI for the mean difference in x1 and x2
# Newcombes Hybrid Score interval
ci.prop.diff(dat2$pre, dat2$post, conf.level = 0.99, paired = TRUE)

# Example 18: Two-Sided 95% CI for for the mean difference in x1 and x2
# Newcombes Hybrid Score interval, print results with 3 digits
ci.prop.diff(dat2$pre, dat2$post, paired = TRUE, digits = 3)

Run the code above in your browser using DataLab