ci.prop.diff: Confidence Interval for the Difference in Proportions

Description

This function computes a confidence interval for the difference in proportions in two-sample and paired-sample designs for one or more variables, optionally by a grouping and/or split variable.

Usage

ci.prop.diff(x, ...)
# S3 method for default
ci.prop.diff(x, y, method = c("wald", "newcombe"), paired = FALSE,
             alternative = c("two.sided", "less", "greater"), conf.level = 0.95,
             group = NULL, split = NULL, sort.var = FALSE, digits = 2,
             as.na = NULL, write = NULL, append = TRUE,
             check = TRUE, output = TRUE, ...)
# S3 method for formula
ci.prop.diff(formula, data, method = c("wald", "newcombe"),
             alternative = c("two.sided", "less", "greater"), conf.level = 0.95,
             group = NULL, split = NULL, sort.var = FALSE, na.omit = FALSE,
             digits = 2, as.na = NULL, write = NULL, append = TRUE,
             check = TRUE, output = TRUE, ...)

Value

Returns an object of class misty.object, which is a list with the following entries:

call: function call
type: type of analysis
data: list with the input specified in x, group, and split
args: specification of function arguments
result: result table
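
As a brief illustration of the entries listed above, the following sketch stores the returned object and inspects it. It assumes the misty package is installed and uses only arguments documented on this page; output = FALSE is used so that nothing is printed while the object is created.

```r
# Assumes the misty package is installed
library(misty)

# Store the returned object without printing the result table
res <- ci.prop.diff(vs ~ am, data = mtcars, output = FALSE)

class(res)   # should include "misty.object" according to the Value section
names(res)   # should list the entries described above (call, type, data, args, result)
res$result   # the result table
res$args     # the argument specification used for this call
```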

Arguments

x: a numeric vector with 0 and 1 values.
...: further arguments to be passed to or from methods.
y: a numeric vector with 0 and 1 values.
method: a character string specifying the method for computing the confidence interval, must be one of "wald" or "newcombe" (default).
paired: logical: if TRUE, a confidence interval for the difference in proportions in paired samples is computed.
alternative: a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater" or "less".
conf.level: a numeric value between 0 and 1 indicating the confidence level of the interval.
group: a numeric vector, character vector or factor as grouping variable.
split: a numeric vector, character vector or factor as split variable.
sort.var: logical: if TRUE, output table is sorted by variables when specifying group.
digits: an integer value indicating the number of decimal places to be used.
as.na: a numeric vector indicating user-defined missing values, i.e., these values are converted to NA before conducting the analysis. Note that the as.na() function is applied only to x, not to group or split.
write: a character string naming a text file with file extension ".txt" (e.g., "Output.txt") for writing the output into a text file.
append: logical: if TRUE (default), output is appended to the existing text file with extension .txt specified in write; if FALSE, the existing text file is overwritten.
check: logical: if TRUE (default), argument specification is checked.
output: logical: if TRUE (default), output is shown on the console.
formula: a formula of the form y ~ group for one outcome variable or cbind(y1, y2, y3) ~ group for more than one outcome variable where y is a numeric variable with 0 and 1 values and group a numeric variable, character variable or factor with two values or factor levels giving the corresponding group.
data: a matrix or data frame containing the variables specified in the argument formula.
na.omit: logical: if TRUE, incomplete cases are removed before conducting the analysis (i.e., listwise deletion) when specifying more than one outcome variable.

Author

Takuya Yanagida takuya.yanagida@univie.ac.at

Details

The Wald confidence interval, which is based on the normal approximation to the binomial distribution, is computed by specifying method = "wald", while the Newcombe Hybrid Score interval (Newcombe, 1998a; Newcombe, 1998b) is requested by specifying method = "newcombe". By default, the Newcombe Hybrid Score interval is computed, which has been shown to be reliable in small samples (fewer than n = 30 in each sample) as well as in moderate to large samples (n > 30 in each sample) and with proportions close to 0 or 1, whereas the Wald confidence interval does not perform well unless the sample size is large (Fagerland, Lydersen & Laake, 2011).
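
To make the two methods for independent samples more concrete, the following base R sketch computes both intervals by hand using the textbook formulas. It is an illustration only and not necessarily how ci.prop.diff() is implemented internally. The helper names wald_diff_ci, wilson_ci and newcombe_diff_ci are hypothetical, the inputs are assumed to be 0/1 vectors without missing values, and the difference is taken as the first sample minus the second, which may differ from the direction the package reports.

```r
# Wald interval for p1 - p2: normal approximation with the unpooled standard error
wald_diff_ci <- function(x, y, conf.level = 0.95) {
  p1 <- mean(x); p2 <- mean(y)
  z  <- qnorm(1 - (1 - conf.level) / 2)
  se <- sqrt(p1 * (1 - p1) / length(x) + p2 * (1 - p2) / length(y))
  c(diff = p1 - p2, lower = p1 - p2 - z * se, upper = p1 - p2 + z * se)
}

# Wilson score interval for a single proportion (building block of the hybrid method)
wilson_ci <- function(x, conf.level = 0.95) {
  n <- length(x); p <- mean(x)
  z <- qnorm(1 - (1 - conf.level) / 2)
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(lower = centre - half, upper = centre + half)
}

# Newcombe hybrid score interval: combines the two single-sample Wilson intervals
newcombe_diff_ci <- function(x, y, conf.level = 0.95) {
  p1 <- mean(x); p2 <- mean(y)
  w1 <- wilson_ci(x, conf.level)
  w2 <- wilson_ci(y, conf.level)
  d  <- p1 - p2
  c(diff  = d,
    lower = d - sqrt((p1 - w1[["lower"]])^2 + (w2[["upper"]] - p2)^2),
    upper = d + sqrt((w1[["upper"]] - p1)^2 + (p2 - w2[["lower"]])^2))
}

# 'vs' split by 'am' in mtcars
g0 <- mtcars$vs[mtcars$am == 0]
g1 <- mtcars$vs[mtcars$am == 1]

wald_res     <- wald_diff_ci(g0, g1)
newcombe_res <- newcombe_diff_ci(g0, g1)
wald_res
newcombe_res
```

Unlike the Wald interval, the hybrid score interval is not forced to be symmetric around the observed difference and its bounds stay within [-1, 1].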

References

Fagerland, M. W., Lydersen, S., & Laake, P. (2011). Recommended confidence intervals for two independent binomial proportions. Statistical Methods in Medical Research, 24, 224-254.

Newcombe, R. G. (1998a). Interval estimation for the difference between independent proportions: Comparison of eleven methods. Statistics in Medicine, 17, 873-890.

Newcombe, R. G. (1998b). Improved confidence intervals for the difference between binomial proportions based on paired data. Statistics in Medicine, 17, 2635-2650.

Rasch, D., Kubinger, K. D., & Yanagida, T. (2011). Statistics in psychology - Using R and SPSS. John Wiley & Sons.

Examples

#----------------------------------------------------------------------------
# Two-sample design

# Example 1a: Two-Sided 95% CI for 'vs' by 'am'
# Newcombe's Hybrid Score interval
ci.prop.diff(vs ~ am, data = mtcars)

# Example 1b: Two-Sided 95% CI for 'vs' by 'am'
# Wald CI
ci.prop.diff(vs ~ am, data = mtcars, method = "wald")

# Example 1c: Two-Sided 95% CI for the difference in proportions
# Newcombe's Hybrid Score interval
ci.prop.diff(c(0, 1, 1, 0, 0, 1, 0, 1), c(1, 1, 1, 0, 0))

#----------------------------------------------------------------------------
# Paired-sample design

dat.p <- data.frame(pre = c(0, 1, 1, 0, 1), post = c(1, 1, 0, 1, 1))

# Example 2a: Two-Sided 95% CI for the difference in proportions 'pre' and 'post'
# Newcombe's Hybrid Score interval
ci.prop.diff(dat.p$pre, dat.p$post, paired = TRUE)

# Example 2b: Two-Sided 95% CI for the difference in proportions 'pre' and 'post'
# Wald CI
ci.prop.diff(dat.p$pre, dat.p$post, method = "wald", paired = TRUE)
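
For the paired-sample design, the Wald interval can also be worked out by hand from the discordant pairs. The base R sketch below uses the same 'pre' and 'post' values as above and the textbook paired Wald formula. It is an illustration under stated assumptions rather than the package's internal code: the variable names n10 and n01 are hypothetical, and the difference is taken as 'pre' minus 'post', which may differ from the direction reported by ci.prop.diff().

```r
pre  <- c(0, 1, 1, 0, 1)
post <- c(1, 1, 0, 1, 1)

n   <- length(pre)
n10 <- sum(pre == 1 & post == 0)   # discordant pairs: 1 before, 0 after
n01 <- sum(pre == 0 & post == 1)   # discordant pairs: 0 before, 1 after

# Difference in proportions, identical to mean(pre) - mean(post)
d <- (n10 - n01) / n

# Wald standard error for paired proportions; only discordant pairs contribute
se <- sqrt(n10 + n01 - (n10 - n01)^2 / n) / n

# Two-sided 95% Wald interval
z <- qnorm(0.975)
paired_wald <- c(diff = d, lower = d - z * se, upper = d + z * se)
paired_wald
```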
