prop.cint: Confidence interval for proportion based on frequency counts (corpora)

Description

This function computes a confidence interval for a population proportion from the corresponding frequency count in a sample. It uses either the Clopper-Pearson method (inversion of the exact binomial test) or the Wilson score method (inversion of a z-score test, with or without continuity correction).

Usage

prop.cint(k, n, method = c("binomial", "z.score"), correct = TRUE, p.adjust=FALSE,
          conf.level = 0.95, alternative = c("two.sided", "less", "greater"))

Value

A data frame with two columns, labelled lower for the lower boundary and upper for the upper boundary of the confidence interval. The number of rows is determined by the length of the longest input vector (k, n and conf.level).

Arguments

k: frequency of a type in the corpus (or an integer vector of frequencies)
n: number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples)
method: a character string specifying whether to compute a Clopper-Pearson confidence interval (binomial) or a Wilson score interval (z.score)
correct: if TRUE, apply Yates' continuity correction for the z-score test (default)
p.adjust: if TRUE, apply a Bonferroni correction to ensure a family-wise confidence level over all tests carried out in a single function call (the family size is then the length of k). Alternatively, the desired family size can be specified as a number instead of TRUE.
conf.level: the desired confidence level (defaults to 95%)
alternative: a character string specifying the alternative hypothesis, yielding a two-sided (two.sided, default), lower one-sided (less) or upper one-sided (greater) confidence interval

Author

Stephanie Evert (https://purl.org/stephanie.evert)

Details

The confidence intervals computed by this function correspond to those returned by binom.test and prop.test, respectively. However, prop.cint accepts vector arguments, allowing many confidence intervals to be computed with a single function call in a computationally efficient manner.

The Clopper-Pearson confidence interval (binomial) is obtained by inverting the exact binomial test at significance level $\alpha$ = 1 - conf.level. In the two-sided case, the p-value of the test is computed using the “central” method (Fay 2010: 53), i.e. as twice the tail probability of the matching tail. This corresponds to the algorithm originally proposed by Clopper & Pearson (1934).

The limits of the confidence interval are computed in an efficient and numerically robust manner via (the inverse of) the incomplete Beta function.
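The Beta-quantile formulation can be illustrated with a minimal sketch in base R. The helper name clopper_pearson is hypothetical and not part of the package. The sketch assumes the standard two-sided Clopper-Pearson limits expressed as quantiles of the Beta distribution, which may differ in detail from the package's internal code.

```r
# Sketch only: two-sided Clopper-Pearson limits via Beta quantiles.
# clopper_pearson() is a hypothetical helper, not a function of the package.
clopper_pearson <- function(k, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  n <- rep_len(n, length(k))        # recycle n to the length of k
  lower <- numeric(length(k))       # remains 0 where k == 0
  upper <- rep(1, length(k))        # remains 1 where k == n
  pos <- k > 0
  lt <- k < n
  lower[pos] <- qbeta(alpha / 2, k[pos], n[pos] - k[pos] + 1)
  upper[lt] <- qbeta(1 - alpha / 2, k[lt] + 1, n[lt] - k[lt])
  data.frame(lower = lower, upper = upper)
}

# Several intervals in one call, including the boundary cases k = 0 and k = n
clopper_pearson(c(19, 0, 100), 100)
```

For k = 19 and n = 100 the result should agree with the confidence interval reported by binom.test(19, 100).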

The Wilson score confidence interval (z.score) is computed by solving the equation of the z-score test $$ \frac{k - np}{\sqrt{n p (1-p)}} = A $$ for $p$, where $A$ is the $z$-value corresponding to the chosen confidence level (e.g. $\pm 1.96$ for a two-sided test with 95% confidence). Squaring both sides and dividing by $n$ leads to the quadratic equation $$ p^2 (n + A^2) + p (-2k - A^2) + \frac{k^2}{n} = 0 $$ whose two solutions correspond to the lower and upper boundary of the confidence interval.
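As an illustration, the quadratic equation can be solved directly in base R. This is a sketch under the assumption of a two-sided interval without continuity correction; the helper name wilson_score is hypothetical and the package's own implementation may be organised differently.

```r
# Sketch only: two-sided Wilson score interval without continuity correction,
# obtained from the roots of  p^2 (n + A^2) + p (-2k - A^2) + k^2/n = 0.
# wilson_score() is a hypothetical helper, not a function of the package.
wilson_score <- function(k, n, conf.level = 0.95) {
  A <- qnorm(1 - (1 - conf.level) / 2)   # z-value, e.g. 1.96 for 95%
  a <- n + A^2
  b <- -2 * k - A^2
  c0 <- k^2 / n
  disc <- sqrt(b^2 - 4 * a * c0)         # discriminant is never negative here
  data.frame(lower = (-b - disc) / (2 * a),
             upper = (-b + disc) / (2 * a))
}

wilson_score(19, 100)
```

The result should agree with the interval from prop.test(19, 100, correct = FALSE).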

When Yates' continuity correction is applied, the value $k$ in the numerator of the $z$-score equation has to be replaced by $k^*$, with $k^* = k - 1/2$ for the lower boundary of the confidence interval (where $k > np$) and $k^* = k + 1/2$ for the upper boundary of the confidence interval (where $k < np$). In each case, the corresponding solution of the quadratic equation has to be chosen (i.e., the solution with $k > np$ for the lower boundary and vice versa).
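The choice of $k^*$ and of the matching root can be sketched in base R as follows. The helper names wilson_root and wilson_cc are hypothetical. Limiting $k^*$ to the range from 0 to $n$ is an assumption made here so that the boundaries become 0 and 1 in the extreme cases; the package may handle these cases differently.

```r
# Sketch only: Wilson score interval with Yates' continuity correction.
# wilson_root() and wilson_cc() are hypothetical helpers.
wilson_root <- function(kstar, n, A, upper) {
  a <- n + A^2
  b <- -2 * kstar - A^2
  c0 <- kstar^2 / n
  disc <- sqrt(b^2 - 4 * a * c0)
  if (upper) (-b + disc) / (2 * a) else (-b - disc) / (2 * a)
}

wilson_cc <- function(k, n, conf.level = 0.95) {
  A <- qnorm(1 - (1 - conf.level) / 2)
  # lower boundary: k* = k - 1/2, smaller root (the solution with k > np)
  lower <- wilson_root(pmax(k - 0.5, 0), n, A, upper = FALSE)
  # upper boundary: k* = k + 1/2, larger root (the solution with k < np)
  upper <- wilson_root(pmin(k + 0.5, n), n, A, upper = TRUE)
  data.frame(lower = lower, upper = upper)
}

wilson_cc(19, 100)
```

For k = 19 and n = 100 the result should agree with the interval from prop.test(19, 100), which applies the continuity correction by default.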

If a Bonferroni correction is applied, the significance level $\alpha$ of the underlying test is divided by the number $m$ of tests carried out (specified explicitly by the user or given implicitly by length(k)): $\alpha' = \alpha / m$.
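The effect of the adjustment can be illustrated with a short base R sketch. The frequency counts are made-up illustrative values, and the Clopper-Pearson limits are computed directly from Beta quantiles rather than through the package.

```r
# Sketch only: Bonferroni-adjusted vs. unadjusted Clopper-Pearson limits.
k <- c(19, 5, 42)            # illustrative frequency counts (made up)
n <- c(100, 100, 100)        # illustrative sample sizes (made up)

alpha <- 1 - 0.95            # significance level for conf.level = 0.95
m <- length(k)               # family size given implicitly by length(k)
alpha_adj <- alpha / m       # alpha' = alpha / m

raw <- data.frame(lower = qbeta(alpha / 2, k, n - k + 1),
                  upper = qbeta(1 - alpha / 2, k + 1, n - k))
adj <- data.frame(lower = qbeta(alpha_adj / 2, k, n - k + 1),
                  upper = qbeta(1 - alpha_adj / 2, k + 1, n - k))

# Each adjusted interval is wider than its unadjusted counterpart
cbind(raw, adj.lower = adj$lower, adj.upper = adj$upper)
```

With $m = 3$ intervals, each one is computed at a confidence level of $1 - 0.05/3 \approx 98.3\%$ so that the family-wise level is at least 95%.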

References

Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4), 404-413.

Fay, Michael P. (2010). Two-sided exact tests and matching confidence intervals for discrete data. The R Journal, 2(1), 53-58.

https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval

Examples

# Clopper-Pearson confidence interval
binom.test(19, 100)
prop.cint(19, 100, method="binomial")

# Wilson score confidence interval
prop.test(19, 100)
prop.cint(19, 100, method="z.score")