This function computes a confidence interval for a population proportion from the corresponding frequency count in a sample. It either uses the Clopper-Pearson method (inverted exact binomial test) or the Wilson score method (inversion of a z-score test, with or without continuity correction).
prop.cint(k, n, method = c("binomial", "z.score"), correct = TRUE, p.adjust=FALSE,
conf.level = 0.95, alternative = c("two.sided", "less", "greater"))
A data frame with two columns, labelled lower for the lower boundary and upper for the upper boundary of the confidence interval. The number of rows is determined by the length of the longest input vector (k, n and conf.level).
k: frequency of a type in the corpus (or an integer vector of frequencies)
n: number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples)
method: a character string specifying whether to compute a Clopper-Pearson confidence interval (binomial) or a Wilson score interval (z.score)
correct: if TRUE, apply Yates' continuity correction for the z-score test (default)
p.adjust: if TRUE, apply a Bonferroni correction to ensure a family-wise confidence level over all tests carried out in a single function call (i.e. the length of k). Alternatively, the desired family size can be specified instead of TRUE.
conf.level: the desired confidence level (defaults to 95%)
alternative: a character string specifying the alternative hypothesis, yielding a two-sided (two.sided, default), lower one-sided (less) or upper one-sided (greater) confidence interval
Stephanie Evert (https://purl.org/stephanie.evert)
The confidence intervals computed by this function correspond to those returned by binom.test and prop.test, respectively. However, prop.cint accepts vector arguments, allowing many confidence intervals to be computed with a single function call in a computationally efficient manner.
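To illustrate what a single vectorised call replaces, the following base R sketch loops over binom.test and collects the limits in a data frame with lower and upper columns. The helper name loop.cint and the counts are hypothetical, and this is not how prop.cint itself is implemented.

```r
# Hypothetical helper: one binom.test() call per (k, n) pair
loop.cint <- function(k, n, conf.level = 0.95) {
  res <- mapply(function(ki, ni) {
    binom.test(ki, ni, conf.level = conf.level)$conf.int
  }, k, n)
  # res is a 2 x length(k) matrix: row 1 = lower limits, row 2 = upper limits
  data.frame(lower = res[1, ], upper = res[2, ])
}

k <- c(19, 5, 42)       # made-up frequency counts
n <- c(100, 100, 200)   # made-up sample sizes
ci <- loop.cint(k, n)
print(ci)
```

Each row corresponds to one (k, n) pair; with many pairs, the repeated function calls are the overhead that a vectorised implementation avoids.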
The Clopper-Pearson confidence interval (binomial) is obtained by inverting the exact binomial test at significance level \(\alpha\) = 1 - conf.level.
In the two-sided case, the p-value of the test is computed using the “central” method (Fay 2010: 53), i.e. as twice the probability of the matching tail. This corresponds to the algorithm originally proposed by Clopper & Pearson (1934).
The limits of the confidence interval are computed in an efficient and numerically robust manner via (the inverse of) the incomplete Beta function.
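As a minimal sketch of the Beta-quantile approach, the two-sided limits can be written with qbeta from base R. The helper name cp.cint is hypothetical and the code only covers the two-sided case; it is meant to illustrate the idea, not to reproduce the package's internal code.

```r
# Hypothetical helper: two-sided Clopper-Pearson limits via Beta quantiles
cp.cint <- function(k, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  # The lower limit is 0 when k == 0 and the upper limit is 1 when k == n
  lower <- ifelse(k == 0, 0, qbeta(alpha / 2, k, n - k + 1))
  upper <- ifelse(k == n, 1, qbeta(1 - alpha / 2, k + 1, n - k))
  data.frame(lower = lower, upper = upper)
}

cp <- cp.cint(19, 100)
print(cp)
print(binom.test(19, 100)$conf.int)  # should show the same limits
```

Because qbeta is vectorised, the same helper also accepts vectors for k and n.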
The Wilson score confidence interval (z.score) is computed by solving the equation of the z-score test
$$ \frac{k - np}{\sqrt{n p (1-p)}} = A $$
for \(p\), where \(A\) is the \(z\)-value corresponding to the chosen confidence level (e.g. \(\pm 1.96\) for a two-sided test with 95% confidence). Squaring both sides and dividing by \(n\) leads to the quadratic equation
$$ p^2 (n + A^2) + p (-2k - A^2) + \frac{k^2}{n} = 0 $$
whose two solutions correspond to the lower and upper boundary of the confidence interval.
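The quadratic equation can be solved directly with the standard root formula. The sketch below does this in base R for the two-sided case without continuity correction; the helper name wilson.cint is hypothetical, and the comparison with prop.test(..., correct = FALSE) is only a sanity check.

```r
# Hypothetical helper: two-sided Wilson score limits without continuity correction
wilson.cint <- function(k, n, conf.level = 0.95) {
  A <- qnorm(1 - (1 - conf.level) / 2)  # about 1.96 for 95% confidence
  a <- n + A^2                          # coefficient of p^2
  b <- -2 * k - A^2                     # coefficient of p
  c0 <- k^2 / n                         # constant term
  disc <- sqrt(b^2 - 4 * a * c0)
  data.frame(lower = (-b - disc) / (2 * a),
             upper = (-b + disc) / (2 * a))
}

w <- wilson.cint(19, 100)
print(w)
print(prop.test(19, 100, correct = FALSE)$conf.int)  # should show the same limits
```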
When Yates' continuity correction is applied, the value \(k\) in the numerator of the \(z\)-score equation has to be replaced by \(k^*\), with \(k^* = k - 1/2\) for the lower boundary of the confidence interval (where \(k > np\)) and \(k^* = k + 1/2\) for the upper boundary of the confidence interval (where \(k < np\)). In each case, the corresponding solution of the quadratic equation has to be chosen (i.e., the solution with \(k > np\) for the lower boundary and vice versa).
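A sketch of the continuity-corrected variant, assuming \(0 < k < n\) so that no clamping to the range [0, 1] is needed: the quadratic equation is solved once with \(k^* = k - 1/2\) (taking the smaller root) and once with \(k^* = k + 1/2\) (taking the larger root). The helper name wilson.cc.cint is hypothetical.

```r
# Hypothetical helper: two-sided Wilson score limits with Yates' correction.
# Assumes 0 < k < n; boundary cases are not handled in this sketch.
wilson.cc.cint <- function(k, n, conf.level = 0.95) {
  A <- qnorm(1 - (1 - conf.level) / 2)
  solve.p <- function(kstar, sign) {
    a <- n + A^2
    b <- -2 * kstar - A^2
    c0 <- kstar^2 / n
    (-b + sign * sqrt(b^2 - 4 * a * c0)) / (2 * a)
  }
  data.frame(lower = solve.p(k - 0.5, -1),  # smaller root, k* > np
             upper = solve.p(k + 0.5, +1))  # larger root,  k* < np
}

wc <- wilson.cc.cint(19, 100)
print(wc)
print(prop.test(19, 100)$conf.int)  # correct = TRUE is the default
```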
If a Bonferroni correction is applied, the significance level \(\alpha\) of the underlying test is divided by the number \(m\) of tests carried out (specified explicitly by the user or given implicitly by length(k)): \(\alpha' = \alpha / m\).
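As a small worked example of the adjustment, the sketch below computes the per-interval confidence level for a family of five tests and passes it to binom.test in base R. The counts are made up, and the loop merely stands in for what a vectorised call would do.

```r
conf.level <- 0.95
k <- c(19, 5, 42, 7, 61)            # made-up counts, one test each
n <- rep(100, length(k))            # made-up sample sizes
m <- length(k)                      # family size
alpha.adj <- (1 - conf.level) / m   # alpha' = alpha / m
conf.adj <- 1 - alpha.adj           # per-interval confidence level (0.99 here)

# One Clopper-Pearson interval per test at the adjusted level
ci <- t(mapply(function(ki, ni) {
  binom.test(ki, ni, conf.level = conf.adj)$conf.int
}, k, n))
colnames(ci) <- c("lower", "upper")
print(conf.adj)
print(ci)
```

Each adjusted interval is wider than its unadjusted 95% counterpart, which is the price paid for the family-wise guarantee.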
Clopper, C. J. & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4), 404-413.
Fay, Michael P. (2010). Two-sided exact tests and matching confidence intervals for discrete data. The R Journal, 2(1), 53-58.
https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
z.score.pval, prop.test, binom.pval, binom.test
# Clopper-Pearson confidence interval
binom.test(19, 100)
prop.cint(19, 100, method="binomial")
# Wilson score confidence interval
prop.test(19, 100)
prop.cint(19, 100, method="z.score")