This function computes a confidence interval for a population proportion from the corresponding frequency count in a sample. It uses either the Clopper-Pearson method (inversion of the exact binomial test) or the Wilson score method (inversion of a z-score test, with or without continuity correction).
prop.cint(k, n, method = c("binomial", "z.score"), correct = TRUE,
conf.level = 0.95, alternative = c("two.sided", "less", "greater"))
k: frequency of a type in the corpus (or an integer vector of frequencies)
n: number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples)
method: a character string specifying whether to compute a Clopper-Pearson confidence interval (binomial) or a Wilson score interval (z.score)
correct: if TRUE, apply Yates' continuity correction for the z-score test (default)
conf.level: the desired confidence level (defaults to 95%)
alternative: a character string specifying the alternative hypothesis, yielding a two-sided (two.sided, default), lower one-sided (less) or upper one-sided (greater) confidence interval
A data frame with two columns, labelled lower for the lower boundary and upper for the upper boundary of the confidence interval. The number of rows is determined by the length of the longest input vector (k, n and conf.level).
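To illustrate the shape of the return value, the following sketch calls prop.cint on two samples at once. It assumes that the function is available from an installed corpora package (an assumption, hence the guard), and the counts are made-up illustrative values.

```r
# Illustrative counts (made up): 19 hits in 100 tokens, 250 hits in 1000 tokens.
k <- c(19, 250)
n <- c(100, 1000)

# Assumption: prop.cint is provided by the corpora package; skip if not installed.
if (requireNamespace("corpora", quietly = TRUE)) {
  ci <- corpora::prop.cint(k, n, method = "z.score", correct = TRUE,
                           conf.level = 0.95, alternative = "two.sided")
  print(ci)  # one row per sample, with columns lower and upper
}
```

Since k and n both have length 2, the result should have two rows, one per sample.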
The confidence intervals computed by this function correspond to those returned by binom.test and prop.test, respectively. However, prop.cint accepts vector arguments, allowing many confidence intervals to be computed with a single function call. In addition, it uses a fast approximation of the two-sided binomial test that can safely be applied to large samples.
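The benefit of vector arguments can be sketched in base R for the Clopper-Pearson case. The helper cp.interval below is hypothetical and is not the implementation used by prop.cint (in particular, it does not reproduce the fast approximation mentioned above). It relies on the standard beta-quantile form of the two-sided Clopper-Pearson limits and compares the result with one binom.test call per sample.

```r
# Hypothetical helper: vectorised two-sided Clopper-Pearson interval.
cp.interval <- function(k, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  # Guard the shape parameters so qbeta() is never called with shape 0;
  # the boundary cases k == 0 and k == n are overridden below.
  lower <- qbeta(alpha / 2, pmax(k, 1), n - k + 1)
  upper <- qbeta(1 - alpha / 2, k + 1, pmax(n - k, 1))
  data.frame(lower = ifelse(k == 0, 0, lower),
             upper = ifelse(k == n, 1, upper))
}

# Illustrative counts (made up)
k <- c(0, 19, 250)
n <- c(50, 100, 1000)
ci <- cp.interval(k, n)  # all intervals in a single call

# Reference: one binom.test() call per sample
ref <- t(mapply(function(ki, ni) as.numeric(binom.test(ki, ni)$conf.int), k, n))
print(cbind(ci, ref.lower = ref[, 1], ref.upper = ref[, 2]))
```

The vectorised call needs no explicit loop, whereas binom.test has to be invoked separately for every pair of k and n.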
The confidence interval for a z-score test is computed by solving the z-score equation $$ \frac{k - np}{\sqrt{n p (1-p)}} = \alpha $$ for \(p\), where \(\alpha\) is the \(z\)-value corresponding to the chosen confidence level (e.g. \(\pm 1.96\) for a two-sided test with 95% confidence). This leads to the quadratic equation $$ p^2 (n + \alpha^2) + p (-2k - \alpha^2) + \frac{k^2}{n} = 0 $$ whose two solutions correspond to the lower and upper boundary of the confidence interval.
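The quadratic equation can be solved directly with the usual root formula. The following base R sketch does so for the two-sided case without continuity correction; the helper name wilson.quadratic and the example counts are hypothetical, and the code is meant as an illustration of the derivation rather than as the actual implementation of prop.cint.

```r
# Hypothetical helper: two-sided Wilson score interval from the quadratic equation.
wilson.quadratic <- function(k, n, conf.level = 0.95) {
  alpha <- qnorm((1 + conf.level) / 2)  # z-value, called alpha in the text
  a  <- n + alpha^2                     # coefficient of p^2
  b  <- -2 * k - alpha^2                # coefficient of p
  c0 <- k^2 / n                         # constant term
  disc <- sqrt(b^2 - 4 * a * c0)        # square root of the discriminant
  data.frame(lower = (-b - disc) / (2 * a),
             upper = (-b + disc) / (2 * a))
}

# Illustrative counts (made up)
k <- c(19, 250)
n <- c(100, 1000)
ci <- wilson.quadratic(k, n)

# Reference: prop.test() without continuity correction, one call per sample
ref <- t(mapply(function(ki, ni)
  as.numeric(prop.test(ki, ni, correct = FALSE)$conf.int), k, n))
print(cbind(ci, ref.lower = ref[, 1], ref.upper = ref[, 2]))
```

The smaller root gives the lower boundary and the larger root the upper boundary; both should agree with the interval reported by prop.test up to floating-point error.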
When Yates' continuity correction is applied, the value \(k\) in the numerator of the \(z\)-score equation has to be replaced by \(k^*\), with \(k^* = k - 1/2\) for the lower boundary of the confidence interval (where \(k > np\)) and \(k^* = k + 1/2\) for the upper boundary of the confidence interval (where \(k < np\)). In each case, the corresponding solution of the quadratic equation has to be chosen (i.e., the solution with \(k > np\) for the lower boundary and vice versa).
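The continuity-corrected boundaries can be sketched in the same way by solving the quadratic equation twice, once for each shifted count. The helper wilson.yates is hypothetical; it covers only the two-sided case and assumes 0 < k < n, so the clamping of boundaries to 0 and 1 in the edge cases is left out.

```r
# Hypothetical helper: two-sided Wilson score interval with Yates' correction.
# Assumes 0 < k < n (edge cases are not handled in this sketch).
wilson.yates <- function(k, n, conf.level = 0.95) {
  alpha <- qnorm((1 + conf.level) / 2)
  # Solve the quadratic equation for a shifted count k.star; return both roots.
  roots <- function(k.star) {
    a <- n + alpha^2
    b <- -2 * k.star - alpha^2
    disc <- sqrt(b^2 - 4 * a * k.star^2 / n)
    list(small = (-b - disc) / (2 * a), large = (-b + disc) / (2 * a))
  }
  data.frame(lower = roots(k - 0.5)$small,   # k* = k - 1/2, root with k > np
             upper = roots(k + 0.5)$large)   # k* = k + 1/2, root with k < np
}

# Illustrative counts (made up)
k <- c(19, 250)
n <- c(100, 1000)
ci <- wilson.yates(k, n)

# Reference: prop.test() with continuity correction, one call per sample
ref <- t(mapply(function(ki, ni)
  as.numeric(prop.test(ki, ni, correct = TRUE)$conf.int), k, n))
print(cbind(ci, ref.lower = ref[, 1], ref.upper = ref[, 2]))
```

Because the correction moves k away from np on both sides, the corrected interval is always slightly wider than the uncorrected one.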
http://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval