ci.p: Confidence interval estimation for the binomial parameter p.

Description

Confidence interval formulae for $\mu$ are only appropriate for quantitative continuous variables. This excludes a large number of biologically important variables that describe binary outcomes or counts. The function ci.p calculates confidence intervals for the binomial parameter p (probability of success) using raw or summarized data. By default, Wilson point estimators are used to estimate $p$ and $\sigma_{\hat{p}}$. If raw data are used (the default), successes should be indicated as ones and failures as zeroes in the data vector. Finite population corrections can also be specified. Three methods for confidence intervals are implemented: the normal approximation, Wilson estimators (Wilson 1927), i.e. the adjusted Wald method, and the Clopper-Pearson exact method (Clopper and Pearson 1934). Agresti and Coull (1998) recommend the Wilson estimation method.

Usage

ci.p(data, conf = 0.95, summarized = FALSE, phat = NULL, S.phat = NULL, 
fpc = FALSE, n = NULL, N = NULL, method="wilson")

Arguments

data

A vector of binary data. Required if summarized = FALSE.

conf

Level of confidence, 1 - P(type I error).

summarized

Logical; indicate whether raw data or summary stats are to be used.

phat

Estimate of p. Required if summarized = TRUE.

S.phat

Estimate of $\sigma_{\hat{p}}$. Required if summarized = TRUE.

fpc

Logical. Indicates whether finite population corrections should be used. If fpc = TRUE then N must be specified. Finite population corrections are not possible for method = "exact".

n

Sample size. Required if summarized = TRUE.

N

Population size. Required if fpc = TRUE.

method

Type of method to be used in confidence interval calculations. method = "wilson" is the default; method = "approximation" provides the conventional normal approximation, and method = "exact" provides the Clopper-Pearson exact method.

Value

Returns a list of class = "ci". Default printed results are the parameter estimate and confidence bounds. Other objects are invisible. In particular, if method = "wilson" or method = "approximation", the function returns a list with four items:

p.hat

Estimate for p.

S.p.hat

Estimate for $\sigma_{\hat{p}}$.

margin

Confidence margin.

ci

Confidence interval.

If method = "exact", the function returns the confidence interval, ci, only, i.e. no other invisible components exist.

Details

For the binomial distribution the parameter of interest is the probability of success, p. The parameter, p, and the standard deviation of its estimator, $\sigma_{\hat{p}}$, can be estimated with:
$$\hat{p}=\frac{x}{n},$$
$$S_{\hat{p}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
where x is the number of successes and n is the number of observations.
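
As a rough illustration of these two estimators, the following base R sketch computes them by hand from a raw 0/1 vector. The data vector and the object names (x.raw, p.hat, S.p.hat) are hypothetical and chosen only for this example; this is not the internal code of ci.p.

```r
# Hypothetical raw data: 1 = success, 0 = failure
x.raw <- c(1, 0, 0, 1, 1, 0, 1, 0, 0, 0)

n <- length(x.raw)  # number of observations
x <- sum(x.raw)     # number of successes

# Simple (unadjusted) point estimate of p
p.hat <- x / n

# Estimated standard deviation of p.hat
S.p.hat <- sqrt(p.hat * (1 - p.hat) / n)

print(c(p.hat = p.hat, S.p.hat = S.p.hat))
```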

Agresti and Coull (1998) reported that these estimators can create extremely inaccurate confidence intervals.  As a result Ott and Longnecker (2004) recommend the Wilson estimators for estimation of p and $\sigma_{\hat{p}}$ (Wilson 1927).

$$\hat{p}=\frac{x+2}{n+4},$$
$$S_{\hat{p}}=\sqrt{\frac{\hat{p}(1-\hat{p})}{n+4}}$$

A 100(1 - $\alpha$) percent confidence interval for the binomial parameter p is found using:

$$\hat{p}\pm z_{1-(\alpha/2)}S_{\hat{p}}.$$
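
The Wilson-adjusted estimators and the interval $\hat{p}\pm z_{1-(\alpha/2)}S_{\hat{p}}$ can be sketched in base R as follows. The helper name wilson_adjusted_ci is hypothetical, no finite population correction is applied, and the code is only meant to mirror the formulas in this section, not to reproduce the source of ci.p. The element names follow the items listed under Value.

```r
# Hypothetical helper: adjusted Wald interval from x successes in n trials
wilson_adjusted_ci <- function(x, n, conf = 0.95) {
  alpha <- 1 - conf

  # Wilson point estimators: add 2 successes and 4 observations
  p.hat <- (x + 2) / (n + 4)
  S.p.hat <- sqrt(p.hat * (1 - p.hat) / (n + 4))

  # Standard normal quantile z_{1 - alpha/2} and the confidence margin
  z <- qnorm(1 - alpha / 2)
  margin <- z * S.p.hat

  list(p.hat = p.hat,
       S.p.hat = S.p.hat,
       margin = margin,
       ci = c(lower = p.hat - margin, upper = p.hat + margin))
}

# Counts taken from the Examples section: 26300 successes in 56200 observations
res <- wilson_adjusted_ci(x = 26300, n = 56200)
print(res$ci)
```

Because the adjustment adds only two successes and four observations, it has little effect for a sample this large; it matters most when n is small or x is close to 0 or n.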

The "exact" method of Clopper and Pearson (1934) guarantees coverage of at least the nominal level, but actual coverage may greatly exceed nominal coverage, i.e. the intervals tend to be conservative. Confidence bounds for the Clopper and Pearson (1934) method are derived (in part) using quantiles from the F-distribution:

$$C_L=\frac{x}{x+(n-x+1)F^*_{1-\alpha/2}}, \qquad F^* \sim F(2n-2x+2,\,2x),$$
$$C_U=\frac{(x+1)F^*_{1-\alpha/2}}{n-x+(x+1)F^*_{1-\alpha/2}}, \qquad F^* \sim F(2x+2,\,2n-2x).$$
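
A minimal base R sketch of the Clopper-Pearson bounds is given below, using the standard degrees of freedom: F(2n - 2x + 2, 2x) for the lower bound and F(2x + 2, 2n - 2x) for the upper bound. The function name clopper_pearson and the handling of the edge cases x = 0 and x = n (bounds set to 0 and 1) are assumptions made for this illustration. The result is compared with binom.test from the stats package, which reports a Clopper-Pearson interval.

```r
# Hypothetical helper: Clopper-Pearson bounds from F-distribution quantiles
clopper_pearson <- function(x, n, conf = 0.95) {
  alpha <- 1 - conf

  # Lower bound; taken to be 0 when no successes are observed
  if (x == 0) {
    lower <- 0
  } else {
    F.L <- qf(1 - alpha / 2, df1 = 2 * n - 2 * x + 2, df2 = 2 * x)
    lower <- x / (x + (n - x + 1) * F.L)
  }

  # Upper bound; taken to be 1 when every observation is a success
  if (x == n) {
    upper <- 1
  } else {
    F.U <- qf(1 - alpha / 2, df1 = 2 * x + 2, df2 = 2 * n - 2 * x)
    upper <- (x + 1) * F.U / (n - x + (x + 1) * F.U)
  }

  c(lower = lower, upper = upper)
}

# Small hypothetical sample: 4 successes in 10 trials
cp <- clopper_pearson(x = 4, n = 10)

# Reference interval from base R for comparison
ref <- as.numeric(binom.test(4, 10, conf.level = 0.95)$conf.int)

print(rbind(F.quantiles = cp, binom.test = ref))
```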

References

Agresti, A., and Coull, B. A. (1998) Approximate is better than 'exact' for interval estimation of binomial proportions. The American Statistician 52: 119-126.

Clopper, C. J., and Pearson, E. S. (1934) The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26: 404-413.

Ott, R. L., and Longnecker, M. T. (2004) A first course in statistical methods. Thomson.

Wilson, E. B. (1927) Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association 22: 209-212.

See Also

ci.mu.z, ci.p

Examples

# In 2001, it was estimated that 56,200 Americans would be diagnosed with
# non-Hodgkin's lymphoma and that 26,300 would die from it (Cernan et al. 2002).
# Here deaths are coded as successes (1) among the 56,200 diagnosed, and we
# find the 95% confidence interval for the probability of death, p.
ci.p(c(rep(0, 56200 - 26300), rep(1, 26300)))