varbin: Estimate of a Probability from Clustered Binomial Data

Description

The function estimates a probability and its variance from clustered binomial data

\(\{(n_1, m_1), (n_2, m_2), \ldots, (n_N, m_N)\}\),

where \(n_i\) is the size of cluster \(i\), \(m_i\) is the number of “successes” in cluster \(i\) (the observed proportions are \(y_i = m_i/n_i\)), and \(N\) is the number of clusters. Confidence intervals are calculated using a normal approximation, which may be inaccurate when the probability is close to 0 or 1.
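
As an illustration of the normal approximation mentioned above, the following minimal sketch builds a two-sided interval from an estimate and its variance. The values of `mu` and `varmu` are hypothetical, and the code is not taken from `varbin` itself; it only shows the general form of such an interval.

```r
# Hypothetical estimate and variance, for illustration only
mu    <- 0.30
varmu <- 0.0025
alpha <- 0.05

# Normal quantile for a two-sided interval (about 1.96 when alpha = 0.05)
z <- qnorm(1 - alpha / 2)

# Lower and upper limits: mu -/+ z * standard error
ci <- mu + c(-1, 1) * z * sqrt(varmu)
ci
```

Nothing in this construction keeps the limits inside [0, 1], which is one reason the approximation can be poor for probabilities near 0 or 1.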

Usage

varbin(n, m, alpha = 0.05, R = 5000)
  
# S3 method for varbin
print(x, ...)

Value

An object of class varbin, printed with print.varbin.

Arguments

n: A vector of the sizes of the clusters.
m: A vector of the numbers of successes (proportions are y = m / n).
alpha: The significance level for the confidence intervals. Defaults to 0.05, giving 95% confidence intervals.
R: The number of bootstrap replicates used to compute the bootstrap mean and variance. Defaults to 5000.
x: An object of class “varbin”.
...: Further arguments to be passed to “print”.

Details

Five methods are used for the estimation. Consider \(N\) clusters of sizes \(n_1, \ldots, n_N\) with observed count responses \(m_1, \ldots, m_N\). We denote by \(y_i = m_i/n_i\) \((i = 1, \ldots, N)\) the observed proportions. The underlying assumption is that the probability, say \(\mu\), is homogeneous across the clusters.

Binomial method: the probability estimate and its variance are calculated by

\(\mu = \sum_{i} m_i / \sum_{i} n_i\) (ratio estimate) and

\(\mu (1 - \mu) / (\sum_{i} n_i - 1)\), respectively.
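
The two formulas above translate directly into R. The sketch below uses a small hypothetical data set (`n` and `m` are made up for illustration) and is not the package's own code.

```r
# Hypothetical clustered data: n = cluster sizes, m = successes per cluster
n <- c(10, 12, 8, 15)
m <- c(3, 6, 2, 9)

# Ratio estimate of the probability: total successes / total trials
mu_binom <- sum(m) / sum(n)

# Binomial variance, which ignores the clustering
var_binom <- mu_binom * (1 - mu_binom) / (sum(n) - 1)

c(estimate = mu_binom, variance = var_binom)
```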

Ratio method: the probability \(\mu\) is estimated as for the binomial method (ratio estimate). The one-stage cluster sampling formula is used to calculate the variance of \(\mu\) (see Cochran, 1999, p. 32 and p. 66).
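
The variance formula is not written out here. As a rough sketch, the textbook variance of a ratio estimator under one-stage cluster sampling, without a finite population correction, can be computed as follows. Whether `varbin` uses exactly this expression is an assumption; the data are hypothetical.

```r
# Hypothetical clustered data: n = cluster sizes, m = successes per cluster
n <- c(10, 12, 8, 15)
m <- c(3, 6, 2, 9)
N <- length(n)

# Ratio estimate, same as in the binomial method
mu_ratio <- sum(m) / sum(n)

# Assumed textbook form: squared deviations of m_i from mu * n_i,
# scaled by N * (N - 1) * (mean cluster size)^2
nbar      <- mean(n)
var_ratio <- sum((m - mu_ratio * n)^2) / (N * (N - 1) * nbar^2)

c(estimate = mu_ratio, variance = var_ratio)
```

Unlike the binomial variance, this quantity grows when the cluster-level counts vary more than a common probability would predict.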

Arithmetic method: the probability is estimated by \(\mu = \sum_{i} y_i / N\). The variance of \(\mu\) is estimated by \(\sum_{i} (y_i - \mu)^2 / (N (N - 1))\).
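
A minimal sketch of the arithmetic method on hypothetical data (the vectors `n` and `m` are invented for illustration):

```r
# Hypothetical clustered data: n = cluster sizes, m = successes per cluster
n <- c(10, 12, 8, 15)
m <- c(3, 6, 2, 9)
N <- length(n)

# Observed proportion in each cluster
y <- m / n

# Unweighted mean of the proportions and the variance of that mean
mu_arith  <- mean(y)
var_arith <- sum((y - mu_arith)^2) / (N * (N - 1))

c(estimate = mu_arith, variance = var_arith)
```

The variance expression is the sample variance of the proportions divided by the number of clusters, so `var(y) / N` gives the same value.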

Jackknife method: the probability is estimated by \(\mu\), defined as the arithmetic mean of the pseudovalues \(y_{v,i}\). The variance is estimated by \(\sum_{i} (y_{v,i} - \mu)^2 / (N (N - 1))\) (Gladen, 1977; Paul, 1982).
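
The pseudovalues are not defined explicitly here. The sketch below assumes the usual jackknife construction applied to the ratio estimate, in which each pseudovalue combines the full-sample estimate with the estimate obtained after deleting one cluster. Both this construction and the data are assumptions made for illustration.

```r
# Hypothetical clustered data: n = cluster sizes, m = successes per cluster
n <- c(10, 12, 8, 15)
m <- c(3, 6, 2, 9)
N <- length(n)

# Ratio estimate on all clusters
mu_all <- sum(m) / sum(n)

# Leave-one-out ratio estimates: cluster i removed from both totals
mu_loo <- (sum(m) - m) / (sum(n) - n)

# Assumed pseudovalues: N * (full estimate) - (N - 1) * (leave-one-out estimate)
pseudo <- N * mu_all - (N - 1) * mu_loo

# Jackknife estimate and its variance
mu_jack  <- mean(pseudo)
var_jack <- sum((pseudo - mu_jack)^2) / (N * (N - 1))

c(estimate = mu_jack, variance = var_jack)
```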

Bootstrap method: \(R\) samples of \(N\) clusters each are drawn with replacement and with equal probability from the initial sample \((y_1, \ldots , y_N)\) (Efron and Tibshirani, 1993). The bootstrap estimate \(\mu\) and its estimated variance are, respectively, the arithmetic mean and the empirical variance (computed with denominator \(R - 1\)) of the \(R\) binomial ratio estimates.
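
The resampling scheme can be sketched in a few lines of base R. This is an illustrative reading of the description above, assuming that whole clusters (pairs of size and count) are resampled with replacement. It is not the package's implementation, and the data and seed are hypothetical.

```r
set.seed(1)  # arbitrary seed, only to make the illustration reproducible

# Hypothetical clustered data: n = cluster sizes, m = successes per cluster
n <- c(10, 12, 8, 15)
m <- c(3, 6, 2, 9)
N <- length(n)
R <- 1000    # number of bootstrap replicates

# Each replicate resamples N clusters and recomputes the ratio estimate
boot <- replicate(R, {
  idx <- sample.int(N, size = N, replace = TRUE)
  sum(m[idx]) / sum(n[idx])
})

# Bootstrap estimate and variance; var() divides by R - 1
mu_boot  <- mean(boot)
var_boot <- var(boot)

c(estimate = mu_boot, variance = var_boot)
```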

References

Cochran, W.G., 1999. Sampling techniques, 3rd ed. Wiley, New York.
Efron, B., Tibshirani, R., 1993. An introduction to the bootstrap. Chapman and Hall, London.
Gladen, B., 1977. The use of the jackknife to estimate proportions from toxicological data in the presence of litter effects. JASA 74(366), 278-283.
Paul, S.R., 1982. Analysis of proportions of affected foetuses in teratological experiments. Biometrics 38, 361-370.

Examples

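library(aod)  # provides varbin() and the rabbits data set
data(rabbits)
# Estimates for a single group
z <- rabbits[rabbits$group == "M", ]
varbin(z$n, z$m)
# Estimates for every group, with fewer bootstrap replicates
by(rabbits,
   list(group = rabbits$group),
   function(x) varbin(n = x$n, m = x$m, R = 1000))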
