betabinomial: Beta-binomial Distribution Family Function

Description

Fits a beta-binomial distribution by maximum likelihood estimation. The two parameters here are the mean and correlation coefficient.

Usage

betabinomial(lmu = "logit", lrho = "logit",
             irho = NULL, imethod = 1, shrinkage.init = 0.95,
             nsimEIM = NULL, zero = 2)

Arguments

lmu, lrho

Link functions applied to the two parameters. See Links for more choices. The defaults ensure the parameters remain in $(0,1)$, however, see the warning below.

irho

Optional initial value for the correlation parameter. If given, it must be in $(0,1)$, and is recyled to the necessary length. Assign this argument a value if a convergence failure occurs. Having irho = NULL means an initial value is ob

imethod

An integer with value 1 or 2 or ..., which specifies the initialization method for $\mu$. If failure to converge occurs try the another value and/or else specify a value for irho.

zero

An integer specifying which linear/additive predictor is to be modelled as an intercept only. If assigned, the single value should be either 1 or 2. The default is to have a single correlation parameter. To model both par

shrinkage.init, nsimEIM

See CommonVGAMffArguments for more information. The argument shrinkage.init is used only if imethod = 2. Using the argument nsimEIM may offer large a

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm.
Suppose fit is a fitted beta-binomial model. Then fit@y contains the sample proportions $y$, fitted(fit) returns estimates of $E(Y)$, and weights(fit, type="prior") returns the number of trials $N$.

Warning

If the estimated rho parameter is close to zero then it pays to try lrho = "rhobit". One day this may become the default link function.

This family function is prone to numerical difficulties due to the expected information matrices not being positive-definite or ill-conditioned over some regions of the parameter space. If problems occur try setting irho to some numerical value, nsimEIM = 100, say, or else use etastart argument of vglm, etc.

Details

There are several parameterizations of the beta-binomial distribution. This family function directly models the mean and correlation parameter, i.e., the probability of success. The model can be written $T|P=p \sim Binomial(N,p)$ where $P$ has a beta distribution with shape parameters $\alpha$ and $\beta$. Here, $N$ is the number of trials (e.g., litter size), $T=NY$ is the number of successes, and $p$ is the probability of a success (e.g., a malformation). That is, $Y$ is the proportion of successes. Like binomialff, the fitted values are the estimated probability of success (i.e., $E[Y]$ and not $E[T]$) and the prior weights $N$ are attached separately on the object in a slot.

The probability function is $$P(T=t) = {N \choose t} \frac{B(\alpha+t, \beta+N-t)} {B(\alpha, \beta)}$$ where $t=0,1,\ldots,N$, and $B$ is the beta function with shape parameters $\alpha$ and $\beta$. Recall $Y = T/N$ is the real response being modelled.

The default model is $\eta_1 = logit(\mu)$ and $\eta_2 = logit(\rho)$ because both parameters lie between 0 and 1. The mean (of $Y$) is $p = \mu = \alpha / (\alpha + \beta)$ and the variance (of $Y$) is $\mu(1-\mu)(1+(N-1)\rho)/N$. Here, the correlation $\rho$ is given by $1/(1 + \alpha + \beta)$ and is the correlation between the $N$ individuals within a litter. A litter effect is typically reflected by a positive value of $\rho$. It is known as the over-dispersion parameter.

This family function uses Fisher scoring. Elements of the second-order expected derivatives with respect to $\alpha$ and $\beta$ are computed numerically, which may fail for large $\alpha$, $\beta$, $N$ or else take a long time.

References

Moore, D. F. and Tsiatis, A. (1991) Robust estimation of the variance in moment methods for extra-binomial and extra-Poisson variation. Biometrics, 47, 383--401.

Prentice, R. L. (1986) Binary regression using an extended beta-binomial distribution, with discussion of correlation induced by covariate measurement errors. Journal of the American Statistical Association, 81, 321--327.

Examples

Run this code

# Example 1
bdata <- data.frame(N = 10, mu = 0.5, rho = 0.8)
bdata <- transform(bdata,
                   y = rbetabinom(n=100, size = N, prob = mu, rho = rho))
fit <- vglm(cbind(y, N-y) ~ 1, betabinomial, bdata, trace = TRUE)
coef(fit, matrix = TRUE)
Coef(fit)
head(cbind(depvar(fit), weights(fit, type = "prior")))


# Example 2
fit <- vglm(cbind(R, N-R) ~ 1, betabinomial, lirat,
            trace = TRUE, subset = N > 1)
coef(fit, matrix = TRUE)
Coef(fit)
t(fitted(fit))
t(depvar(fit))
t(weights(fit, type = "prior"))


# Example 3, which is more complicated
lirat <- transform(lirat, fgrp = factor(grp))
summary(lirat)  # Only 5 litters in group 3
fit2 <- vglm(cbind(R, N-R) ~ fgrp + hb, betabinomial(zero = 2),
             data = lirat, trace = TRUE, subset = N > 1)
coef(fit2, matrix = TRUE)
with(lirat, plot(hb[N > 1], fit2@misc$rho,
                 xlab = "Hemoglobin", ylab = "Estimated rho",
                 pch = as.character(grp[N > 1]), col = grp[N > 1]))
# cf. Figure 3 of Moore and Tsiatis (1991)
with(lirat, plot(hb, R / N, pch = as.character(grp), col = grp, las = 1,
                 xlab = "Hemoglobin level", ylab = "Proportion Dead",
                 main = "Fitted values (lines)"))
smalldf <- with(lirat, lirat[N > 1, ])
for(gp in 1:4) {
  xx <- with(smalldf, hb[grp == gp])
  yy <- with(smalldf, fitted(fit2)[grp == gp])
  ooo <- order(xx)
  lines(xx[ooo], yy[ooo], col = gp) }

Run the code above in your browser using DataLab