dirmul.old: Fitting a Dirichlet-Multinomial Distribution

Description

Fits a Dirichlet-multinomial distribution to a matrix of non-negative integers.

Usage

dirmul.old(link = "loge",
           init.alpha = 0.01, parallel = FALSE, zero = NULL)

Arguments

link

Link function applied to each of the $M$ (positive) shape parameters $\alpha_j$ for $j=1,\ldots,M$. See Links for more choices. Here, $M$ is the number of columns of the response matrix.

init.alpha

Numeric vector. Initial values for the alpha vector. Must be positive. Recycled to length $M$.

parallel

A logical, or formula specifying which terms have equal/unequal coefficients.

zero

An integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. The values must be from the set {1,2,...,$M$}.

Value

An object of class "vglmff" (see vglmff-class). The object is used by modelling functions such as vglm, rrvglm and vgam.

Details

The Dirichlet-multinomial distribution, which is somewhat similar to a Dirichlet distribution, has probability function $$P(Y_1=y_1,\ldots,Y_M=y_M) = {2y_{*} \choose {y_1,\ldots,y_M}} \frac{\Gamma(\alpha_{+})}{\Gamma(2y_{*}+\alpha_{+})} \prod_{j=1}^M \frac{\Gamma(y_j+\alpha_{j})}{\Gamma(\alpha_{j})}$$ for $\alpha_j > 0$, $\alpha_+ = \alpha_1 + \cdots + \alpha_M$, and $2y_{*} = y_1 + \cdots + y_M$. Here, $a \choose b$ means ``$a$ choose $b$'' and refers to combinations (see choose). The (posterior) mean is $$E(Y_j) = (y_j + \alpha_j) / (2y_{*} + \alpha_{+})$$ for $j=1,\ldots,M$, and these are returned as the fitted values as a $M$-column matrix.

References

Lange, K. (2002) Mathematical and Statistical Methods for Genetic Analysis, 2nd ed. New York: Springer-Verlag.

Evans, M., Hastings, N. and Peacock, B. (2000) Statistical Distributions, New York: Wiley-Interscience, Third edition.

Paul, S. R., Balasooriya, U. and Banerjee, T. (2005) Fisher information matrix of the Dirichlet-multinomial distribution. Biometrical Journal, 47, 230--236.

Tvedebrink, T. (2010) Overdispersion in allelic counts and $\theta$-correction in forensic genetics. Theoretical Population Biology, 78, 200--210.

Documentation accompanying the VGAM package at http://www.stat.auckland.ac.nz/~yee contains further information and examples.

Examples

Run this code

# Data from p.50 of Lange (2002)
alleleCounts <- c(2, 84, 59, 41, 53, 131, 2, 0,
       0, 50, 137, 78, 54, 51, 0, 0,
       0, 80, 128, 26, 55, 95, 0, 0,
       0, 16, 40, 8, 68, 14, 7, 1)
dim(alleleCounts) <- c(8, 4)
alleleCounts <- data.frame(t(alleleCounts))
dimnames(alleleCounts) <- list(c("White","Black","Chicano","Asian"),
                    paste("Allele", 5:12, sep = ""))

set.seed(123)  # @initialize uses random numbers
fit <- vglm(cbind(Allele5,Allele6,Allele7,Allele8,Allele9,
                  Allele10,Allele11,Allele12) ~ 1, dirmul.old,
             trace = TRUE, crit = "c", data = alleleCounts)

(sfit <- summary(fit))
vcov(sfit)
round(eta2theta(coef(fit), fit@misc$link, fit@misc$earg), dig = 2) # not preferred
round(Coef(fit), dig = 2) # preferred
round(t(fitted(fit)), dig = 4) # 2nd row of Table 3.5 of Lange (2002)
coef(fit, matrix = TRUE)


pfit <- vglm(cbind(Allele5,Allele6,Allele7,Allele8,Allele9,
                   Allele10,Allele11,Allele12) ~ 1,
             dirmul.old(parallel = TRUE), trace = TRUE,
             data = alleleCounts)
round(eta2theta(coef(pfit, matrix = TRUE), pfit@misc$link,
                pfit@misc$earg), dig = 2) # 'Right' answer
round(Coef(pfit), dig = 2) # 'Wrong' answer due to parallelism constraint

Run the code above in your browser using DataLab