dirmult: Parameter estimation in Dirichlet-multinomial distribution

Description

Consider allele frequencies from different subpopulations. The allele counts, $X$, (or equivalently allele frequencies) are expected to vary between subpopulation. This variability are sometimes refered to as identity-by-decent, but may be modelled as overdispersion due to intra-class correlation $\theta$. The allele counts within each subpopulation is assumed to follow a multinomial distribution conditioned on the allele probabilities, $\pi_1,\dots,\pi_{k-1}$. When $\pi$ follows a Dirichlet distribution the marginal distribution of $X$ is Dirichlet-multinomial with parameters $\pi$ and $\theta$ with density: $$% P(X=x) = {n \choose x} \frac{\prod_{j=1}^k\prod_{r=1}^{x_j}\{\pi_j(1-\theta) + (r-1)\theta\}}% {\prod_{r=1}^{n}\{1-\theta + (r-1)\theta\}}.$$ Using an alternative parameterization the density may be written as: $$% P(X=x) = {n \choose x} \frac{\Gamma(\gamma_+)}{\Gamma(n+\gamma_+)} \prod_{j=1}^k \frac{\Gamma\left(x_j + \gamma_j\right)}% {\Gamma\left(\gamma_j\right)},$$ where $\gamma_+=(1-\theta)/\theta$ and $\gamma_j=\pi_j\theta$.

This formulation second parameterization is used in the iterations since it converges much faster than the original parameterization. The function dirmult estimates the parameters $\gamma$ in the Dirichlet-multinomial distribution and transform these into $\pi_1,\dots,\pi_{k-1}$ and $\theta$.

Usage

dirmult(data, init, initscalar, epsilon=10^(-4), trace=TRUE, mode)

Arguments

data

A matrix or table with counts. Rows represent subpopulations and columns the different categories of the data. Zero rows or columns are automaticly removed.

init

Initial values for the $\gamma$-vector. Default is empty implying the column-proportions are used as initial values.

initscalar

Initial value for $(1-\theta)/\theta$. Default value is (1-MoM)/MoM where MoM a the method of moment estimate.

epsilon

Convergence tolerance. On termination the difference between to succeeding log-likelihoods must be smaller than epsilon.

trace

Logical. If TRUE the parameter estimates and log-likelihood value is printed to the screen after each iteration, otherwise no out-put is produces while iterating.

mode

Takes values "obs" (default) or "exp" determining whether the observed or expected FIM should be used in the Fisher Scoring. All other arguments produces an error message, but the observed FIM is used in the iterations.

Value

Returns a list containing:

loglik

The final log-likelihood value.

ite

Number of iterations used.

gamma

A vector of $\gamma$ estimates.

A vector of $\pi$ estimates.

theta

Estimated $\theta$-value.

Examples

Run this code

# NOT RUN {
data(us)
fit <- dirmult(us[[1]],epsilon=10^(-4),trace=FALSE)
dirmult.summary(us[[1]],fit)
# }

Run the code above in your browser using DataLab

Description

Usage

Arguments

Value

See Also

Examples