For a single-gene marker, diseq computes the Hardy-Weinberg
(dis)equilibrium statistics D, D', r (the correlation coefficient), and
\(r^2\) for each pair of allele values, as well as an overall
summary value for each measure across all alleles. print.diseq
displays the contents of a diseq object. diseq.ci
computes bootstrap confidence intervals for these estimates.
For consistency, I have applied the standard definitions for D, D',
and r from the Linkage Disequilibrium case, replacing all marker
probabilities with the appropriate allele probabilities.
Thus, for each allele pair,
D is defined as half the raw difference between the observed
and expected frequencies of heterozygotes:
$$%
D = \frac{1}{2} ( p_{ij} + p_{ji} ) - p_i p_j %
$$
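The definition of D can be checked numerically with a short standalone sketch. It is written in Python for illustration only and is not the package's implementation; the function name `pairwise_d` and the representation of genotypes as 2-tuples of allele labels are assumptions made for this example.

```python
from collections import Counter


def pairwise_d(genotypes):
    """Return (allele_freq, d) for a list of genotypes such as ("A", "B").

    Hypothetical helper: d[(i, j)] follows D = (p_ij + p_ji) / 2 - p_i * p_j.
    """
    n = len(genotypes)
    alleles = sorted({a for g in genotypes for a in g})

    # Symmetrized pair probabilities: each genotype (a, b) puts half of its
    # mass on the ordered pair (a, b) and half on (b, a).
    pair = Counter()
    for a, b in genotypes:
        pair[(a, b)] += 0.5 / n
        pair[(b, a)] += 0.5 / n

    # Allele probabilities are the row marginals of the symmetrized table.
    freq = {a: sum(pair[(a, b)] for b in alleles) for a in alleles}

    d = {}
    for i in alleles:
        for j in alleles:
            if i != j:
                d[(i, j)] = 0.5 * (pair[(i, j)] + pair[(j, i)]) - freq[i] * freq[j]
    return freq, d


freq, d = pairwise_d([("A", "A"), ("A", "B"), ("A", "B"), ("B", "B")])
```

The sample in the last line is in exact Hardy-Weinberg proportions (one AA, two AB, one BB), so D for the pair A, B is zero up to floating-point error. A sample made up only of AB heterozygotes gives D = 0.25.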
D' rescales D to span the range [-1,1]:
$$D' = \frac{D}{D_{max} } $$
where, if D > 0:
$$%
D_{max} = \min\{ p_i p_j, p_j p_i \} = p_i p_j %
$$
or if D < 0:
$$%
D_{max} = \min\{ p_i (1 - p_j), p_j (1 - p_i) \} %
$$
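As a minimal sketch, the two cases for \(D_{max}\) can be written out directly. The Python function below follows the case split exactly as stated here. The name `d_prime` and the choice to return 0 when D is exactly 0 (a case the text does not cover) are assumptions for illustration.

```python
def d_prime(d, p_i, p_j):
    """Rescale D by D_max, using the case split given in the text.

    Hypothetical helper; p_i and p_j are the observed allele probabilities.
    """
    if d > 0:
        d_max = p_i * p_j
    elif d < 0:
        d_max = min(p_i * (1 - p_j), p_j * (1 - p_i))
    else:
        # Assumption: D = 0 is not covered by either case, so report 0.
        return 0.0
    return d / d_max
```

With \(p_i = p_j = 0.5\), the values D = 0.25 and D = -0.25 map to D' = 1 and D' = -1 respectively.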
r is the correlation coefficient between two alleles,
and can be computed by
$$%
r = \frac{-D}{\sqrt{ p_i (1 - p_i) \, p_j (1 - p_j) }} %
$$
where
- \(p_i\) is the observed probability of allele 'i',
- \(p_j\) is the observed probability of allele 'j', and
- \(p_{ij}\) is the observed probability of the allele pair 'ij'.
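The correlation coefficient can be sketched in a few lines of Python from D and the two allele probabilities. This is an illustration of the formula for r, not the package's code, and the name `allele_correlation` is hypothetical. Note the leading minus sign, which makes a heterozygote deficit (D < 0) correspond to a positive correlation between alleles.

```python
import math


def allele_correlation(d, p_i, p_j):
    """r = -D / sqrt(p_i (1 - p_i) p_j (1 - p_j)); hypothetical helper.

    Assumes 0 < p_i < 1 and 0 < p_j < 1 so the denominator is nonzero.
    """
    return -d / math.sqrt(p_i * (1 - p_i) * p_j * (1 - p_j))


r = allele_correlation(-0.25, 0.5, 0.5)  # sample with no heterozygotes
r_squared = r ** 2
```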
When there are more than two alleles, the summary values for these
statistics are obtained by computing a weighted average of the
absolute value of the statistic for each allele pair, where the weight
is determined by the expected frequency. For example:
$$%
D_{overall} = \sum_{i \ne j} |D_{ij}| \cdot p_{ij} %
$$
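A rough Python sketch of the summary as a weighted sum over off-diagonal allele pairs follows. The function name `overall_summary` is hypothetical, and the weights are passed in by the caller because this section does not spell out how the package derives or normalizes them. No normalization is applied here.

```python
def overall_summary(stat, weight):
    """Sum |stat[(i, j)]| * weight[(i, j)] over all pairs with i != j.

    Hypothetical helper: `stat` and `weight` are dicts keyed by
    (allele_i, allele_j); diagonal entries (i == j) are ignored.
    """
    return sum(
        abs(value) * weight[pair]
        for pair, value in stat.items()
        if pair[0] != pair[1]
    )
```

The same function applies to D', r, or \(r^2\) by passing the corresponding per-pair values as `stat`.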
Bootstrapping is used to generate confidence intervals in order to
avoid reliance on parametric assumptions, which will not hold for
alleles with low frequencies (e.g. \(D'\) following a Chi-square
distribution).
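The bootstrap idea can be illustrated with a small Python sketch that resamples individuals with replacement and recomputes D for one allele pair. The interval method used by diseq.ci is not described in this section, so a simple percentile interval is assumed here. The names `d_statistic` and `bootstrap_ci` and the defaults `n_boot`, `conf`, and `seed` are all hypothetical.

```python
import random


def d_statistic(genotypes, i, j):
    """D for alleles i != j: half the heterozygote frequency minus p_i * p_j."""
    n = len(genotypes)
    p_i = sum(g.count(i) for g in genotypes) / (2 * n)
    p_j = sum(g.count(j) for g in genotypes) / (2 * n)
    # Observed frequency of i/j heterozygotes, i.e. p_ij + p_ji.
    het = sum(1 for g in genotypes if set(g) == {i, j}) / n
    return 0.5 * het - p_i * p_j


def bootstrap_ci(genotypes, i, j, n_boot=1000, conf=0.95, seed=0):
    """Percentile bootstrap interval for D (assumed method, for illustration)."""
    rng = random.Random(seed)
    n = len(genotypes)
    stats = sorted(
        d_statistic([genotypes[rng.randrange(n)] for _ in range(n)], i, j)
        for _ in range(n_boot)
    )
    lo_idx = int((1 - conf) / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 + conf) / 2 * n_boot))
    return stats[lo_idx], stats[hi_idx]


sample = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
lower, upper = bootstrap_ci(sample, "A", "B")
```

Resampling whole genotypes rather than single alleles keeps each individual's allele pairing intact, which is the quantity the statistic measures.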
See the function HWE.test for testing
Hardy-Weinberg Equilibrium, \(D=0\).