diseq: Estimate or Compute Confidence Interval for the Single-Marker Disequilibrium

Description

Estimate or compute confidence interval for single-marker disequilibrium.

Usage

diseq(x, ...)
# S3 method for diseq
print(x, show=c("D","D'","r","R^2","table"), ...)
diseq.ci(x, R=1000, conf=0.95, correct=TRUE, na.rm=TRUE, ...)

Arguments

x

genotype or haplotype object.

show

a character value or vector indicating which disequilibrium measures should be displayed. The default is to show all of the available measures. show="table" will display a table of observed, expected, and observed-expected frequencies.

conf

Confidence level to use when computing the confidence interval for D-hat. Defaults to 0.95, should be in (0,1).

R

Number of bootstrap iterations to use when computing the confidence interval. Defaults to 1000.

correct

See details.

na.rm

logical. Should missing values be removed?

...

optional parameters passed to boot.ci (diseq.ci) or ignored.

Value

diseq returns an object of class diseq with components

call

function call used to create this object

data

2-way table of allele pair counts

D.hat

matrix giving the observed count, expected count, observed - expected difference, and estimate of disequilibrium for each pair of alleles as well as an overall disequilibrium value.

TODO

more slots to be documented

diseq.ci returns an object of class boot.ci

Details

For a single-gene marker, diseq computes the Hardy-Weinberg (dis)equilibrium statistic D, D', r (the correlation coefficient), and $r^2$ for each pair of allele values, as well as an overall summary value for each measure across all alleles. print.diseq displays the contents of a diseq object. diseq.ci computes a bootstrap confidence interval for this estimate.

For consistency, I have applied the standard definitions for D, D', and r from the Linkage Disequilibrium case, replacing all marker probabilities with the appropriate allele probabilities.

Thus, for each allele pair,

D is defined as half of the raw difference between the observed and expected frequencies of heterozygotes:
$$D = \frac{1}{2} ( p_{ij} + p_{ji} ) - p_i p_j$$
D' rescales D to span the range [-1,1]:
$$D' = \frac{D}{D_{max}}$$
where, if D > 0: $$D_{max} = \min\{ p_i p_j, p_j p_i \} = p_i p_j$$ or if D < 0: $$D_{max} = \min\{ p_i (1 - p_j), p_j (1 - p_i) \}$$
r is the correlation coefficient between two alleles, and can be computed by
$$r = \frac{-D}{\sqrt{ p_i (1-p_i) \, p_j (1-p_j) }}$$

where

- $p_i$ is the observed probability of allele 'i',
- $p_j$ is the observed probability of allele 'j', and
- $p_{ij}$ is the observed probability of the allele pair 'ij'.
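
The per-pair definitions above can be worked through by hand. The following is a minimal base-R sketch, not the package's implementation: it applies the formulas exactly as written to the three-allele data from the Examples section. The names `p_sym`, `p`, and `pair_stats` are illustrative and do not come from the package.

```r
# Genotypes from the three-allele example (alleles A, C, T)
geno <- c(rep("A/A", 8), rep("C/A", 20), rep("C/T", 20),
          rep("C/C", 10), rep("T/T", 3))

# Split each genotype into its two alleles (one row per genotype)
parts   <- do.call(rbind, strsplit(geno, "/", fixed = TRUE))
alleles <- sort(unique(as.vector(parts)))

# Ordered allele-pair counts: rows = first allele, columns = second allele
counts <- unclass(table(factor(parts[, 1], levels = alleles),
                        factor(parts[, 2], levels = alleles)))

# Symmetrized pair probabilities: (p_ij + p_ji) / 2
p_sym <- (counts + t(counts)) / (2 * sum(counts))

# Observed allele probabilities p_i
p <- rowSums(p_sym)

# D, D' and r for one allele pair, following the formulas as written above
pair_stats <- function(i, j) {
  p_i <- p[[i]]
  p_j <- p[[j]]
  D    <- p_sym[i, j] - p_i * p_j
  Dmax <- if (D > 0) p_i * p_j else min(p_i * (1 - p_j), p_j * (1 - p_i))
  c(D      = D,
    Dprime = D / Dmax,
    r      = -D / sqrt(p_i * (1 - p_i) * p_j * (1 - p_j)))
}

pair_stats("A", "C")
pair_stats("A", "T")
```

Comparing these values with the output of `diseq(g3)` is a reasonable sanity check, though the package may differ in details such as how the diagonal (homozygote) entries are handled.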

When there are more than two alleles, the summary values for these statistics are obtained by computing a weighted average of the absolute value of each allele pair, where the weight is determined by the expected frequency. For example:

$$D_{overall} = \sum_{i \ne j} |D_{ij}| \, p_{ij}$$
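
The overall summary can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: the weight is taken to be the expected frequency $p_i p_j$, as the prose describes, although the formula writes the weight as $p_{ij}$; and the sum is left unnormalized, as in the formula. The names `counts`, `expected`, and `D_overall` are illustrative.

```r
# Ordered allele-pair counts for the three-allele example
# (rows = first allele, columns = second allele)
counts <- matrix(c( 8,  0,  0,
                   20, 10, 20,
                    0,  0,  3),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("A", "C", "T"), c("A", "C", "T")))

# Symmetrized pair probabilities and allele probabilities
p_sym <- (counts + t(counts)) / (2 * sum(counts))
p     <- rowSums(p_sym)

# Expected pair probabilities under equilibrium, and per-pair D
expected <- outer(p, p)
D        <- p_sym - expected

# Sum |D_ij| * weight over the off-diagonal pairs (i != j).
# ASSUMPTION: weight = expected frequency p_i * p_j.
off       <- row(D) != col(D)
D_overall <- sum(abs(D[off]) * expected[off])
D_overall
```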

Bootstrapping is used to generate the confidence interval in order to avoid reliance on parametric assumptions, which will not hold for alleles with low frequencies (e.g. $D'$ following a Chi-square distribution).
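
To illustrate the idea, here is a minimal base-R sketch of a bootstrap interval for D in a two-allele marker. It is not what diseq.ci does internally: diseq.ci returns a boot.ci object, whereas this sketch substitutes a plain percentile interval computed with `sample()` and `quantile()`. The helper `d_hat` and the variable names are hypothetical, and the data are the non-missing genotypes from the first example.

```r
set.seed(1)  # make the resampling reproducible

# Non-missing genotypes from example.data: 6 D/D, 1 D/I, 2 I/I
geno <- c(rep("D/D", 6), "D/I", rep("I/I", 2))

# D for the D/I allele pair: half the heterozygote frequency minus p_D * p_I
d_hat <- function(g) {
  parts <- do.call(rbind, strsplit(g, "/", fixed = TRUE))
  het   <- mean(parts[, 1] != parts[, 2])  # observed heterozygote frequency
  p_D   <- mean(parts == "D")              # observed frequency of allele D
  het / 2 - p_D * (1 - p_D)
}

# Resample genotypes with replacement and recompute D each time
R      <- 1000
boot_D <- replicate(R, d_hat(sample(geno, replace = TRUE)))

# Percentile interval at the 95% level
ci <- quantile(boot_D, c(0.025, 0.975))

d_hat(geno)
ci
```

With only nine genotypes the interval is wide, which is the situation where parametric approximations are least trustworthy.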

See the function HWE.test for testing Hardy-Weinberg Equilibrium, $D=0$.

Examples

# NOT RUN {
example.data   <- c("D/D","D/I","D/D","I/I","D/D",
                    "D/D","D/D","D/D","I/I","")
g1  <- genotype(example.data)
g1

diseq(g1)
diseq.ci(g1)
HWE.test(g1)  # does the same, plus tests D-hat=0

three.data   <- c(rep("A/A",8),
                  rep("C/A",20),
                  rep("C/T",20),
                  rep("C/C",10),
                  rep("T/T",3))

g3  <- genotype(three.data)
g3

diseq(g3)
diseq.ci(g3, R=10000)  # R sets the number of bootstrap iterations (see Usage)

# only show observed vs expected table
print(diseq(g3),show='table')

# }