argmax.geno: Reconstruct underlying genotypes

Description

Uses the Viterbi algorithm to identify the most likely sequence of underlying genotypes, given the observed multipoint marker data, with possible allowance for genotyping errors.

Usage

argmax.geno(cross, step=0, off.end=0, error.prob=0.0001,
            map.function=c("haldane","kosambi","c-f","morgan"),
            stepwidth=c("fixed", "variable", "max"))

Value

The input cross object is returned with a component, argmax, added to each component of cross$geno. The argmax component is a matrix of size [n.ind x n.pos], where n.pos is the number of positions at which the reconstructed genotypes were obtained, containing the most likely sequences of underlying genotypes. Attributes "error.prob", "step", and "off.end" are set to the values of the corresponding arguments, for later reference.

Arguments

cross: An object of class cross. See read.cross for details.
step: Maximum distance (in cM) between positions at which the genotypes are reconstructed. If step=0, genotypes are reconstructed only at the marker locations.
off.end: Distance (in cM) past the terminal markers on each chromosome to which the genotype reconstructions will be carried.
error.prob: Assumed genotyping error rate used in the calculation of the penetrance Pr(observed genotype | true genotype).
map.function: Indicates whether to use the Haldane, Kosambi, Carter-Falconer or Morgan map function when converting genetic distances into recombination fractions.
stepwidth: Indicates whether the intermediate points should be placed with fixed or variable step sizes. We recommend using "fixed"; "variable" was included for the qtlbim package (https://cran.r-project.org/src/contrib/Archive/qtlbim/). The "max" option inserts the minimal number of intermediate points so that the maximum distance between points is step.
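
To make the role of map.function concrete, here is a minimal base-R sketch of how a genetic distance can be converted to a recombination fraction under three of the map functions. The helper name cm_to_rf is hypothetical and is not part of the package; the Carter-Falconer function is left out because its forward form has no simple closed-form expression.

```r
# Convert a genetic distance (in cM) to a recombination fraction.
# Hypothetical helper for illustration only; not the package's own code.
cm_to_rf <- function(d_cM, map.function = c("haldane", "kosambi", "morgan")) {
  map.function <- match.arg(map.function)
  d <- d_cM / 100  # distance in Morgans
  switch(map.function,
         haldane = 0.5 * (1 - exp(-2 * d)),  # no crossover interference
         kosambi = 0.5 * tanh(2 * d),        # moderate interference
         morgan  = pmin(d, 0.5))             # complete interference
}

# Recombination fraction implied by a 10 cM interval under each function
rf <- sapply(c("haldane", "kosambi", "morgan"), function(m) cm_to_rf(10, m))
print(round(rf, 4))
```

For short intervals the three functions give similar values, so the choice mostly matters when markers or grid points are far apart.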

Warning

The Viterbi algorithm can behave badly when step is small but positive. One may observe quite different results for different values of step.

The problem is that, in the presence of data like A----H, the sequences AAAAAA and HHHHHH may be more likely than any one of the sequences AAAAAH, AAAAHH, AAAHHH, AAHHHH, AHHHHH. The Viterbi algorithm produces a single "most likely" sequence of underlying genotypes.
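
The effect can be checked by hand in a toy setting. The sketch below assumes a two-state chain (A and H only), six equally spaced positions with data A----H, and illustrative values r = 0.005 and e = 0.01 for the recombination fraction between adjacent positions and the genotyping error rate. These values and the helper joint are assumptions chosen for the illustration, not quantities taken from the package.

```r
# Assumed values, chosen only for illustration
r <- 0.005   # recombination fraction between adjacent positions (small step)
e <- 0.01    # genotyping error probability

# Joint probability of a genotype sequence and the data A----H:
# initial probability 1/2, a factor r or (1 - r) per interval, and
# emission terms only at the two typed positions (first = A, last = H).
joint <- function(s) {
  g <- strsplit(s, "")[[1]]
  n_rec  <- sum(g[-1] != g[-length(g)])   # number of recombinations
  n_stay <- length(g) - 1 - n_rec         # intervals without recombination
  emit_first <- if (g[1] == "A") 1 - e else e
  emit_last  <- if (g[length(g)] == "H") 1 - e else e
  0.5 * r^n_rec * (1 - r)^n_stay * emit_first * emit_last
}

seqs <- c("AAAAAA", "HHHHHH", "AAAAAH", "AAAAHH", "AAAHHH", "AAHHHH", "AHHHHH")
p <- sapply(seqs, joint)
print(signif(p, 3))

# Each no-recombinant sequence beats every single-recombinant sequence ...
print(p[["AAAAAA"]] > max(p[3:7]))
# ... although the five single-recombinant sequences together are more likely.
print(sum(p[3:7]) > p[["AAAAAA"]])
```

In this setting a no-recombinant sequence wins whenever e exceeds r, which is why a small positive step can change the reconstructed genotypes.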

Author

Karl W Broman, broman@wisc.edu

Details

We use the Viterbi algorithm to calculate $\arg \max_v \Pr(g = v | O)$ where $g$ is the underlying sequence of genotypes and $O$ denotes the observed marker genotypes.

This is done by calculating $\gamma_k(v_k) = \max_{v_1, \ldots, v_{k-1}} \Pr(g_1 = v_1, \ldots, g_k = v_k, O_1, \ldots, O_k)$ for $k = 1, \ldots, n$ and then tracing back through the sequence.
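
The recursion and traceback can be sketched in base R for a simplified two-state chain. This is an illustration under stated assumptions (two genotype states, initial probability 1/2 each, a symmetric error model, log-scale arithmetic), not the package's own implementation; the function name viterbi2 and its arguments are hypothetical.

```r
# Viterbi sketch for a two-state chain (1 = A, 2 = H).
# obs: observed genotype per position (1 or 2, NA if missing)
# r:   recombination fractions between adjacent positions (length n - 1)
# e:   genotyping error probability
viterbi2 <- function(obs, r, e) {
  n <- length(obs)
  emit <- function(o, v) {                   # log Pr(O_k = o | g_k = v)
    if (is.na(o)) 0 else if (o == v) log(1 - e) else log(e)
  }
  gamma <- matrix(-Inf, nrow = n, ncol = 2)  # log gamma_k(v_k)
  back  <- matrix(NA_integer_, nrow = n, ncol = 2)
  for (v in 1:2) gamma[1, v] <- log(0.5) + emit(obs[1], v)
  if (n > 1) for (k in 2:n) {
    for (v in 1:2) {
      # log transition probability from each previous state into v
      trans <- ifelse(1:2 == v, log(1 - r[k - 1]), log(r[k - 1]))
      cand <- gamma[k - 1, ] + trans
      back[k, v]  <- which.max(cand)         # best predecessor of v
      gamma[k, v] <- max(cand) + emit(obs[k], v)
    }
  }
  # Trace back from the best final state
  path <- integer(n)
  path[n] <- which.max(gamma[n, ])
  if (n > 1) for (k in (n - 1):1) path[k] <- back[k + 1, path[k + 1]]
  path
}

# Missing genotypes are filled in; the switch is placed in the widest interval
path1 <- viterbi2(c(1, NA, 1, NA, 2, 2), r = c(0.05, 0.05, 0.05, 0.10, 0.05), e = 0.01)
print(c("A", "H")[path1])

# An isolated H among tightly linked A's is treated as a genotyping error
path2 <- viterbi2(c(1, 1, 2, 1, 1), r = rep(0.01, 4), e = 0.05)
print(c("A", "H")[path2])
```

Working on the log scale avoids underflow when many positions are multiplied together, and which.max resolves ties by taking the first state, so tied sequences are not reported.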

References

Lange, K. (1999) Numerical analysis for statisticians. Springer-Verlag. Sec 23.3.

Rabiner, L. R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257--286.

Examples


library(qtl)
data(fake.f2)
fake.f2 <- subset(fake.f2, chr=18:19)
# reconstruct genotypes every 2 cM, extending 5 cM past the terminal markers
fake.f2 <- argmax.geno(fake.f2, step=2, off.end=5, error.prob=0.01)
