argmax.geno: Reconstruct underlying genotypes

Description

Uses the Viterbi algorithm to identify the most likely sequence of underlying genotypes, given the observed multipoint marker data, with possible allowance for genotyping errors.

Usage

argmax.geno(cross, step=0, off.end=0, error.prob=0.0001, map.function=c("haldane","kosambi","c-f","morgan"), stepwidth=c("fixed", "variable", "max"))

Arguments

cross

An object of class cross. See read.cross for details.

step

Maximum distance (in cM) between positions at which the genotypes are reconstructed. If step=0, genotypes are reconstructed only at the marker locations.

off.end

Distance (in cM) past the terminal markers on each chromosome to which the genotype reconstructions will be carried.

error.prob

Assumed genotyping error rate used in the calculation of the penetrance Pr(observed genotype | true genotype).

map.function

Indicates whether to use the Haldane, Kosambi, Carter-Falconer or Morgan map function when converting genetic distances into recombination fractions.
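As a rough illustration of what a map function does, the sketch below converts a distance in cM to a recombination fraction using the standard Haldane, Kosambi and Morgan formulas. The helper name mf_sketch is hypothetical and is not part of the package; the Carter-Falconer function is omitted because it has no closed form in this direction.

```r
# Hypothetical helper (not an R/qtl function): genetic distance d in cM
# to recombination fraction, using the standard textbook formulas.
mf_sketch <- function(d, map.function = c("haldane", "kosambi", "morgan")) {
  map.function <- match.arg(map.function)
  m <- d / 100  # distance in Morgans
  switch(map.function,
         haldane = 0.5 * (1 - exp(-2 * m)),  # no crossover interference
         kosambi = 0.5 * tanh(2 * m),        # moderate interference
         morgan  = pmin(m, 0.5))             # complete interference, capped at 1/2
}

mf_sketch(10, "haldane")
mf_sketch(10, "kosambi")
mf_sketch(10, "morgan")
```

For short distances the three functions give nearly the same recombination fraction; they diverge as the distance grows.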

stepwidth

Indicates whether the intermediate points should be placed with fixed or variable step sizes. We recommend using "fixed"; "variable" is included for the qtlbim package (http://www.ssg.uab.edu/qtlbim). The "max" option inserts the minimal number of intermediate points so that the maximum distance between points is step.
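The "max" rule can be sketched as follows. This is only an illustration of the description above, not the package's code: the helper name grid_max_sketch is hypothetical, the intermediate points are assumed to be equally spaced within each marker interval, and off.end is ignored.

```r
# Hypothetical helper (not part of R/qtl): place the fewest intermediate
# points between adjacent markers so that no gap exceeds `step`.
grid_max_sketch <- function(markers, step) {
  markers <- sort(markers)
  pos <- markers[1]
  for (i in seq_len(length(markers) - 1)) {
    d <- markers[i + 1] - markers[i]
    n <- max(1, ceiling(d / step))   # fewest sub-intervals of width <= step
    pts <- seq(markers[i], markers[i + 1], length.out = n + 1)
    pos <- c(pos, pts[-1])           # left endpoint is already stored
  }
  pos
}

# Markers at 0, 5 and 12 cM with step = 2
grid <- grid_max_sketch(c(0, 5, 12), step = 2)
grid
max(diff(grid))
```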

Value

The input cross object is returned with a component, argmax, added to each component of cross$geno. The argmax component is a matrix of size [n.ind x n.pos], where n.pos is the number of positions at which the reconstructed genotypes were obtained, containing the most likely sequences of underlying genotypes. Attributes "error.prob", "step", and "off.end" are set to the values of the corresponding arguments, for later reference.

Warning

The Viterbi algorithm can behave badly when step is small but positive, and results can differ considerably between values of step. The problem is that, given data like A----H (two typed markers with four untyped positions between them), the sequences AAAAAA and HHHHHH may each be more likely than any one of the sequences AAAAAH, AAAAHH, AAAHHH, AAHHHH, AHHHHH, even though a recombination somewhere in the interval may be more probable overall. The Viterbi algorithm produces only a single "most likely" sequence of underlying genotypes.
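The effect can be checked numerically under a deliberately simplified model. The sketch below assumes only two genotypes (A and H, as in a backcross), a symmetric error rate e, a uniform starting probability of 1/2, and Haldane recombination over an illustrative spacing of 0.005 cM; the penetrance model actually used by the package depends on the cross type and may differ.

```r
# Simplified two-genotype model; all numeric values are illustrative assumptions.
r <- 0.5 * (1 - exp(-2 * 0.005 / 100))  # Haldane, 0.005 cM between adjacent positions
e <- 1e-4                                # assumed genotyping error rate

# Joint probability of a genotype path and the observations A----H.
# Positions 2-5 are unobserved, so they contribute no emission term.
path_prob <- function(path) {
  g <- strsplit(path, "")[[1]]
  n_rec <- sum(g[-1] != g[-length(g)])   # recombination events along the path
  trans <- r^n_rec * (1 - r)^(length(g) - 1 - n_rec)
  emit  <- ifelse(g[1] == "A", 1 - e, e) * ifelse(g[6] == "H", 1 - e, e)
  0.5 * trans * emit
}

paths <- c("AAAAAA", "HHHHHH", "AAAAAH", "AAAAHH", "AAAHHH", "AAHHHH", "AHHHHH")
p <- sapply(paths, path_prob)
p / max(p)   # probabilities relative to the best single path
sum(p[3:7])  # total probability of the five single-recombination paths
```

With these values each no-recombination path (one genotyping error) beats every individual single-recombination path, while the five single-recombination paths together outweigh either of them.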

Details

We use the Viterbi algorithm to calculate $\arg\max_v \Pr(g = v \mid O)$, where $g$ is the underlying sequence of genotypes and $O$ is the sequence of observed marker genotypes.

This is done by calculating $Q_k(v_k) = \max_{v_1, \ldots, v_{k-1}} \Pr(g_1 = v_1, \ldots, g_k = v_k, O_1, \ldots, O_k)$ for $k = 1, \ldots, n$ and then tracing back through the sequence.
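The recursion and traceback can be sketched for a generic hidden Markov model as below. This is a minimal illustration, not the package's implementation: the function name viterbi_sketch, its arguments, and the two-genotype example values are all assumptions, and the computation is done on the log scale to avoid underflow.

```r
# Hypothetical helper (not the R/qtl implementation).
# init:  length-S vector of initial genotype probabilities
# trans: list of n-1 S x S transition matrices (rows = from, cols = to)
# emit:  S x n matrix with emit[s, k] = Pr(O[k] | g[k] = s)
viterbi_sketch <- function(init, trans, emit) {
  n <- ncol(emit)
  S <- length(init)
  logQ <- matrix(-Inf, S, n)  # log of Q[k](v[k])
  back <- matrix(0L, S, n)    # best predecessor of each state at each position
  logQ[, 1] <- log(init) + log(emit[, 1])
  for (k in seq_len(n)[-1]) {
    for (s in seq_len(S)) {
      cand <- logQ[, k - 1] + log(trans[[k - 1]][, s])
      back[s, k] <- which.max(cand)
      logQ[s, k] <- max(cand) + log(emit[s, k])
    }
  }
  # Trace back from the best final state
  path <- integer(n)
  path[n] <- which.max(logQ[, n])
  for (k in rev(seq_len(n - 1))) path[k] <- back[path[k + 1], k + 1]
  path
}

# Two genotypes (1 = A, 2 = H), observations A, A, H, H
r <- 0.1
e <- 0.01
Tm <- matrix(c(1 - r, r, r, 1 - r), 2, 2)
emit <- cbind(c(1 - e, e), c(1 - e, e), c(e, 1 - e), c(e, 1 - e))
best <- viterbi_sketch(c(0.5, 0.5), list(Tm, Tm, Tm), emit)
best
```

Ties are broken in favour of the lower-numbered state because which.max returns the first maximum; a different tie rule would give a different, equally likely, path.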

References

Lange, K. (1999) Numerical analysis for statisticians. Springer-Verlag. Sec 23.3.

Rabiner, L. R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257-286.

Examples

library(qtl)
data(fake.f2)
fake.f2 <- argmax.geno(fake.f2, step=2, off.end=5, error.prob=0.01)