Uses the Viterbi algorithm to identify the most likely sequence of underlying genotypes, given the observed multipoint marker data, with possible allowance for genotyping errors.
argmax.geno(cross, step=0, off.end=0, error.prob=0.0001,
map.function=c("haldane","kosambi","c-f","morgan"),
stepwidth=c("fixed", "variable", "max"))
The input cross object is returned with a component, argmax, added to each component of cross$geno.
The argmax component is a matrix of size [n.ind x n.pos], where n.pos is the number of positions at which the reconstructed genotypes were obtained, containing the most likely sequences of underlying genotypes.
Attributes "error.prob", "step", and "off.end" are set to the values of the corresponding arguments, for later reference.
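As a small illustration of the return value described above, the following sketch pulls the argmax matrix for one chromosome out of the returned cross and inspects its stored attributes. It assumes the qtl package is installed and uses its bundled fake.f2 data; the variable name am is only for illustration.

```r
library(qtl)
data(fake.f2)

# Reconstruct genotypes on two chromosomes only, to keep things quick
fake.f2 <- subset(fake.f2, chr=18:19)
fake.f2 <- argmax.geno(fake.f2, step=2, off.end=5, error.prob=0.01)

# The argmax matrix for chromosome 18: one row per individual,
# one column per position at which genotypes were reconstructed
am <- fake.f2$geno[["18"]]$argmax
dim(am)

# Argument values recorded as attributes for later reference
attr(am, "error.prob")
attr(am, "step")
attr(am, "off.end")
```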
cross: An object of class cross. See read.cross for details.
step: Maximum distance (in cM) between positions at which the genotypes are reconstructed; with step=0, genotypes are reconstructed only at the marker locations.
off.end: Distance (in cM) past the terminal markers on each chromosome to which the genotype reconstructions will be carried.
error.prob: Assumed genotyping error rate used in the calculation of the penetrance Pr(observed genotype | true genotype).
map.function: Indicates whether to use the Haldane, Kosambi, Carter-Falconer or Morgan map function when converting genetic distances into recombination fractions.
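To make the conversion from genetic distance to recombination fraction concrete, here is a minimal base-R sketch of three of the map functions. The helper names (rf_haldane, rf_kosambi, rf_morgan) are hypothetical and are not the package's own functions; capping the Morgan function at 0.5 is an assumption of this sketch. Carter-Falconer is left out because going from distance to recombination fraction requires a numerical inversion.

```r
# Hypothetical helpers: genetic distance d (in cM) -> recombination fraction.
rf_haldane <- function(d) 0.5 * (1 - exp(-d / 50))   # no crossover interference
rf_kosambi <- function(d) 0.5 * tanh(d / 50)         # moderate interference
rf_morgan  <- function(d) pmin(d / 100, 0.5)         # complete interference (capped, by assumption)

d <- c(1, 10, 50)
round(cbind(d,
            haldane = rf_haldane(d),
            kosambi = rf_kosambi(d),
            morgan  = rf_morgan(d)), 4)
```

For short distances the three functions nearly agree; the choice matters mostly when adjacent positions are far apart.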
stepwidth: Indicates whether the intermediate points should be placed with fixed or variable step sizes. We recommend using "fixed"; "variable" was included for the qtlbim package (https://cran.r-project.org/src/contrib/Archive/qtlbim/). The "max" option inserts the minimal number of intermediate points so that the maximum distance between points is step.
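The "max" option can be pictured with a short base-R sketch. The helper grid_max is hypothetical and is not how the package builds its grid; in particular, spacing the inserted points equally within each marker interval is an assumption made here for illustration.

```r
# Hypothetical helper: insert the fewest points within each marker interval
# so that no gap exceeds `step` (in cM). Equal spacing is an assumption.
grid_max <- function(markers, step) {
  stopifnot(step > 0, !is.unsorted(markers))
  pos <- markers[1]
  for (i in seq_along(markers)[-1]) {
    gap   <- markers[i] - markers[i - 1]
    n_seg <- max(1, ceiling(gap / step))  # fewest segments of length <= step
    pts   <- seq(markers[i - 1], markers[i], length.out = n_seg + 1)
    pos   <- c(pos, pts[-1])              # drop the left endpoint (already stored)
  }
  pos
}

# Markers at 0, 5 and 12 cM, with gaps of at most 2 cM
grid_max(c(0, 5, 12), step = 2)
```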
The Viterbi algorithm can behave badly when step is small but positive. One may observe quite different results for different values of step.
The problem is that, in the presence of data like A----H, the sequences AAAAAA and HHHHHH may be more likely than any one of the sequences AAAAAH, AAAAHH, AAAHHH, AAHHHH, AHHHHH. The Viterbi algorithm produces a single "most likely" sequence of underlying genotypes.
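The A----H situation can be checked numerically with a toy calculation. The sketch below assumes a backcross-like chain with two genotypes, six equally spaced positions, and a simple penetrance model (1 - eps for a match, eps for a mismatch, 1 for missing data); the values of r and eps are illustrative, and joint_prob is a hypothetical helper.

```r
# Toy chain: A observed at the first position, H at the last,
# nothing observed in between.
r   <- 5e-5   # recombination fraction between adjacent positions (tiny step)
eps <- 1e-4   # genotyping error probability

obs <- c("A", NA, NA, NA, NA, "H")

# Joint probability Pr(g = v, O) of one candidate genotype sequence v
joint_prob <- function(v) {
  v     <- strsplit(v, "")[[1]]
  emit  <- ifelse(is.na(obs), 1, ifelse(obs == v, 1 - eps, eps))
  n_rec <- sum(v[-1] != v[-length(v)])   # number of crossovers in v
  0.5 * prod(emit) * r^n_rec * (1 - r)^(length(v) - 1 - n_rec)
}

seqs <- c("AAAAAA", "HHHHHH",
          "AAAAAH", "AAAAHH", "AAAHHH", "AAHHHH", "AHHHHH")
p <- sapply(seqs, joint_prob)
signif(p, 3)

# Compare one no-crossover sequence with the single-crossover ones
p["AAAAAA"] > max(p[3:7])   # individually, each crossover sequence loses
sum(p[3:7]) > p["AAAAAA"]   # collectively, they carry more probability
```

With these values each single-crossover sequence has probability proportional to r, while the no-crossover sequences pay only one genotyping error, proportional to eps. Once the step is small enough that r falls below eps, a sequence with a presumed error wins the argmax even though a crossover somewhere in the interval is the more probable explanation overall.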
Karl W Broman, broman@wisc.edu
We use the Viterbi algorithm to calculate \(\arg \max_v \Pr(g = v | O)\) where \(g\) is the underlying sequence of genotypes and \(O\) is the observed marker genotypes.
This is done by calculating \(\gamma_k(v_k) = \max_{v_1, \ldots, v_{k-1}} \Pr(g_1 = v_1, \ldots, g_k = v_k, O_1, \ldots, O_k)\) for \(k = 1, \ldots, n\) and then tracing back through the sequence.
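The recursion and traceback can be sketched in base R for the simplest case. This is an illustrative implementation for a two-genotype (backcross-like) chain, not the package's own code; the function name viterbi_bc, the equal prior of 1/2 on each genotype, and the simple penetrance model are assumptions of the sketch. It works on the log scale to avoid underflow.

```r
# Minimal Viterbi sketch for a two-genotype chain.
# obs: observed genotypes (1, 2, or NA for missing) at each position
# rf:  recombination fractions between adjacent positions (length n - 1)
# eps: assumed genotyping error probability
viterbi_bc <- function(obs, rf, eps = 1e-4) {
  n <- length(obs)
  stopifnot(length(rf) == n - 1)
  states <- 1:2
  # log Pr(O_k | g_k = v): 1 if missing, 1 - eps if matching, eps otherwise
  log_emit <- function(k, v) {
    if (is.na(obs[k])) 0 else if (obs[k] == v) log1p(-eps) else log(eps)
  }
  gamma <- matrix(-Inf, n, 2)          # log gamma_k(v)
  back  <- matrix(NA_integer_, n, 2)   # best previous state for each (k, v)
  for (v in states) gamma[1, v] <- log(0.5) + log_emit(1, v)
  for (k in seq_len(n)[-1]) {
    for (v in states) {
      # log transition probabilities from each previous state into v
      log_tr <- ifelse(states == v, log1p(-rf[k - 1]), log(rf[k - 1]))
      cand   <- gamma[k - 1, ] + log_tr
      back[k, v]  <- which.max(cand)
      gamma[k, v] <- max(cand) + log_emit(k, v)
    }
  }
  # Trace back from the most likely final state
  path <- integer(n)
  path[n] <- which.max(gamma[n, ])
  for (k in rev(seq_len(n)[-1])) path[k - 1] <- back[k, path[k]]
  path
}

# One crossover is cheaper than two genotyping errors here
viterbi_bc(obs = c(1, NA, 1, 2, NA, 2), rf = rep(0.05, 5), eps = 0.01)

# An isolated discordant genotype is treated as an error when rf is small
viterbi_bc(obs = c(1, 1, 2, 1, 1), rf = rep(0.001, 4), eps = 0.01)
```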
Lange, K. (1999) Numerical analysis for statisticians. Springer-Verlag. Sec 23.3.
Rabiner, L. R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77, 257-286.
sim.geno, calc.genoprob, fill.geno
library(qtl)
data(fake.f2)
# keep only chromosomes 18 and 19 to keep the example quick
fake.f2 <- subset(fake.f2, chr=18:19)
fake.f2 <- argmax.geno(fake.f2, step=2, off.end=5, error.prob=0.01)