Plot a measure of the proportion of missing information in the genotype data.
plotInfo(x, chr, method=c("entropy","variance","both"), step=1,
off.end=0, error.prob=0.001,
map.function=c("haldane","kosambi","c-f","morgan"),
alternate.chrid=FALSE, fourwaycross=c("all", "AB", "CD"),
include.genofreq=FALSE, ...)
An object with class scanone: a data.frame whose columns are the chromosome IDs and cM positions, followed by the entropy and/or variance version of the missing information.
An object of class cross. See read.cross for details.
Optional vector indicating the chromosomes to plot. This should be a vector of character strings referring to chromosomes by name; numeric values are converted to strings. Prefix chromosome names with - to select all chromosomes except those named. A logical (TRUE/FALSE) vector may also be used.
Indicates whether to plot the entropy version of the information, the variance version, or both.
Maximum distance (in cM) between positions at which the missing information is calculated; with step=0, it is calculated only at the marker locations.
Distance (in cM) past the terminal markers on each chromosome to which the genotype probability calculations will be carried.
Assumed genotyping error rate used in the calculation of the penetrance Pr(observed genotype | true genotype).
Indicates whether to use the Haldane, Kosambi, Carter-Falconer, or Morgan map function when converting genetic distances into recombination fractions.
If TRUE and more than one chromosome is plotted, alternate the placement of chromosome axis labels, so that they may be more easily distinguished.
For a phase-known four-way cross, measure missing genotype information overall ("all"), or just for the alleles from the first parent ("AB") or from the second parent ("CD").
If TRUE, estimated genotype frequencies (from the results of calc.genoprob, averaged across the individuals) are included as additional columns in the output.
Passed to plot.scanone.
Karl W Broman, broman@wisc.edu
The entropy version of the missing information: for a single individual at a single genomic position, we measure the missing information as \(H = -\sum_g p_g \log p_g / \log n\), where \(p_g\) is the probability of the genotype \(g\), and \(n\) is the number of possible genotypes, defining \(0 \log 0 = 0\). This takes values between 0 and 1, assuming the value 1 when the genotypes (given the marker data) are equally likely and 0 when the genotypes are completely determined. We calculate the missing information at a particular position as the average of \(H\) across individuals. For an intercross, we don't scale by \(\log n\) but by the entropy in the case of genotype probabilities (1/4, 1/2, 1/4).
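The entropy measure described above can be sketched in a few lines of base R. This is an illustrative sketch only, not the package's C implementation; the function names entropy and missing_info_entropy, the ref argument, and the toy probabilities are hypothetical.

```r
# Entropy of one genotype distribution (natural log); zero probabilities
# are dropped so that 0 * log(0) is treated as 0.
entropy <- function(p) {
  nz <- p[p > 0]
  -sum(nz * log(nz))
}

# Missing information for one individual at one position: entropy of p,
# scaled by the entropy of a reference distribution `ref`.
# By default `ref` is uniform over the n genotypes, so the scale is log(n).
missing_info_entropy <- function(p, ref = rep(1 / length(p), length(p))) {
  entropy(p) / entropy(ref)
}

# Backcross-like case with n = 2 genotypes
h_equal <- missing_info_entropy(c(0.5, 0.5))  # equally likely: 1
h_known <- missing_info_entropy(c(1, 0))      # fully determined: 0

# Intercross: scale by the entropy of (1/4, 1/2, 1/4) instead of log(3)
ic_ref <- c(0.25, 0.5, 0.25)
h_ic <- missing_info_entropy(c(0.25, 0.5, 0.25), ref = ic_ref)

# Missing information at a position: average across individuals
# (rows = individuals, columns = genotype probabilities; toy values)
probs <- rbind(c(1, 0), c(0.5, 0.5), c(0.9, 0.1))
h_position <- mean(apply(probs, 1, missing_info_entropy))
```

Dropping the zero entries before taking logs is one simple way to honor the \(0 \log 0 = 0\) convention without producing NaN.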
The variance version of the missing information: we calculate the average, across individuals, of the variance of the genotype distribution (conditional on the observed marker data) at a particular locus, and scale by the maximum such variance.
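As a rough illustration of the variance measure, the sketch below handles the simplest case, assuming a backcross whose two genotypes are scored 0 and 1. The scoring, the function name geno_variance, and the toy probabilities are assumptions made for illustration; how the package scores genotypes and determines the maximum variance for other cross types is not stated here.

```r
# Assumption: two genotypes scored 0 and 1 (backcross-like case)
scores <- c(0, 1)

# Variance of the genotype score under the distribution p
geno_variance <- function(p, scores) {
  m <- sum(p * scores)
  sum(p * (scores - m)^2)
}

# rows = individuals, columns = genotype probabilities at one position
# (toy values, conditional on hypothetical marker data)
probs <- rbind(c(1, 0), c(0.5, 0.5), c(0.9, 0.1))

# With 0/1 scores the variance is largest when both genotypes are
# equally likely, giving 0.25
max_var <- geno_variance(c(0.5, 0.5), scores)

# Average variance across individuals, scaled by the maximum
missing_info_variance <-
  mean(apply(probs, 1, geno_variance, scores = scores)) / max_var
```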
Calculations are done in C (for the sake of speed in the presence of little thought about programming efficiency) and the plot is created by a call to plot.scanone.
Note that summary.scanone may be used to display the maximum missing information on each chromosome.
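To make "maximum missing information on each chromosome" concrete, here is a base-R sketch on a toy data frame shaped like the returned object (chromosome, position, missing information). The data values and the column name entropy are hypothetical, and this is not how summary.scanone is implemented.

```r
# Toy stand-in for the returned data frame (hypothetical values and names)
info <- data.frame(
  chr = c("1", "1", "1", "4", "4"),
  pos = c(0, 10, 20, 0, 15),
  entropy = c(0.05, 0.30, 0.10, 0.20, 0.02)
)

# For each chromosome, keep the row with the largest missing information
max_rows <- do.call(
  rbind,
  lapply(split(info, info$chr), function(d) d[which.max(d$entropy), ])
)
max_rows
```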
plot.scanone, plotMissing, calc.genoprob, geno.table
library(qtl)
data(hyper)
hyper <- subset(hyper,chr=1:4)
plotInfo(hyper,chr=c(1,4))
# save the results and view maximum missing info on each chr
info <- plotInfo(hyper)
summary(info)
plotInfo(hyper, bandcol="gray70")