plotInfo: Plot the proportion of missing genotype information

Description

Plot a measure of the proportion of missing information in the genotype data.

Usage

plotInfo(x, chr, method=c("entropy","variance","both"), step=1,
          off.end=0, error.prob=0.001,
          map.function=c("haldane","kosambi","c-f","morgan"),
          alternate.chrid=FALSE, fourwaycross=c("all", "AB", "CD"),
          include.genofreq=FALSE, ...)

Value

An object with class scanone: a data.frame with columns the chromosome IDs and cM positions followed by the entropy and/or variance version of the missing information.

Arguments

x: An object of class cross. See read.cross for details.
chr: Optional vector indicating the chromosomes to plot. This should be a vector of character strings referring to chromosomes by name; numeric values are converted to strings. Refer to chromosomes with a preceding - to have all chromosomes but those considered. A logical (TRUE/FALSE) vector may also be used.
method: Indicates whether to plot the entropy version of the information, the variance version, or both.
step: Maximum distance (in cM) between positions at which the missing information is calculated, though for step=0, it is are calculated only at the marker locations.
off.end: Distance (in cM) past the terminal markers on each chromosome to which the genotype probability calculations will be carried.
error.prob: Assumed genotyping error rate used in the calculation of the penetrance Pr(observed genotype | true genotype).
map.function: Indicates whether to use the Haldane, Kosambi or Carter-Falconer map function when converting genetic distances into recombination fractions.
alternate.chrid: If TRUE and more than one chromosome is plotted, alternate the placement of chromosome axis labels, so that they may be more easily distinguished.
fourwaycross: For a phase-known four-way cross, measure missing genotype information overall ("all"), or just for the alleles from the first parent ("AB") or from the second parent ("CD").
include.genofreq: If TRUE, estimated genotype frequencies (from the results of calc.genoprob averaged across the individuals) are included as additional columns in the output.
...: Passed to plot.scanone.

Author

Karl W Broman, broman@wisc.edu

Details

The entropy version of the missing information: for a single individual at a single genomic position, we measure the missing information as \(H = \sum_g p_g \log p_g / \log n\), where \(p_g\) is the probability of the genotype \(g\), and \(n\) is the number of possible genotypes, defining \(0 \log 0 = 0\). This takes values between 0 and 1, assuming the value 1 when the genotypes (given the marker data) are equally likely and 0 when the genotypes are completely determined. We calculate the missing information at a particular position as the average of \(H\) across individuals. For an intercross, we don't scale by \(\log n\) but by the entropy in the case of genotype probabilities (1/4, 1/2, 1/4).

The variance version of the missing information: we calculate the average, across individuals, of the variance of the genotype distribution (conditional on the observed marker data) at a particular locus, and scale by the maximum such variance.

Calculations are done in C (for the sake of speed in the presence of little thought about programming efficiency) and the plot is created by a call to plot.scanone.

Note that summary.scanone may be used to display the maximum missing information on each chromosome.

Examples

Run this code

data(hyper)
hyper <- subset(hyper,chr=1:4)
plotInfo(hyper,chr=c(1,4))

# save the results and view maximum missing info on each chr
info <- plotInfo(hyper)
summary(info)

plotInfo(hyper, bandcol="gray70")

Run the code above in your browser using DataLab