GetWeightedMeanGenotypes: Export Numeric Genotype Values from Posterior Probabilities

Description

These functions calculate numerical genotype values using posterior probabilities in a "RADdata" object, and output those values as a matrix of taxa by alleles. GetWeightedMeanGenotypes returns continuous genotype values, weighted by posterior genotype probabilities (i.e. posterior mean genotypes). GetProbableGenotypes returns discrete genotype values indicating the most probable genotype. If the "RADdata" object includes more than one possible inheritance mode, the $ploidyChiSq slot is used for selecting or weighting inheritance modes for each allele.

Usage

GetWeightedMeanGenotypes(object, ...)
# S3 method for RADdata
GetWeightedMeanGenotypes(object, minval = 0, maxval = 1, 
                         omit1allelePerLocus = TRUE, 
                         omitCommonAllele = TRUE,
                         naIfZeroReads = FALSE, 
                         onePloidyPerAllele = FALSE, ...)
GetProbableGenotypes(object, ...)
# S3 method for RADdata
GetProbableGenotypes(object, omit1allelePerLocus = TRUE,
                     omitCommonAllele = TRUE,
                     naIfZeroReads = FALSE, 
                     correctParentalGenos = TRUE,
                     multiallelic = "correct", ...)

Value

For GetWeightedMeanGenotypes, a named matrix, with taxa in rows and alleles in columns, and values ranging from minval to maxval. These values can be treated as continuous genotypes.

For GetProbableGenotypes, a list:

genotypes: A named integer matrix, with taxa in rows and alleles in columns, and values ranging from zero to the maximum ploidy for each allele. These values can be treated as discrete genotypes.
ploidy_index: A vector with one value per allele. It contains the index of the most likely inheritance mode of that allele in object$priorProbPloidies.

Arguments

object: A "RADdata" object. Posterior genotype probabilities should have been added with AddGenotypePosteriorProb, and if there is more than one possible ploidy, ploidy chi-squared values should have been added with AddPloidyChiSq.
...: Additional arguments, listed below, to be passed to the method for "RADdata".
minval: The number that should be used for indicating that a taxon has zero copies of an allele.
maxval: The number that should be used for indicating that a taxon has the maximum copies of an allele (equal to the ploidy of the locus).
omit1allelePerLocus: A logical indicating whether one allele per locus should be omitted from the output, in order to reduce the number of variables and prevent singularities for genome-wide association and genomic prediction. The value for one allele can be predicted from the values from all other alleles at its locus.
omitCommonAllele: A logical, passed to the commonAllele argument of OneAllelePerMarker, indicating whether the most common allele for each locus should be omitted (as opposed to simply the first allele for each locus). Ignored if omit1allelePerLocus = FALSE.
naIfZeroReads: A logical indicating whether NA should be inserted into the output matrix for any taxa and loci where the total read depth for the locus is zero. If FALSE, the output for these genotypes is essentially calculated using prior genotype probabilities, since prior and posterior genotype probabilities are equal when there are no reads.
onePloidyPerAllele: Logical. If TRUE, for each allele the inheritance mode with the lowest $\chi ^ 2$ value is selected and is assumed to be the true inheritance mode. If FALSE, inheritance modes are weighted by inverse $\chi ^ 2$ values for each allele, and mean genotypes that have been weighted across inheritance modes are returned.
correctParentalGenos: Logical. If TRUE and if the dataset was processed with PipelineMapping2Parents, the parental genotypes that are output are corrected according to the progeny allele frequencies, using the likelyGeno_donor and likelyGeno_recurrent slots in object. For the ploidy of the marker, the appropriate ploidy for the parents is selected using the donorPloidies and recurrentPloidies slots.
multiallelic: A string indicating how to handle cases where allele copy number across all alleles at a locus does not sum to the ploidy. To retain the most probable copy number for each allele, even if they don't sum to the ploidy across all alleles, use "ignore". To be conservative and convert these allele copy numbers to NA, use "na". To adjust allele copy numbers to match the ploidy (maximizing the product of posterior probabilities across alleles, within the space of possible multiallelic genotypes), use "correct".

Author

Lindsay V. Clark

Details

For each inheritance mode $m$, taxon $t$, allele $a$, allele copy number $i$, total ploidy $k$, and posterior genotype probability $p_{i,t,a,m}$, posterior mean genotype $g_{t,a,m}$ is estimated by GetWeightedMeanGenotypes as:

$$g_{t,a,m} = \sum_{i = 0}^k p_{i,t,a,m} * \frac{i}{k}$$

For GetProbableGenotypes, the genotype is the one with the maximum posterior probability:

$$g_{t,a,m} = i | \max_{i = 0}^k{p_{i,t,a,m}}$$

When there are multiple inheritance modes and onePloidyPerAllele = FALSE, the weighted genotype is estimated by GetWeightedMeanGenotypes as:

$$g_{t,a} = \sum_m [ g_{t,a,m} * \frac{1}{\chi^2_{m,a}} / \sum_m \frac{1}{\chi^2_{m,a}}]$$

In GetProbableGenotypes, or GetWeightedMeanGenotypes when there are multiple inheritance modes and onePloidyPerAllele = TRUE, the genotype is simply the one corresponding to the inheritance mode with the minimum $\chi ^2$ value:

$$g_{t,a} = g_{t,a,m} | \min_m{\chi^2_{m,a}}$$

Examples

Run this code

# load dataset
data(exampleRAD_mapping)

# run a genotype calling pipeline; 
# substitute with any pipeline and parameters
exampleRAD_mapping <- SetDonorParent(exampleRAD_mapping, "parent1")
exampleRAD_mapping <- SetRecurrentParent(exampleRAD_mapping, "parent2")
exampleRAD_mapping <- PipelineMapping2Parents(exampleRAD_mapping,
                                 n.gen.backcrossing = 1, useLinkage = FALSE)


# get weighted mean genotypes
wmg <- GetWeightedMeanGenotypes(exampleRAD_mapping)
# examine the results
wmg[1:10,]

# get most probable genotypes
pg <- GetProbableGenotypes(exampleRAD_mapping, naIfZeroReads = TRUE)
# examine the results
pg$genotypes[1:10,]

Run the code above in your browser using DataLab