These functions calculate numerical genotype values using posterior probabilities in a "RADdata" object, and output those values as a matrix of taxa by alleles. GetWeightedMeanGenotypes returns continuous genotype values, weighted by posterior genotype probabilities (i.e. posterior mean genotypes). GetProbableGenotypes returns discrete genotype values indicating the most probable genotype. If the "RADdata" object includes more than one possible inheritance mode, the $ploidyChiSq slot is used for selecting or weighting inheritance modes for each allele.
GetWeightedMeanGenotypes(object, ...)

# S3 method for RADdata
GetWeightedMeanGenotypes(object, minval = 0, maxval = 1,
                         omit1allelePerLocus = TRUE,
                         omitCommonAllele = TRUE,
                         naIfZeroReads = FALSE,
                         onePloidyPerAllele = FALSE, ...)

GetProbableGenotypes(object, ...)

# S3 method for RADdata
GetProbableGenotypes(object, omit1allelePerLocus = TRUE,
                     omitCommonAllele = TRUE,
                     naIfZeroReads = FALSE,
                     correctParentalGenos = TRUE,
                     multiallelic = "correct", ...)
For GetWeightedMeanGenotypes, a named matrix, with taxa in rows and alleles in columns, and values ranging from minval to maxval. These values can be treated as continuous genotypes.
For GetProbableGenotypes, a list:
genotypes
A named integer matrix, with taxa in rows and alleles in columns, and values ranging from zero to the maximum ploidy for each allele. These values can be treated as discrete genotypes.
ploidy_index
A vector with one value per allele. It contains the index of the most likely inheritance mode of that allele in object$priorProbPloidies.
object
A "RADdata" object. Posterior genotype probabilities should have been added with AddGenotypePosteriorProb, and if there is more than one possible ploidy, ploidy chi-squared values should have been added with AddPloidyChiSq.
...
Additional arguments, listed below, to be passed to the method for "RADdata".
minval
The number that should be used for indicating that a taxon has zero copies of an allele.
maxval
The number that should be used for indicating that a taxon has the maximum copies of an allele (equal to the ploidy of the locus).
omit1allelePerLocus
A logical indicating whether one allele per locus should be omitted from the output, in order to reduce the number of variables and prevent singularities for genome-wide association and genomic prediction. The value for one allele can be predicted from the values for all other alleles at its locus.
omitCommonAllele
A logical, passed to the commonAllele argument of OneAllelePerMarker, indicating whether the most common allele for each locus should be omitted (as opposed to simply the first allele for each locus). Ignored if omit1allelePerLocus = FALSE.
naIfZeroReads
A logical indicating whether NA should be inserted into the output matrix for any taxa and loci where the total read depth for the locus is zero. If FALSE, the output for these genotypes is essentially calculated using prior genotype probabilities, since prior and posterior genotype probabilities are equal when there are no reads.
onePloidyPerAllele
Logical. If TRUE, for each allele the inheritance mode with the lowest \(\chi ^ 2\) value is selected and is assumed to be the true inheritance mode. If FALSE, inheritance modes are weighted by inverse \(\chi ^ 2\) values for each allele, and mean genotypes that have been weighted across inheritance modes are returned.
correctParentalGenos
Logical. If TRUE and if the dataset was processed with PipelineMapping2Parents, the parental genotypes that are output are corrected according to the progeny allele frequencies, using the likelyGeno_donor and likelyGeno_recurrent slots in object. For the ploidy of the marker, the appropriate ploidy for the parents is selected using the donorPloidies and recurrentPloidies slots.
multiallelic
A string indicating how to handle cases where allele copy number across all alleles at a locus does not sum to the ploidy. To retain the most probable copy number for each allele, even if the copy numbers don't sum to the ploidy across all alleles, use "ignore". To be conservative and convert these allele copy numbers to NA, use "na". To adjust allele copy numbers to match the ploidy (maximizing the product of posterior probabilities across alleles, within the space of possible multiallelic genotypes), use "correct".
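The adjustment that multiallelic = "correct" describes can be illustrated with a small brute-force sketch in base R. This is not the package's implementation; the posterior probabilities, the allele names a1 to a3, and the exhaustive enumeration are all assumptions made for illustration, for a single taxon at one diploid locus with three alleles.

```r
# Hypothetical posterior probabilities for one taxon at one diploid locus.
# post[i + 1, a] is the probability of i copies of allele a.
k <- 2
post <- cbind(a1 = c(0.05, 0.60, 0.35),
              a2 = c(0.10, 0.55, 0.35),
              a3 = c(0.45, 0.50, 0.05))
rownames(post) <- 0:k

# Most probable copy number for each allele considered on its own.
independent <- apply(post, 2, which.max) - 1L
sum(independent) == k  # FALSE here: the calls do not sum to the ploidy

# Enumerate every multiallelic genotype whose copy numbers sum to k
# (illustrative brute force; only practical for small loci).
grid <- as.matrix(expand.grid(rep(list(0:k), ncol(post))))
valid <- grid[rowSums(grid) == k, , drop = FALSE]

# Score each candidate by the product of per-allele posterior probabilities.
score <- apply(valid, 1, function(cn) {
  prod(post[cbind(cn + 1, seq_along(cn))])
})

# Keep the candidate with the highest score.
corrected <- valid[which.max(score), ]
names(corrected) <- colnames(post)
corrected
```

In this toy example the independent calls give one copy of every allele, which is impossible for a diploid, and the corrected genotype is the valid combination with the largest product of probabilities.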
Lindsay V. Clark
For each inheritance mode \(m\), taxon \(t\), allele \(a\), allele copy number \(i\), total ploidy \(k\), and posterior genotype probability \(p_{i,t,a,m}\), posterior mean genotype \(g_{t,a,m}\) is estimated by GetWeightedMeanGenotypes as:
$$g_{t,a,m} = \sum_{i = 0}^k p_{i,t,a,m} \cdot \frac{i}{k}$$
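As a worked illustration of the posterior mean formula, the following base R sketch computes \(g_{t,a,m}\) for two taxa at one allele under a single tetraploid inheritance mode. The probability values and the helper name posterior_mean_genotype are hypothetical, and the linear rescaling to minval and maxval at the end is an assumption about how those arguments are applied, not something taken from the package source.

```r
# Hypothetical posterior probabilities for one allele and one inheritance mode
# (k = 4). Rows are copy numbers 0..k, columns are taxa; each column sums to 1.
k <- 4
post <- matrix(c(0.70, 0.25, 0.05, 0.00, 0.00,
                 0.00, 0.10, 0.80, 0.10, 0.00),
               nrow = k + 1,
               dimnames = list(as.character(0:k), c("taxonA", "taxonB")))

# Posterior mean genotype: sum over i of p_i * (i / k), for each taxon.
posterior_mean_genotype <- function(post, k) {
  copies <- 0:k
  # The vector (copies / k) is recycled down each column of post.
  colSums(post * (copies / k))
}

g <- posterior_mean_genotype(post, k)
g

# Assumed linear rescaling so that zero copies maps to minval
# and k copies maps to maxval.
minval <- -1
maxval <- 1
g_scaled <- minval + g * (maxval - minval)
g_scaled
```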
For GetProbableGenotypes, the genotype is the one with the maximum posterior probability:
$$g_{t,a,m} = \underset{i \in \{0, \ldots, k\}}{\arg\max} \; p_{i,t,a,m}$$
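The most probable genotype can be sketched in base R in the same way. The probability matrix and the helper name most_probable_genotype are hypothetical; how the package breaks ties is not stated here, so the sketch simply takes the first maximum.

```r
# Hypothetical posterior probabilities for one allele and one inheritance mode
# (k = 4). Rows are copy numbers 0..k, columns are taxa.
k <- 4
post <- matrix(c(0.70, 0.25, 0.05, 0.00, 0.00,
                 0.00, 0.10, 0.80, 0.10, 0.00),
               nrow = k + 1,
               dimnames = list(as.character(0:k), c("taxonA", "taxonB")))

# For each taxon, find the copy number with the highest posterior probability.
# which.max() returns a 1-based row index (the first one in case of ties),
# so subtract 1 to convert it to a copy number.
most_probable_genotype <- function(post) {
  apply(post, 2, which.max) - 1L
}

g <- most_probable_genotype(post)
g
```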
When there are multiple inheritance modes and onePloidyPerAllele = FALSE, the weighted genotype is estimated by GetWeightedMeanGenotypes as:
$$g_{t,a} = \sum_m \left[ g_{t,a,m} \cdot \frac{1 / \chi^2_{m,a}}{\sum_{m'} 1 / \chi^2_{m',a}} \right]$$
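A minimal base R sketch of this weighting for a single allele follows. The posterior mean genotypes, the \(\chi^2\) values, and the mode labels are hypothetical values chosen for illustration.

```r
# Hypothetical posterior mean genotypes for one allele under two inheritance
# modes (rows) and two taxa (columns).
g_by_mode <- matrix(c(0.10, 0.20,
                      0.50, 0.40),
                    nrow = 2,
                    dimnames = list(c("diploid", "tetraploid"),
                                    c("taxonA", "taxonB")))

# Hypothetical chi-squared values for this allele under each mode.
chisq <- c(diploid = 2, tetraploid = 8)

# Normalized inverse chi-squared weights; they sum to 1.
w <- (1 / chisq) / sum(1 / chisq)

# Weighted genotype per taxon. The weight vector is recycled down each
# column, so row m of g_by_mode is multiplied by w[m].
g <- colSums(g_by_mode * w)
g
```

The mode with the smaller \(\chi^2\) value receives the larger weight, so here the result sits closer to the diploid estimates.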
When there are multiple inheritance modes, GetProbableGenotypes, and GetWeightedMeanGenotypes with onePloidyPerAllele = TRUE, simply use the genotype corresponding to the inheritance mode with the minimum \(\chi ^2\) value:
$$g_{t,a} = g_{t,a,m^*}, \quad m^* = \underset{m}{\arg\min} \; \chi^2_{m,a}$$
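Selecting one inheritance mode per allele can be sketched in base R as follows. All values, allele names, and mode labels are hypothetical; best_mode plays a role analogous to an index into the list of inheritance modes.

```r
# Hypothetical chi-squared values: inheritance modes in rows, alleles in columns.
chisq <- matrix(c(2, 8,
                  9, 3),
                nrow = 2,
                dimnames = list(c("mode1", "mode2"), c("loc1_A", "loc2_A")))

# Hypothetical genotype estimates with dimensions mode x taxon x allele.
g <- array(c(0.00, 0.25, 0.50, 0.50,
             1.00, 0.75, 0.50, 0.25),
           dim = c(2, 2, 2),
           dimnames = list(c("mode1", "mode2"),
                           c("taxonA", "taxonB"),
                           c("loc1_A", "loc2_A")))

# Index of the mode with the minimum chi-squared value for each allele.
best_mode <- apply(chisq, 2, which.min)

# For each allele, keep the genotypes from its best mode (taxa x alleles).
selected <- sapply(seq_along(best_mode), function(a) g[best_mode[a], , a])
colnames(selected) <- names(best_mode)
selected
```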
# load dataset
data(exampleRAD_mapping)
# run a genotype calling pipeline;
# substitute with any pipeline and parameters
exampleRAD_mapping <- SetDonorParent(exampleRAD_mapping, "parent1")
exampleRAD_mapping <- SetRecurrentParent(exampleRAD_mapping, "parent2")
exampleRAD_mapping <- PipelineMapping2Parents(exampleRAD_mapping,
                                              n.gen.backcrossing = 1,
                                              useLinkage = FALSE)
# get weighted mean genotypes
wmg <- GetWeightedMeanGenotypes(exampleRAD_mapping)
# examine the results
wmg[1:10,]
# get most probable genotypes
pg <- GetProbableGenotypes(exampleRAD_mapping, naIfZeroReads = TRUE)
# examine the results
pg$genotypes[1:10,]