Bruvo's distance between two alleles is calculated as
$$d = 1 - 2^{-\mid x \mid},$$ where x
is the number of repeat units between the two alleles (see the Algorithms
and Equations vignette for more details). These distances are calculated
over all combinations of alleles at a locus and then the minimum average
distance between allele combinations is taken as the distance for that
locus. All loci are then averaged over to obtain the distance between two
samples. Missing data is ignored (in the same fashion as
mean(c(1:9, NA), na.rm = TRUE)) if all alleles at a locus are missing.
See the next section for other cases.
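The calculation described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, alleles are assumed to be given as numbers of repeat units, each sample is a list of loci, and a locus with all alleles missing is represented by None. The minimum over allele combinations is found by brute force over permutations, which is only practical for small ploidy.

```python
from itertools import permutations


def allele_distance(a, b):
    """d = 1 - 2^(-|x|), where x is the number of repeat units between a and b."""
    return 1 - 2 ** (-abs(a - b))


def locus_distance(g1, g2):
    """Minimum, over all pairings of alleles, of the average allele distance."""
    if len(g1) != len(g2):
        raise ValueError("genotypes must have the same ploidy")
    best_sum = min(
        sum(allele_distance(a, b) for a, b in zip(g1, perm))
        for perm in permutations(g2)
    )
    return best_sum / len(g1)


def sample_distance(s1, s2):
    """Average the per-locus distances, skipping loci that are entirely missing.

    Assumption: a fully missing locus is encoded as None, and is dropped in the
    same spirit as mean(..., na.rm = TRUE) in R.
    """
    per_locus = [
        locus_distance(g1, g2)
        for g1, g2 in zip(s1, s2)
        if g1 is not None and g2 is not None
    ]
    if not per_locus:
        return float("nan")
    return sum(per_locus) / len(per_locus)


# Hypothetical diploid samples with two loci; the second locus is missing in a.
a = [[69, 70], None]
b = [[70, 71], [10, 12]]
print(sample_distance(a, b))
```

For the first locus, pairing 69 with 71 and 70 with 70 gives the smaller average, so that pairing determines the locus distance.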
Polyploids
Ploidy itself does not affect how Bruvo's distance is calculated.
However, since the method compares all alleles at a locus, the two
genotypes being compared at that locus need to have the same ploidy
level. Unfortunately for polyploids, it's often difficult to fully separate
distinct alleles at each locus, so you end up with genotypes that appear to
have a lower ploidy level than the organism.
To help deal with these situations, Bruvo has suggested the following
methods for handling differences in ploidy level:
Infinite Model - The simplest approach is to count every missing allele as
infinitely large, so that the distance between it and any other allele is 1.
Although this is computationally simple, it tends to inflate distances
between individuals.
Genome Addition Model - If it is suspected that the organism has
gone through a recent genome expansion, the missing alleles will be
replaced with all possible combinations of the observed alleles in the
shorter genotype. For example, if there is a genotype of [69, 70, 0, 0]
where 0 is a missing allele, the possible combinations are: [69, 70, 69,
69], [69, 70, 69, 70], [69, 70, 70, 69], and [69, 70, 70, 70]. The
resulting distances are then averaged over the number of comparisons.
Genome Loss Model - This is similar to the genome addition model,
except that it assumes that there was a recent genome reduction event and
uses the observed values in the full genotype to fill the missing
values in the short genotype. As with the Genome Addition Model, the
resulting distances are averaged over the number of comparisons.
Combination Model - Combine and average the genome addition and
loss models.
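The models listed above can be sketched in Python as follows. This is a rough illustration rather than the package's code: all function names are hypothetical, alleles are assumed to be in repeat units, 0 marks a missing allele as in the example, and the filled genotypes are ordered with the observed alleles first.

```python
from itertools import permutations, product

MISSING = 0  # assumed encoding of a missing allele, as in [69, 70, 0, 0]


def allele_distance(a, b):
    """Bruvo's allele distance; a missing allele is infinitely far (distance 1)."""
    if a == MISSING or b == MISSING:
        return 1.0
    return 1 - 2 ** (-abs(a - b))


def locus_distance(g1, g2):
    """Minimum average allele distance over all pairings (brute force)."""
    best = min(
        sum(allele_distance(a, b) for a, b in zip(g1, p))
        for p in permutations(g2)
    )
    return best / len(g1)


def fill_missing(short, donors):
    """All genotypes obtained by replacing missing alleles with donor alleles."""
    observed = [a for a in short if a != MISSING]
    k = len(short) - len(observed)
    return [observed + list(fill) for fill in product(donors, repeat=k)]


def mean_distance(filled, full):
    return sum(locus_distance(g, full) for g in filled) / len(filled)


def infinite_model(short, full):
    return locus_distance(short, full)


def genome_addition(short, full):
    # Donor alleles come from the shorter genotype itself.
    observed = [a for a in short if a != MISSING]
    return mean_distance(fill_missing(short, observed), full)


def genome_loss(short, full):
    # Donor alleles come from the full genotype.
    return mean_distance(fill_missing(short, full), full)


def combination_model(short, full):
    return (genome_addition(short, full) + genome_loss(short, full)) / 2


print(fill_missing([69, 70, 0, 0], [69, 70]))
print(infinite_model([69, 0], [69, 70]), genome_addition([69, 0], [69, 70]),
      genome_loss([69, 0], [69, 70]), combination_model([69, 0], [69, 70]))
```

The first print reproduces the four filled genotypes from the genome addition example. In the second, the infinite model gives the largest of the four values, which is consistent with its tendency to inflate distances.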
As mentioned above, the infinite model is biased, but it is not nearly as
computationally intensive as either of the other models. The reason is
that both the addition and loss models require replacement of
alleles and recalculation of Bruvo's distance. The number of replacements
required is equal to n^k, where n is the number of potential
replacements and k is the number of alleles to be replaced.
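To give a feel for how quickly n^k grows, here is a small Python sketch that enumerates the replacements for a hypothetical hexaploid locus where only two alleles were observed. The allele values and the helper name are made up for illustration.

```python
from itertools import product


def n_replacements(n, k):
    """Number of ways to fill k missing alleles from n candidate alleles."""
    return n ** k


observed = [69, 70]                # alleles seen in the shorter genotype
full = [69, 70, 71, 72, 73, 74]    # alleles in the full (hexaploid) genotype
k = len(full) - len(observed)      # four alleles need replacing

addition_fills = list(product(observed, repeat=k))  # genome addition: n = 2
loss_fills = list(product(full, repeat=k))          # genome loss: n = 6

print(len(addition_fills), n_replacements(len(observed), k))
print(len(loss_fills), n_replacements(len(full), k))
```

Each of these replacements requires its own recalculation of the distance at that locus, which is why the addition and loss models cost far more than the infinite model.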
To reduce the number of calculations and assumptions, Bruvo's
distance is calculated using the largest observed ploidy in each pairwise
comparison. This means that when comparing [69,70,71,0] and [59,60,0,0],
both genotypes are treated as triploids.
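One way to picture this trimming step is the Python sketch below. It is an assumption-laden illustration, not the package's code: the function names are hypothetical, 0 marks a missing allele, and observed alleles are kept in their original order with any remaining missing alleles placed last.

```python
MISSING = 0  # assumed encoding of a missing allele


def observed_ploidy(genotype):
    """Number of non-missing alleles in a genotype."""
    return sum(1 for a in genotype if a != MISSING)


def trim_to_largest_observed(g1, g2):
    """Shrink both genotypes to the largest ploidy observed in the pair."""
    ploidy = max(observed_ploidy(g1), observed_ploidy(g2))

    def trim(genotype):
        observed = [a for a in genotype if a != MISSING]
        return observed + [MISSING] * (ploidy - len(observed))

    return trim(g1), trim(g2)


print(trim_to_largest_observed([69, 70, 71, 0], [59, 60, 0, 0]))
```

After trimming, only one missing allele remains to be handled in the second genotype instead of two, so fewer replacements are needed under the addition and loss models.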