abdiv (version 0.2.0)

unifrac: UniFrac distance

Description

The UniFrac distance is a phylogenetically weighted distance between two communities of organisms. The measure has been extended a number of times to include abundance-weighted and variance-adjusted versions.

Usage

unweighted_unifrac(x, y, tree, xy_labels = NULL)

weighted_unifrac(x, y, tree, xy_labels = NULL)

weighted_normalized_unifrac(x, y, tree, xy_labels = NULL)

variance_adjusted_unifrac(x, y, tree, xy_labels = NULL)

generalized_unifrac(x, y, tree, alpha = 0.5, xy_labels = NULL)

information_unifrac(x, y, tree, xy_labels = NULL)

phylosor(x, y, tree, xy_labels = NULL)

Arguments

x, y

Numeric vectors of species counts or proportions.

tree

A phylogenetic tree object.

xy_labels

A character vector of species labels for x and y.

alpha

Generalized UniFrac parameter.

Value

The UniFrac distance between communities x and y. The distance is not defined if either x or y consists entirely of zeros. We return NaN if this is the case.

Details

These functions compute different variations of the UniFrac distance between communities described by the vectors x and y. If the vectors are named, the names will be automatically used to match the vectors with the tree. Missing names are filled in with zero counts. If the vectors are not named and xy_labels is provided, these labels will be used to match the vectors with the tree. If the vectors are not named and xy_labels is not provided, it is assumed that the vectors are already in the correct order, and we simply check that their length matches the number of tips in the tree.
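The matching rules above can be sketched in base R. This is only an illustration, not the package's internal code: the helper match_to_tips and the four-tip label set are hypothetical, and the sketch assumes that names on the vector take priority over separately supplied labels.

```r
# Hypothetical helper illustrating the matching rules described above.
match_to_tips <- function(x, tip_labels, labels = NULL) {
  # Names on the vector take priority over separately supplied labels
  if (!is.null(names(x))) labels <- names(x)
  if (is.null(labels)) {
    # No labels at all: assume x is already in tip order, check length only
    stopifnot(length(x) == length(tip_labels))
    return(setNames(as.numeric(x), tip_labels))
  }
  stopifnot(all(labels %in% tip_labels))
  # Start from zero counts so tips missing from x are filled in with zeros
  out <- setNames(numeric(length(tip_labels)), tip_labels)
  out[labels] <- unname(x)
  out
}

tips <- c("A", "B", "C", "D")            # assumed tip labels of a toy tree
by_name  <- match_to_tips(c(D = 3, A = 2), tips)
by_label <- match_to_tips(c(3, 2), tips, labels = c("D", "A"))
by_order <- match_to_tips(c(1, 2, 3, 4), tips)
```

In this sketch, by_name and by_label both place the counts on tips A and D and fill B and C with zeros, while by_order keeps the values in the order given.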

unweighted_unifrac gives the original UniFrac distance from Lozupone and Knight (2005), which is the fraction of total branch length leading to community x or community y, but not both. It is based on species presence/absence.
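As a rough illustration of this definition, the calculation can be done by hand in base R on a toy tree, ((A:1,B:1):1,(C:1,D:2):1). The tree, the indicator matrix below, and all variable names are assumptions made for this sketch; the package itself works from a phylogenetic tree object.

```r
# Toy tree ((A:1,B:1):1,(C:1,D:2):1), one entry per branch.
branch_length <- c(1, 1, 1, 1, 2, 1)
# below[i, j] is 1 if tip j descends from branch i
below <- rbind(
  c(1, 0, 0, 0),  # branch leading to tip A
  c(0, 1, 0, 0),  # branch leading to tip B
  c(1, 1, 0, 0),  # branch leading to clade (A, B)
  c(0, 0, 1, 0),  # branch leading to tip C
  c(0, 0, 0, 1),  # branch leading to tip D
  c(0, 0, 1, 1)   # branch leading to clade (C, D)
)
x <- c(1, 1, 1, 0)  # community x contains A, B, C
y <- c(0, 0, 1, 1)  # community y contains C, D

# A branch leads to a community if any tip below it is present
in_x <- as.vector(below %*% x) > 0
in_y <- as.vector(below %*% y) > 0

# Branch length leading to exactly one community, over the length leading to either
toy_unweighted <- sum(branch_length[xor(in_x, in_y)]) /
  sum(branch_length[in_x | in_y])
toy_unweighted  # 5/7
```

Here the branches to A, B, (A, B), and D lead to only one community (total length 5), and all six branches lead to at least one community (total length 7).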

weighted_unifrac gives the abundance-weighted version of UniFrac proposed by Lozupone et al. (2007). In this measure, the branch lengths of the tree are multiplied by the absolute difference in species abundances below each branch.

weighted_normalized_unifrac provides a normalized version of weighted_unifrac, so the distance is between 0 and 1.
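The weighted and normalized calculations can be sketched by hand on the same kind of toy tree, ((A:1,B:1):1,(C:1,D:2):1). This sketch assumes that counts are first converted to proportions and that the normalizing factor is the sum over branches of branch length times the summed proportions, which is how the normalization of Lozupone et al. (2007) can be written branch by branch. The tree and variable names are illustrative only.

```r
# Toy tree ((A:1,B:1):1,(C:1,D:2):1), one entry per branch.
branch_length <- c(1, 1, 1, 1, 2, 1)
# below[i, j] is 1 if tip j descends from branch i
below <- rbind(
  c(1, 0, 0, 0),  # tip A
  c(0, 1, 0, 0),  # tip B
  c(1, 1, 0, 0),  # clade (A, B)
  c(0, 0, 1, 0),  # tip C
  c(0, 0, 0, 1),  # tip D
  c(0, 0, 1, 1)   # clade (C, D)
)
x <- c(2, 1, 1, 0)  # counts for tips A, B, C, D in community x
y <- c(0, 0, 1, 3)  # counts in community y

# Proportion of each community found below each branch
px <- as.vector(below %*% (x / sum(x)))
py <- as.vector(below %*% (y / sum(y)))

# Branch lengths multiplied by the absolute difference in abundance
toy_weighted <- sum(branch_length * abs(px - py))
# Dividing by the largest attainable value puts the result between 0 and 1
toy_weighted_normalized <- toy_weighted / sum(branch_length * (px + py))

toy_weighted             # 3.75
toy_weighted_normalized  # 15/19
```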

variance_adjusted_unifrac was proposed by Chang et al. (2011) to adjust for the variation of weights in weighted UniFrac under random sampling.

generalized_unifrac was proposed by Chen et al. (2012) to provide a unified mathematical framework for weighted and unweighted UniFrac distance. It includes a parameter, \(\alpha\), which can be used to adjust the abundance-weighting in the distance. A value of \(\alpha = 1\) corresponds to weighted UniFrac in its normalized form. A value of \(\alpha = 0\) corresponds to unweighted UniFrac if presence/absence vectors are provided. The authors suggest a value of \(\alpha = 0.5\) as a compromise between weighted and unweighted distances.
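The role of \(\alpha\) can be seen in a small base R sketch of the formula from Chen et al. (2012), in which each branch is weighted by its length times the summed proportions raised to the power \(\alpha\). The function toy_generalized and the toy tree ((A:1,B:1):1,(C:1,D:2):1) are hypothetical and only meant to illustrate the formula; branches with no abundance in either community are dropped to avoid dividing by zero.

```r
# Hypothetical helper: generalized UniFrac from per-branch proportions.
toy_generalized <- function(px, py, branch_length, alpha = 0.5) {
  s <- px + py
  keep <- s > 0                       # skip branches absent from both communities
  w <- branch_length[keep] * s[keep]^alpha
  sum(w * abs(px[keep] - py[keep]) / s[keep]) / sum(w)
}

# Toy tree ((A:1,B:1):1,(C:1,D:2):1), one entry per branch.
branch_length <- c(1, 1, 1, 1, 2, 1)
below <- rbind(
  c(1, 0, 0, 0), c(0, 1, 0, 0), c(1, 1, 0, 0),  # A, B, clade (A, B)
  c(0, 0, 1, 0), c(0, 0, 0, 1), c(0, 0, 1, 1)   # C, D, clade (C, D)
)
x <- c(2, 1, 1, 0)
y <- c(0, 0, 1, 3)
px <- as.vector(below %*% (x / sum(x)))
py <- as.vector(below %*% (y / sum(y)))

# With alpha = 1 the weights reduce to branch_length * (px + py),
# giving the normalized weighted distance
d_alpha_1 <- toy_generalized(px, py, branch_length, alpha = 1)
normalized_weighted <- sum(branch_length * abs(px - py)) /
  sum(branch_length * (px + py))

# Intermediate weighting suggested by the authors
d_alpha_half <- toy_generalized(px, py, branch_length, alpha = 0.5)
```

In this sketch d_alpha_1 and normalized_weighted agree, while smaller values of alpha give relatively more weight to branches with low abundance.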

information_unifrac was proposed by Wong et al. (2016) to connect UniFrac distance with compositional data analysis. They also proposed a "ratio UniFrac" distance, which is not yet implemented.

phylosor, proposed by Bryant et al. (2008), is closely related to unweighted UniFrac distance. If unweighted UniFrac distance is the analogue of Jaccard distance using branches on a phylogenetic tree, PhyloSor is the analogue of Sørensen dissimilarity.
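The Jaccard/Sørensen analogy can be checked by hand on a toy tree, ((A:1,B:1):1,(C:1,D:2):1). The sketch below assumes PhyloSor is expressed as a dissimilarity, one minus twice the shared branch length over the sum of the branch lengths of the two communities. Variable names are illustrative, and this is not the package's implementation.

```r
# Toy tree ((A:1,B:1):1,(C:1,D:2):1), one entry per branch.
branch_length <- c(1, 1, 1, 1, 2, 1)
below <- rbind(
  c(1, 0, 0, 0), c(0, 1, 0, 0), c(1, 1, 0, 0),  # A, B, clade (A, B)
  c(0, 0, 1, 0), c(0, 0, 0, 1), c(0, 0, 1, 1)   # C, D, clade (C, D)
)
x <- c(1, 1, 1, 0)  # community x contains A, B, C
y <- c(0, 0, 1, 1)  # community y contains C, D
in_x <- as.vector(below %*% x) > 0
in_y <- as.vector(below %*% y) > 0

shared <- sum(branch_length[in_x & in_y])  # branch length leading to both
bl_x   <- sum(branch_length[in_x])         # branch length leading to x
bl_y   <- sum(branch_length[in_y])         # branch length leading to y

# Jaccard-like: shared length relative to the length leading to either community
toy_unifrac  <- 1 - shared / sum(branch_length[in_x | in_y])
# Sørensen-like: shared length relative to the average of the two communities
toy_phylosor <- 1 - 2 * shared / (bl_x + bl_y)

toy_unifrac   # 5/7
toy_phylosor  # 5/9
```

As with Jaccard and Sørensen dissimilarities on presence/absence data, the two values here are related by toy_phylosor = toy_unifrac / (2 - toy_unifrac).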

References

Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Applied and environmental microbiology. 2005;71(12):8228–8235. 10.1128/AEM.71.12.8228-8235.2005

Lozupone CA, Hamady M, Kelley ST, Knight R. Quantitative and qualitative \(\beta\) diversity measures lead to different insights into factors that structure microbial communities. Applied and environmental microbiology. 2007;73(5):1576–1585. 10.1128/AEM.01996-06

Chang Q, et al. Variance adjusted weighted UniFrac: a powerful beta diversity measure for comparing communities based on phylogeny. BMC Bioinformatics. 2011;12:118. 10.1186/1471-2105-12-118

Chen J, Bittinger K, Charlson ES, Hoffmann C, Lewis J, Wu GD, et al. Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics. 2012;28(16):2106–2113. 10.1093/bioinformatics/bts342

Wong RG, Wu JR, Gloor GB. Expanding the UniFrac Toolbox. PLOS ONE. 2016;11(9):1–20. 10.1371/journal.pone.0161196

Bryant JA, Lamanna C, Morlon H, Kerkhoff AJ, Enquist BJ, Green JL. Microbes on mountainsides: contrasting elevational patterns of bacterial and plant diversity. Proc Natl Acad Sci U S A. 2008;105 Suppl 1:11505–11. 10.1073/pnas.0801920105

Examples

# From Lozupone and Knight (2005), Figure 1.
# Panel A
x1 <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1)
x2 <- c(0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0)
unweighted_unifrac(x1, x2, lozupone_tree)

# Panel B
x3 <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
x4 <- c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
unweighted_unifrac(x3, x4, lozupone_tree)

# Can use named vectors to specify species
weighted_normalized_unifrac(
  c(A=1, C=1, D=1, F=1, I=1, L=1, N=1),
  c(B=1, E=1, G=1, H=1, J=1, K=1, M=1),
  lozupone_tree)
weighted_normalized_unifrac(x1, x2, lozupone_tree)

# Generalized UniFrac is equal to weighted normalized UniFrac when alpha = 1
generalized_unifrac(x1, x2, lozupone_tree, alpha=1)
generalized_unifrac(x1, x2, lozupone_tree, alpha=0.5)