Compute dissimilarity indices for ecological data matrices. The dissimilarity
indices computed by this function are those described in Legendre and De
Cáceres (2013). In the name of the function, 'ldc' stands for the author's
names. Twelve of these 21 indices are not readily available in other R
package functions; four of them can, however, be computed in two computation
steps in vegan
.
dist.ldc(Y, method = "hellinger", binary = FALSE, samp = TRUE, silent = FALSE)
A dissimilarity matrix, with class dist
.
Community composition data. The object class can be either
data.frame
or matrix
.
One of the 21 dissimilarity coefficients available in the
function: "hellinger"
, "chord"
, "log.chord"
,
"chisquare"
, "profiles"
, "percentdiff"
,
"ruzicka"
, "divergence"
, "canberra"
,
"whittaker"
, "wishart"
, "kulczynski"
,
"jaccard"
, "sorensen"
, "ochiai"
, "ab.jaccard"
,
"ab.sorensen"
, "ab.ochiai"
, "ab.simpson"
,
"euclidean"
, "manhattan"
, "modmeanchardiff"
. See
Details. Names can be abbreviated to a non-ambiguous set of first letters.
Default: method="hellinger"
.
If binary=TRUE
, the data are transformed to
presence-absence form before computation of the dissimilarities. Default
value: binary=FALSE
, except for the Jaccard, Sørensen and Ochiai
indices where binary=TRUE
.
If samp=TRUE
, the abundance-based distances (ab.jaccard,
ab.sorensen, ab.ochiai, ab.simpson) are computed for sample data. If
samp=FALSE
, binary indices are computed for true population data.
If silent=FALSE
, informative messages sent to users will
be printed to the R console. Use silent=TRUE
is called on a
numerical simulation loop, for example.
Pierre Legendre pierre.legendre@umontreal.ca and Naima Madi
The dissimilarities computed by this function are the following. Indices i and k designate two rows (sites) of matrix Y, j designates a column (species). D[ik] is the dissimilarity between rows i and k. p is the number of columns (species) in Y; pp is the number of species present in one or the other site, or in both. y[i+] is the sum of values in row i; same for y[k+]. y[+j] is the sum of values in column j. y[++] is the total sum of values in Y. The indices are computed by functions written in C for greater computation speed with large data matrices.
Group 1 - D computed by transformation of Y followed by Euclidean distance
Hellinger D, D[ik] = sqrt(sum((sqrt(y[ij]/y[i+])-sqrt(y[kj]/y[k+]))^2))
chord D, D[ik] = sqrt(sum((y[ij]/sqrt(sum(y[ij]^2))-y[kj]/sqrt(sum(y[kj]^2)))^2))
log-chord D, D[ik] = chord D[ik] computed on log(y[ij]+1)-transformed data (Legendre and Borcard 2018)
chi-square D, D[ik] = sqrt(y[++] sum((1/j[+j])(y[ij]/y[i+]-y[kj]/y[k+])^2))
species profiles D, D[ik] = sqrt(sum((y[ij]/y[i+]-y[kj]/y[k+])^2))
Group 2 - Other D functions appropriate for beta diversity studies where A = sum(min(y[ij],y[kj])), B = y[i+]-A, C = y[k+]-A
Ružička D, D[ik] = 1-(sum(min(y[ij],y[kj])/sum(max(y[ij],y[kj])) or else, D[ik] = (B+C)/(A+B+C)
coeff. of divergence D, D[ik] = sqrt((1/pp)sum(((y[ij]-y(kj])/(y[ij]+y(kj]))^2))
Canberra metric D, D[ik] = (1/pp)sum(abs(y[ij]-y(kj])/(y[ij]+y(kj]))
Whittaker D, D[ik] = 0.5*sum(abs(y[ij]/y[i+]-y(kj]/y[k+]))
Wishart D, D[ik] = 1-sum(y[ij]y[kj])/(sum(y[ij]^2)+sum(y[kj]^2)-sum(y[ij]y[kj]))
Kulczynski D, D[ik] = 1-0.5((sum(min(y[ij],y[kj])/y[i+]+sum(min(y[ij],y[kj])/y[k+]))
Group 3 - Classical indices for binary data; they are appropriate for beta diversity studies. Value a is the number of species found in both i and k, b is the number of species in site i not found in k, and c is the number of species found in site k but not in i. The D matrices are square-root transformed, as in dist.binary of ade4; the user-oriented reason for this transformation is explained below.
Sørensen D, D[ik] = sqrt((b+c)/(2a+b+c))
Ochiai D, D[ik] = sqrt(1 - a/sqrt((a+b)(a+c)))
Group 4 - Abundance-based indices of Chao et al. (2006) for
quantitative abundance data. These functions correct the index for species
that have not been observed due to sampling errors. For the meaning of the
U and V notations, see Chao et al. (2006, section 3). When
samp=TRUE
, the abundance-based distances (ab.jaccard, ab.sorensen,
ab.ochiai, ab.simpson) are computed for sample data. If samp=FALSE
,
indices are computed for true population data. - Do not use indices of
group 4 with samp=TRUE
on presence-absence data; the indices are not
meant to accommodate this type of data. If samp=FALSE
is used with
presence-absence data, the indices are the regular Jaccard, Sørensen,
Ochiai and Simpson indices. On output, however, the D matrices are not
square-rooted, contrary to the Jaccard, Sørensen and Ochiai indices in
section 3 which are square-rooted.
abundance-based Sørensen D, D[ik] = 1-(2UV/(U+V))
abundance-based Ochiai D, D[ik] = 1-sqrt(UV)
abundance-based Simpson D, D[ik] = 1-(UV/(UV+min((U-UV),(V-UV))))
Group 5 - General-purpose dissimilarities that do not have an upper bound (maximum D value). They are inappropriate for beta diversity studies.
Manhattan D, D[ik] = sum(abs(y[ij] - y[ik]))
modified mean character difference, D[ik] = (1/pp) sum(abs(y[ij] - y[ik]))
The properties of
all dissimilarities available in this function (except Ružička D) were
described and compared in Legendre & De Cáceres (2013), who showed that
most of these dissimilarities are appropriate for beta diversity studies.
Inappropriate are the Euclidean, Manhattan, modified mean character
difference, species profile and chi-square distances. Most of these
dissimilarities have a maximum value of either 1 or sqrt(2). Three
dissimilarities (Euclidean, Manhattan, Modified mean character difference)
do not have an upper bound and are thus inappropriate for beta diversity
studies. The chi-square distance has an upper bound of
sqrt(2*(sum(Y))).
The Euclidean, Hellinger, chord, chi-square and
species profiles dissimilarities have the property of being Euclidean,
meaning that they never produce negative eigenvalues in principal
coordinate analysis. The Canberra, Whittaker, percentage difference,
Wishart and Manhattan coefficients are Euclidean when they are square-root
transformed (Legendre & De Cáceres 2013, Table 2). The distance forms (1-S)
of the Jaccard, Sørensen and Ochiai similarity (S) coefficients are
Euclidean after taking the square root of (1-S) (Legendre & Legendre 2012,
Table 7.2). The D matrices resulting from these three coefficients are
outputted in the form sqrt(1-S), as in function dist.binary
of ade4,
because that form is Euclidean and will thus produce no negative
eigenvalues in principal coordinate analysis.
The Hellinger, chord,
chi-square and species profile dissimilarities are computed using the
two-step procedure developed by Legendre & Gallagher (2001). The data are
first transformed using either the row marginals, or the row and column
marginals in the case of the chi-square distance. The dissimilarities are
then computed from the transformed data using the Euclidean distance
formula. As a consequence, these four dissimilarities are necessarily
Euclidean. D matrices for other binary coefficients can be computed in two
ways: either by using function dist.binary
of ade4, or by choosing
option binary=TRUE
, which transforms the abundance data to binary
form, and using one of the quantitative indices of the present function.
Table 1 of Legendre & De Cáceres (2013) shows the incidence-based
(presence-absence-based) indices computed by the various indices using
binary data.
The Euclidean distance computed on untransformed
presence-absence or abundance data produces non-informative and incorrect
ordinations, as shown in Legendre & Legendre (2012, p. 300) and in Legendre
& De Cáceres (2013). However, the Euclidean distance computed on
log-transformed abundance data produces meaningful ordinations in principal
coordinate analysis (PCoA). Nonetheless, it is easier to compute a PCA of
log-transformed abundance data instead of a PCoA; the resulting ordination
with scaling 1 will be meaningful. Messages are printed to the R console
indicating the Euclidean status of the computed dissimilarity matrices.
Note that for the chi-square distance, the columns that sum to zero are
eliminated before calculation of the distances, thus preventing divisions
by zero in the calculation of the chi-square transformation.
Chao, A., R. L. Chazdon, R. K. Colwell and T. J. Shen. 2006. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62: 361-371.
Legendre, P. and D. Borcard. 2018. Box-Cox-chord transformations for community composition data prior to beta diversity analysis. Ecography 41: 1820-1824.
Legendre, P. and M. De Cáceres. 2013. Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters 16: 951-963.
Legendre, P. and E. D. Gallagher, E.D. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271-280.
Legendre, P. and Legendre, L. 2012. Numerical Ecology. 3rd English edition. Elsevier Science BV, Amsterdam.
if(require("vegan", quietly = TRUE)) {
data(mite)
mat1 = as.matrix(mite[1:10, 1:15]) # No column has a sum of 0
mat2 = as.matrix(mite[61:70, 1:15]) # 7 of the 15 columns have a sum of 0
#Example 1: compute Hellinger distance for mat1
D.out = dist.ldc(mat1,"hellinger")
#Example 2: compute chi-square distance for mat2
D.out = dist.ldc(mat2,"chisquare")
#Example 3: compute percentage difference dissimilarity for mat2
D.out = dist.ldc(mat2,"percentdiff")
}
Run the code above in your browser using DataLab