designdist.
Gower, Bray--Curtis, Jaccard and
Kulczynski indices are good at detecting underlying
ecological gradients (Faith et al. 1987). Morisita, Horn--Morisita,
Binomial and Chao
indices should be able to handle different sample sizes (Wolda 1981,
Krebs 1999, Anderson & Millar 2004),
and Mountford (1962) and Raup--Crick indices for presence--absence data should
be able to handle unknown (and variable) sample sizes.
vegdist(x, method = "bray", binary = FALSE, diag = FALSE, upper = FALSE,
        na.rm = FALSE, ...)
"manhattan", "euclidean", "canberra", "bray", "kulczynski", "jaccard", "gower", "altGower", "mori
decostand.
method = "gower" which accepts range.global parameter of decostand.
.dist and return a distance object of the same type.
Jaccard ("jaccard"), Mountford ("mountford"), Raup--Crick ("raup"),
Binomial and Chao indices are discussed
later in this section. The function also finds indices for
presence/absence data by setting binary = TRUE. The following overview
gives first the quantitative version, where $x_{ij}$ and
$x_{ik}$ refer to the quantity of species (column) $i$
at sites (rows) $j$ and $k$. In binary versions $A$ and
$B$ are the numbers of species at the compared sites, and $J$ is
the number of species that occur at both compared sites, as
in designdist (many indices produce identical binary
versions):
euclidean
$d_{jk} = \sqrt{\sum_i (x_{ij}-x_{ik})^2}$
binary: $\sqrt{A+B-2J}$
manhattan
$d_{jk}=\sum_i |x_{ij}-x_{ik}|$
binary: $A+B-2J$
gower
$d_{jk} = (1/M) \sum_i \frac{|x_{ij}-x_{ik}|}{\max x_i-\min x_i}$
binary: $(A+B-2J)/M$,
where $M$ is the number of columns (excluding missing values)
altGower
$d_{jk} = (1/NZ) \sum_i |x_{ij} - x_{ik}|$,
where $NZ$ is the number of non-zero columns excluding double-zeros (Anderson et al. 2006)
binary: $\frac{A+B-2J}{A+B-J}$
canberra
$d_{jk}=\frac{1}{NZ} \sum_i \frac{|x_{ij}-x_{ik}|}{x_{ij}+x_{ik}}$,
where $NZ$ is the number of non-zero entries
binary: $\frac{A+B-2J}{A+B-J}$
bray
$d_{jk} = \frac{\sum_i |x_{ij}-x_{ik}|}{\sum_i (x_{ij}+x_{ik})}$
binary: $\frac{A+B-2J}{A+B}$
kulczynski
$d_{jk} = 1-0.5\left(\frac{\sum_i \min(x_{ij},x_{ik})}{\sum_i x_{ij}} + \frac{\sum_i \min(x_{ij},x_{ik})}{\sum_i x_{ik}}\right)$
binary: $1-(J/A + J/B)/2$
morisita
$d_{jk} = 1 - \frac{2 \sum_i x_{ij} x_{ik}}{(\lambda_j + \lambda_k) \sum_i x_{ij} \sum_i x_{ik}}$,
where $\lambda_j = \frac{\sum_i x_{ij} (x_{ij} - 1)}{\sum_i x_{ij} \sum_i (x_{ij} - 1)}$
binary: cannot be calculated
horn
Like morisita, but $\lambda_j = \sum_i x_{ij}^2/(\sum_i x_{ij})^2$
binary: $\frac{A+B-2J}{A+B}$
binomial
$d_{jk} = \sum_i [x_{ij} \log (\frac{x_{ij}}{n_i}) + x_{ik} \log (\frac{x_{ik}}{n_i}) - n_i \log(\frac{1}{2})]/n_i$,
where $n_i = x_{ij} + x_{ik}$
binary: $\log(2) \times (A+B-2J)$
Jaccard index is computed as $2B/(1+B)$, where $B$ is Bray--Curtis dissimilarity.
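The Bray--Curtis and Jaccard formulas above can be checked by hand in base R. The following is a minimal sketch with two made-up sites (the vectors xj and xk are hypothetical data, not taken from any vegan data set). It computes the quantitative Bray--Curtis index, derives Jaccard from it as $2B/(1+B)$, and repeats the calculation for the binary versions using $A$, $B$ and $J$.

```r
# Two hypothetical sites (rows j and k) with abundances of five species
xj <- c(10, 0, 3, 5, 0)
xk <- c(4, 2, 0, 5, 0)

# Quantitative Bray--Curtis: sum |x_ij - x_ik| / sum (x_ij + x_ik)
bray <- sum(abs(xj - xk)) / sum(xj + xk)

# Quantitative Jaccard obtained from Bray--Curtis as 2B / (1 + B)
jaccard <- 2 * bray / (1 + bray)

# Binary versions need only the species counts A, B and J
A <- sum(xj > 0)            # number of species at site j
B <- sum(xk > 0)            # number of species at site k
J <- sum(xj > 0 & xk > 0)   # number of species shared by both sites
bray_bin    <- (A + B - 2 * J) / (A + B)
jaccard_bin <- (A + B - 2 * J) / (A + B - J)
```

Because $2B/(1+B)$ is a strictly increasing function of $B$, the two indices order any set of site pairs in the same way.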
Binomial index is derived from Binomial deviance under null hypothesis
that the two compared communities are equal. It should be able to
handle variable sample sizes. The index does not have a fixed upper
limit, but can vary among sites with no shared species. For further
discussion, see Anderson & Millar (2004).
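As an illustration, the Binomial formula given in the overview can be written out in base R. This is only a sketch: the helper binomial_index is a hypothetical name, species absent from both sites are skipped, and the convention $0 \log 0 = 0$ is assumed. It is not claimed to reproduce every detail of the vegdist implementation.

```r
# Hypothetical helper following the formula in the overview above
binomial_index <- function(xj, xk) {
  n <- xj + xk
  keep <- n > 0                  # species missing from both sites are skipped
  xj <- xj[keep]
  xk <- xk[keep]
  n <- n[keep]
  # x * log(x / n), using the convention 0 * log(0) = 0
  xlogx <- function(x, n) ifelse(x > 0, x * log(x / n), 0)
  sum((xlogx(xj, n) + xlogx(xk, n) - n * log(1 / 2)) / n)
}

# Presence/absence data: A = 3, B = 3, J = 2
pa_j <- c(1, 0, 1, 1, 0)
pa_k <- c(1, 1, 0, 1, 0)
d_bin <- binomial_index(pa_j, pa_k)

# Two identical communities
d_same <- binomial_index(c(10, 0, 3, 5, 0), c(10, 0, 3, 5, 0))
```

For the presence/absence vectors the result agrees with the binary form $\log(2) \times (A+B-2J)$, and identical communities give zero.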
Mountford index is defined as $M = 1/\alpha$ where $\alpha$
is the parameter of Fisher's logseries assuming that the compared
communities are samples from the same community
(cf. fisherfit, fisher.alpha). The index
$M$ is found as the positive root of equation $\exp(aM) +
\exp(bM) = 1 + \exp[(a+b-j)M]$, where $j$ is the number of species occurring in
both communities, and $a$ and $b$ are the number of species
in each separate community (so the index uses presence--absence
information). Mountford index is usually misrepresented in the
literature: indeed Mountford (1962) suggested an approximation to be
used as starting value in iterations, but the proper index is
defined as the root of the equation above. The function vegdist
solves $M$ with the Newton method. Please note
that if either $a$ or $b$ is equal to $j$, one of the
communities could be a subset of the other, and the dissimilarity is
$0$, meaning that non-identical objects may be regarded as
similar and the index is non-metric. The Mountford index is in the
range $0 \dots \log(2)$, but the dissimilarities
are divided by $\log(2)$ so that the results will be in
the conventional range $0 \dots 1$.
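The root of the Mountford equation can also be found numerically outside vegdist. The sketch below uses uniroot from base R, a bracketing root finder, in place of the Newton method mentioned above. The function name mountford_M, the lower bracket 1e-8 and the restriction to $0 < j < \min(a, b)$ are assumptions made for this illustration; $M = 0$ always solves the equation, so the search starts just above zero.

```r
# Hypothetical helper: positive root of
#   exp(a*M) + exp(b*M) = 1 + exp((a + b - j)*M)
mountford_M <- function(a, b, j) {
  # Only the case with a positive root strictly inside (0, log(2)) is handled
  stopifnot(j > 0, j < min(a, b))
  # Same equation rewritten with expm1() for accuracy near M = 0
  f <- function(M) expm1(a * M) + expm1(b * M) - expm1((a + b - j) * M)
  # f is positive just above zero and negative at log(2), so the root is bracketed
  uniroot(f, lower = 1e-8, upper = log(2), tol = 1e-12)$root
}

a <- 10
b <- 12
j <- 5
M <- mountford_M(a, b, j)
M_scaled <- M / log(2)   # rescaled to the range 0 .. 1
```

The cases $j = 0$ and $j = \min(a, b)$ have no root inside the bracket and would need separate handling.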
Raup--Crick dissimilarity (method = "raup") is a
probabilistic index based on presence/absence data. It is defined
as $1 - prob(j)$, or based on the probability of observing at
least $j$ shared species in the compared communities. Legendre &
Legendre (1998) suggest using simulations to assess the probability,
but the current function uses an analytic result from the hypergeometric
distribution (phyper) instead. This probability (and
the index) is dependent on the number of species missing in both
sites, and adding all-zero species to the data or removing missing
species from the data will influence the index. The probability
(and the index) may be almost zero or almost one for a wide range of
parameter values. The index is nonmetric: two communities with no
shared species may have a dissimilarity slightly below one, and two
identical communities may have dissimilarity slightly above
zero. Please note that this index does not implement the Raup--Crick
dissimilarity as discussed by Chase et al. (2011): the current index
uses equal probabilities for all species, but the probabilities
should be unequal and based on species frequencies.
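The dependence on the total number of species can be illustrated with phyper directly. The sketch below is one reading of the definition above: it takes the index to be the hypergeometric probability of observing at least $j$ shared species when the $a$ and $b$ species of the two sites are drawn from a pool of $m$ species with equal probabilities. The function name raup_crick and the argument m are hypothetical, and whether vegdist uses exactly this tail is an assumption.

```r
# Hypothetical helper: P(X >= j) for X ~ Hypergeometric,
# with b draws from a pool of m species of which a are present at the other site
raup_crick <- function(a, b, j, m) {
  1 - phyper(j - 1, a, m - a, b)
}

# Two identical communities of three species in a data set with 10 species
d10 <- raup_crick(a = 3, b = 3, j = 3, m = 10)

# The same communities after adding 10 all-zero species to the data
d20 <- raup_crick(a = 3, b = 3, j = 3, m = 20)
```

Under this reading the two identical communities get a dissimilarity slightly above zero, and the value changes when all-zero species are added, as described above.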
Chao index tries to take into account the number of unseen species
pairs, similarly to method = "chao" in specpool. Function vegdist
implements a Jaccard type index defined as
$d_{jk} = 1 - U_j U_k/(U_j + U_k - U_j U_k)$, where
$U_j = C_j/N_j + (N_k - 1)/N_k \times a_1/(2 a_2) \times S_j/N_j$,
and similarly for $U_k$. Here $C_j$ is the total
number of individuals in the species of site $j$ that are shared
with site $k$, $N_j$ is the total number of
individuals at site $j$, $a_1$ (and $a_2$) are
the number of species occurring in site $j$ that have only one
(or two) individuals in site $k$, and $S_j$ is the
total number of individuals in the species present at site $j$
that occur with only one individual in site $k$ (Chao et
al. 2005).
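The quantities in the Chao formula can be traced with a small worked example. The sketch below follows the definitions in the text literally; the helper chao_U and the two count vectors are hypothetical, and the code assumes $a_2 > 0$ and does not treat edge cases such as $a_2 = 0$ or $U > 1$, which an actual implementation has to handle in some way.

```r
# Hypothetical helper: U for the site 'x_self' compared against 'x_other'
chao_U <- function(x_self, x_other) {
  shared <- x_self > 0 & x_other > 0
  N_self  <- sum(x_self)                    # N_j: individuals at this site
  N_other <- sum(x_other)                   # N_k: individuals at the other site
  C  <- sum(x_self[shared])                 # C_j: individuals in shared species
  a1 <- sum(shared & x_other == 1)          # shared species with one individual at the other site
  a2 <- sum(shared & x_other == 2)          # shared species with two individuals at the other site
  S  <- sum(x_self[shared & x_other == 1])  # S_j: individuals in the a1 species
  C / N_self + (N_other - 1) / N_other * a1 / (2 * a2) * S / N_self
}

# Hypothetical counts of six species at two sites
xj <- c(5, 3, 0, 6, 1, 2)
xk <- c(1, 2, 6, 0, 1, 2)

U_j <- chao_U(xj, xk)
U_k <- chao_U(xk, xj)
d_jk <- 1 - U_j * U_k / (U_j + U_k - U_j * U_k)
```

Here $U_j = 55/68$ and $U_k = 55/102$, which can be verified by inserting the counts into the formula by hand.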
Morisita index can be used with genuine count data (integers) only. Its Horn--Morisita variant is able to handle any abundance data.
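To illustrate that the Horn--Morisita variant works with non-integer abundances, the horn formula from the overview can be written in base R. This is a sketch only: the function name horn_morisita and the cover values are made up for the example.

```r
# Hypothetical helper following the "horn" entry of the overview
horn_morisita <- function(xj, xk) {
  lambda <- function(x) sum(x^2) / sum(x)^2
  1 - 2 * sum(xj * xk) / ((lambda(xj) + lambda(xk)) * sum(xj) * sum(xk))
}

# Hypothetical cover values (not counts) of four species at two sites
cover_j <- c(12.5, 0.3, 4.0, 0.0)
cover_k <- c(3.1, 0.0, 8.2, 1.7)
d <- horn_morisita(cover_j, cover_k)

# Same composition with doubled totals, and two sites with no shared species
d_same     <- horn_morisita(cover_j, 2 * cover_j)
d_disjoint <- horn_morisita(c(1.5, 0, 2), c(0, 3.2, 0))
```

Doubling all abundances at one site leaves the dissimilarity at zero, which is the sense in which the index tolerates different sample sizes.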
Euclidean and Manhattan dissimilarities are not good in gradient separation without proper standardization but are still included for comparison and special needs.
Bray--Curtis and Jaccard indices are rank-order similar, and some
other indices become identical or rank-order similar after some
standardizations, especially with presence/absence transformation or
equalizing site totals with decostand. Jaccard index is
metric, and probably should be preferred instead of the default
Bray--Curtis which is semimetric.
The naming conventions vary. The one adopted here is traditional
rather than truthful to priority. The function finds either
quantitative or binary variants of the indices under the same name,
which correctly may refer only to one of these alternatives. For
instance, the Bray
index is known also as Steinhaus, Czekanowski and "horn"
for the Horn--Morisita index is
misleading, since there is a separate Horn index. The abbreviation
will be changed if that index is implemented in vegan.
Anderson, M.J., Ellingsen, K.E. & McArdle, B.H. (2006) Multivariate dispersion as a measure of beta diversity. Ecology Letters 9, 683--693.
Chao, A., Chazdon, R. L., Colwell, R. K. and Shen, T. (2005). A new statistical approach for assessing similarity of species composition with incidence and abundance data. Ecology Letters 8, 148--159.
Chase, J.M., Kraft, N.J.B., Smith, K.G., Vellend, M. and Inouye, B.D. (2011). Using null models to disentangle variation in community dissimilarity from variation in $\alpha$-diversity. Ecosphere 2:art24 [doi:10.1890/ES10-00117.1]
Faith, D. P, Minchin, P. R. and Belbin, L. (1987). Compositional dissimilarity as a robust measure of ecological distance. Vegetatio 69, 57--68.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics 27, 623--637.
Krebs, C. J. (1999). Ecological Methodology. Addison Wesley Longman.
Legendre, P, & Legendre, L. (1998) Numerical Ecology. 2nd English Edition. Elsevier.
Mountford, M. D. (1962). An index of similarity and its application to classification problems. In: P.W.Murphy (ed.), Progress in Soil Zoology, 43--50. Butterworths.
Wolda, H. (1981). Similarity indices, sample size and diversity. Oecologia 50, 296--302.
designdist can be used for defining your own
dissimilarity index. Alternative dissimilarity functions include
dist in base R, daisy (package dsvdis (package betadiver
provides indices intended for the analysis of
beta diversity.
library(vegan)
data(varespec)
vare.dist <- vegdist(varespec)
# Orlóci's Chord distance: range 0 .. sqrt(2)
vare.dist <- vegdist(decostand(varespec, "norm"), "euclidean")
# Anderson et al. (2006) version of Gower
vare.dist <- vegdist(decostand(varespec, "log"), "altGower")
# Range standardization with "altGower" (that excludes double-zeros)
vare.dist <- vegdist(decostand(varespec, "range"), "altGower")