gowdis measures the Gower (1971) dissimilarity for mixed variables, including asymmetric binary variables. Variable weights can be specified. gowdis implements Podani's (1999) extension to ordinal variables.
gowdis(x, w, asym.bin = NULL, ord = c("podani", "metric", "classic"))
w: vector listing the weights for the variables in x. Can be missing, in which case all variables have equal weights.
asym.bin: vector listing the asymmetric binary variables in x.
ord: character string specifying the method to be used for ordinal variables (i.e. variables of class ordered). "podani" refers to Eqs. 2a-b of Podani (1999), while "metric" refers to his Eq. 3 (see ‘details’); both options convert ordinal variables to ranks. "classic" simply treats ordinal variables as continuous variables. Can be abbreviated.
an object of class dist with the following attributes: Labels, Types (the variable types, where 'C' is continuous/numeric, 'O' is ordinal, 'B' is symmetric binary, 'A' is asymmetric binary, and 'N' is nominal), Size, and Metric.
gowdis computes the Gower (1971) similarity coefficient exactly as described by Podani (1999), then converts it to a dissimilarity coefficient using \(D = 1 - S\). It integrates variable weights as described by Legendre and Legendre (1998).
Let \(\mathbf{X} = \{x_{ij}\}\) be a data matrix describing \(n\) objects (rows) by \(m\) variables (columns), where \(x_{ij}\) is the value of variable \(i\) for object \(j\). The similarity \(G_{jk}\) between objects \(j\) and \(k\) is computed as
$$G_{jk} = \frac{\sum_{i=1}^{m} w_{ijk} s_{ijk}}{\sum_{i=1}^{m} w_{ijk}},$$
where \(w_{ijk}\) is the weight of variable \(i\) for the \(j\)-\(k\) pair and \(s_{ijk}\) is the partial similarity of variable \(i\) for that pair, with \(w_{ijk} = 0\) if objects \(j\) and \(k\) cannot be compared because \(x_{ij}\) or \(x_{ik}\) is unknown (i.e. NA).
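A minimal base-R sketch of how the partial similarities for one pair of objects could be combined into \(G_{jk}\) and then into \(D = 1 - G_{jk}\), assuming the \(s_{ijk}\) have already been computed. The helper name gower_pair is hypothetical and not part of FD; a missing comparison is passed as NA and receives a weight of zero.

```r
# Hypothetical helper (not an FD function): combine the partial similarities
# s_ijk of one pair of objects j, k into a Gower dissimilarity.
# s: partial similarities, one per variable; NA when x_ij or x_ik is missing
# w: variable weights (equal weights by default)
gower_pair <- function(s, w = rep(1, length(s))) {
  w_jk <- ifelse(is.na(s), 0, w)    # w_ijk = 0 when the pair cannot be compared
  s[is.na(s)] <- 0                  # placeholder; its weight is already zero
  G <- sum(w_jk * s) / sum(w_jk)    # weighted mean of the partial similarities
  1 - G                             # convert similarity to dissimilarity
}

# Three variables, the third one missing for this pair
d <- gower_pair(s = c(1, 0.5, NA), w = c(2, 1, 5))
d   # G = (2 * 1 + 1 * 0.5) / (2 + 1) = 5/6, so D = 1/6
```

Note that the weight of the missing variable (5) drops out entirely, so the denominator only sums the weights of variables that could actually be compared.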
For binary variables, \(s_{ijk} = 0\) if \(x_{ij} \neq x_{ik}\), and \(s_{ijk} = 1\) if \(x_{ij} = x_{ik} = 1\) or if \(x_{ij} = x_{ik} = 0\).
For asymmetric binary variables, \(s_{ijk}\) is defined as for binary variables, except that \(w_{ijk} = 0\) if \(x_{ij} = x_{ik} = 0\), so joint absences do not contribute to \(G_{jk}\).
For nominal variables, \(s_{ijk} = 0\) if \(x_{ij} \neq x_{ik}\) and \(s_{ijk} = 1\) if \(x_{ij} = x_{ik}\).
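The three categorical rules can be sketched as one small base-R function returning both \(s_{ijk}\) and the multiplier applied to the variable's weight. The function name and the type codes (borrowed from the Types attribute letters) are illustrative assumptions, not FD's internal code.

```r
# Hypothetical helper (not an FD function): partial similarity s_ijk and
# pair weight multiplier for one categorical variable.
# type: "B" = symmetric binary, "A" = asymmetric binary, "N" = nominal
partial_categorical <- function(a, b, type = c("B", "A", "N")) {
  type <- match.arg(type)
  if (is.na(a) || is.na(b)) return(c(s = NA, w = 0))    # not comparable
  s <- as.numeric(as.character(a) == as.character(b))   # 1 if equal, else 0
  # asymmetric binary: joint absences (0, 0) are given zero weight
  w <- if (type == "A" && a == 0 && b == 0) 0 else 1
  c(s = s, w = w)
}

partial_categorical(0, 0, "B")            # s = 1, w = 1
partial_categorical(0, 0, "A")            # s = 1, w = 0
partial_categorical(1, 0, "A")            # s = 0, w = 1
partial_categorical("red", "blue", "N")   # s = 0, w = 1
```

The only difference between symmetric and asymmetric binary variables is the weight of a (0, 0) pair; \(s_{ijk}\) itself is the same.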
For continuous variables,
$$s_{ijk} = 1 - \frac{|x_{ij} - x_{ik}|} {x_{i.max} - x_{i.min}} $$
where \(x_{i.max}\) and \(x_{i.min}\) are the maximum and minimum values of variable \(i\), respectively.
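A short base-R sketch of the continuous rule for all pairs of objects at once; s_continuous is a hypothetical name. It assumes the variable is not constant: a zero range would give NaN here, and how gowdis handles that case is not covered in this section.

```r
# Hypothetical helper: partial similarities s_ijk of one continuous variable
# for every pair of objects, scaled by the observed range x_i.max - x_i.min.
s_continuous <- function(x) {
  rng <- diff(range(x, na.rm = TRUE))   # x_i.max - x_i.min
  1 - abs(outer(x, x, "-")) / rng       # NA wherever x_ij or x_ik is NA
}

S <- s_continuous(c(2, 5, 10))
S[1, 2]   # 1 - 3/8 = 0.625
S[1, 3]   # the two extremes: 0
```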
For ordinal variables, when ord = "podani" or ord = "metric", all \(x_{ij}\) are replaced by their ranks \(r_{ij}\) determined over all objects (such that ties are also considered), and then:
if ord = "podani", \(s_{ijk} = 1\) if \(r_{ij} = r_{ik}\); otherwise
$$ s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}| - (T_{ij} - 1)/2 - (T_{ik} - 1)/2 }{r_{i.max} - r_{i.min} - (T_{i.max} - 1)/2 - (T_{i.min}-1)/2 }$$
where \(T_{ij}\) is the number of objects which have the same rank score for variable \(i\) as object \(j\) (including \(j\) itself), \(T_{ik}\) is the number of objects which have the same rank score for variable \(i\) as object \(k\) (including \(k\) itself), \(r_{i.max}\) and \(r_{i.min}\) are the maximum and minimum ranks for variable \(i\), respectively, \(T_{i.max}\) is the number of objects with the maximum rank, and \(T_{i.min}\) is the number of objects with the minimum rank.
if ord = "metric",
$$s_{ijk} = 1 - \frac{|r_{ij} - r_{ik}|}{r_{i.max} - r_{i.min}} $$
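A base-R sketch of the two rank-based options, applied to one ordinal variable. It assumes no missing values and that tied objects receive the average of the ranks they span (the default of R's rank()); the text above does not state the tie rule explicitly. Under that assumption, subtracting \((T_{ij} - 1)/2\) and \((T_{ik} - 1)/2\) measures the gap between the nearest members of two tied groups. The helper names are hypothetical, not FD functions.

```r
# Hypothetical helpers (not FD functions) for one ordinal variable, assuming
# no missing values and average ranks for tied objects.
ord_ranks <- function(x) rank(as.integer(x))   # r_ij over all objects

# ord = "podani" (the formula with tie corrections above)
s_podani <- function(x) {
  r  <- ord_ranks(x)
  Tn <- ave(r, r, FUN = length)               # T_ij: objects sharing r_ij
  rmax <- max(r); rmin <- min(r)
  Tmax <- sum(r == rmax)                      # T_i.max
  Tmin <- sum(r == rmin)                      # T_i.min
  num <- abs(outer(r, r, "-")) - outer((Tn - 1) / 2, (Tn - 1) / 2, "+")
  den <- rmax - rmin - (Tmax - 1) / 2 - (Tmin - 1) / 2
  s <- 1 - num / den
  s[outer(r, r, "==")] <- 1                   # s_ijk = 1 when r_ij = r_ik
  s
}

# ord = "metric": range-scaled rank differences
s_metric <- function(x) {
  r <- ord_ranks(x)
  1 - abs(outer(r, r, "-")) / (max(r) - min(r))
}

x <- ordered(c("low", "mid", "mid", "high", "low"),
             levels = c("low", "mid", "high"))
ord_ranks(x)          # 1.5 3.5 3.5 5.0 1.5
S_pod <- s_podani(x)
S_met <- s_metric(x)
S_pod[1, 2]           # low vs mid: 1 - 1/3 = 2/3
S_met[1, 2]           # low vs mid: 1 - 2/3.5 = 3/7
```

On this toy variable the two options disagree for low vs mid because "podani" discounts the spread of each tied group, whereas "metric" uses the raw rank difference.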
When ord = "classic", ordinal variables are simply treated as continuous variables.
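A small base-R comparison of "classic" and "metric" for one pair of objects. It assumes that "classic" works on the integer level codes of the ordered factor, which this section only describes as treating the variable as continuous. The two options can differ when the levels are unevenly populated, because ranks stretch apart crowded levels.

```r
x <- ordered(c("a", "a", "a", "b", "c"), levels = c("a", "b", "c"))
codes <- as.integer(x)   # assumed "classic" input: level codes 1, 1, 1, 2, 3
r     <- rank(codes)     # "metric" input: average ranks 2, 2, 2, 4, 5

s_classic <- 1 - abs(codes[1] - codes[4]) / diff(range(codes))   # 1 - 1/2
s_metric  <- 1 - abs(r[1] - r[4]) / diff(range(r))               # 1 - 2/3
c(classic = s_classic, metric = s_metric)
```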
Gower, J. C. (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857-871.
Legendre, P. and L. Legendre (1998) Numerical Ecology. 2nd English edition. Amsterdam: Elsevier.
Podani, J. (1999) Extending Gower's general coefficient of similarity to ordinal characters. Taxon 48:331-340.
daisy (package cluster) is similar but less flexible, since it does not include variable weights and does not treat ordinal variables as described by Podani (1999). Using ord = "classic" reproduces the behaviour of daisy.
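For data with only continuous and nominal variables, equal weights and no NA, the Gower definition above can be checked by hand against daisy(..., metric = "gower") from the cluster package. The helper gower_manual is hypothetical, and this check says nothing about ordinal or asymmetric binary handling.

```r
library(cluster)   # recommended package shipped with R; provides daisy()

df <- data.frame(
  size  = c(1.0, 2.5, 4.0, 3.0),
  color = factor(c("red", "blue", "red", "green"))
)

# Hypothetical helper: unweighted Gower dissimilarity for continuous and
# nominal variables only, assuming no missing values.
gower_manual <- function(df) {
  n <- nrow(df)
  S <- matrix(0, n, n)
  for (nm in names(df)) {
    v <- df[[nm]]
    if (is.numeric(v)) {
      S <- S + (1 - abs(outer(v, v, "-")) / diff(range(v)))  # range-scaled
    } else {
      S <- S + outer(as.character(v), as.character(v), "==") # match = 1
    }
  }
  1 - S / ncol(df)   # D = 1 - mean partial similarity
}

D_manual <- gower_manual(df)
D_daisy  <- as.matrix(daisy(df, metric = "gower"))
all.equal(D_manual, D_daisy, check.attributes = FALSE)
```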
library(FD)  # provides gowdis() and the dummy and tussock example datasets
ex1 <- gowdis(dummy$trait)
ex1
# check attributes
attributes(ex1)
# to include weights
w <- c(4,3,5,1,2,8,3,6)
ex2 <- gowdis(dummy$trait, w)
ex2
# variable 7 as asymmetric binary
ex3 <- gowdis(dummy$trait, asym.bin = 7)
ex3
# example with trait data from New Zealand vascular plant species
ex4 <- gowdis(tussock$trait)