negDistMat creates a square matrix of mutual pairwise similarities of
data vectors as negative distances. The argument r (default is 1) is
used to transform the resulting distances by computing the r-th power
(use r=2 to obtain negative squared distances as in Frey and Dueck's
demos), i.e., given a distance d, the resulting similarity is computed
as \(s=-d^r\). With the parameter sel, a subset of samples can be
specified for distance calculation. In this case, the full distance
matrix is not computed; instead, the result is a rectangular similarity
matrix of all samples (rows) against the subset (columns), as needed
for leveraged clustering. Internally, distances are computed by a
method derived from dist. All options of dist except diag and upper
can be used, especially method, which allows for selecting different
distance measures. Note that, since version 1.4.4 of the package, there
is an additional method "discrepancy" that implements Weyl's
discrepancy measure.
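The transformation \(s=-d^r\) can be reproduced in base R for illustration. This is a minimal sketch and not the package's implementation: the data x, the subset sel, and all variable names are made up for this example, and it calls dist directly rather than the internal method mentioned above.

```r
# Illustrative data: 5 samples (rows) with 2 features each
x <- matrix(c(0, 0,
              3, 4,
              6, 8,
              1, 1,
              2, 2), ncol = 2, byrow = TRUE)

r <- 2                                         # exponent
d <- as.matrix(dist(x, method = "euclidean"))  # full 5 x 5 distance matrix
s <- -d^r                                      # s = -d^r (^ binds tighter than unary minus)

# Rectangular variant: all samples (rows) against a subset (columns)
sel <- c(1, 3)                                 # hypothetical subset of samples
sRect <- s[, sel, drop = FALSE]                # 5 x 2 similarity matrix
```

The sketch computes the full matrix first and then selects columns, which only illustrates the shape of the result; as described above, the package computes just the rectangular part.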
expSimMat computes similarities in a way similar to negDistMat, but
the transformation of distances to similarities is done in the
following way:
$$s=\exp\left(-\left(\frac{d}{w}\right)^r\right)$$
The parameter sel allows the creation of a rectangular similarity
matrix. As above, r is an exponent. The parameter w controls how
quickly the similarity decays with increasing distance (a larger w
means slower decay). r=2 in conjunction with Euclidean distances
corresponds to the well-known Gaussian/RBF kernel, whereas r=1
corresponds to the Laplace kernel. Note that these similarity measures
can also be understood as fuzzy equality relations.
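The effect of r and w can be seen by applying the formula to a small distance matrix in base R. The helper expSim below is hypothetical and written only for this sketch (it is not a package function), and the data and parameter values are arbitrary.

```r
# Three illustrative samples with 2 features each
x <- matrix(c(0, 0,
              1, 0,
              0, 2), ncol = 2, byrow = TRUE)
d <- as.matrix(dist(x))               # Euclidean distances

# Hypothetical helper implementing s = exp(-(d / w)^r)
expSim <- function(d, r, w) exp(-(d / w)^r)

sGauss   <- expSim(d, r = 2, w = 1)   # Gaussian/RBF-type: exp(-d^2 / w^2)
sLaplace <- expSim(d, r = 1, w = 1)   # Laplace-type: exp(-d / w)
sWide    <- expSim(d, r = 2, w = 4)   # larger w, so similarity decays more slowly
```

In all three cases the diagonal is 1, since each sample has distance 0 to itself.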
linSimMat provides another way of transforming distances into
similarities by applying the following transformation to a distance d:
$$s=\max\left(0,1-\frac{d}{w}\right)$$
The parameter sel is again used to create a rectangular similarity
matrix. Here w corresponds to a maximal radius of interest: samples at
a distance of w or more receive similarity 0. Note that this is a fuzzy
equality relation with respect to the Łukasiewicz t-norm.
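As a quick illustration of the cut-off at w, the transformation can be applied elementwise in base R. This is a sketch with made-up one-dimensional data and an arbitrary choice of w, not the package's code.

```r
d <- as.matrix(dist(c(0, 1, 2, 5)))   # distances between four 1-D samples
w <- 4                                # radius of interest (arbitrary choice)

# Elementwise max(0, 1 - d/w); the matrix is passed first so that
# pmax() keeps its dimensions
s <- pmax(1 - d / w, 0)
```

Here the pair at distance 1 gets similarity 0.75, while the pair at distance 5 lies beyond w and gets similarity 0.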
Unlike the above three functions, linKernel computes pairwise
similarities as scalar products of data vectors, i.e., it corresponds,
as the name suggests, to the “linear kernel”. Use the parameter sel to
compute only a submatrix of the full kernel matrix as described above.
If normalize=TRUE, the values are scaled to the unit sphere in the
following way (for two samples x and y):
$$s=\frac{\vec{x}^T\vec{y}}{\|\vec{x}\| \|\vec{y}\|}$$
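The normalized variant is the cosine of the angle between two samples. A minimal base R sketch of both variants, with made-up data and variable names chosen for this example only, might look as follows.

```r
# Three illustrative samples (rows) with 2 features each
x <- matrix(c(1, 0,
              1, 1,
              0, 2), ncol = 2, byrow = TRUE)

k <- x %*% t(x)                  # linear kernel: pairwise scalar products
norms <- sqrt(diag(k))           # Euclidean norm of each sample
kNorm <- k / (norms %o% norms)   # x^T y / (||x|| ||y||)
```

After normalization every sample has similarity 1 to itself, and orthogonal samples (the first and third rows here) have similarity 0.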
The function corSimMat computes pairwise similarities as correlations.
It uses cor internally. The method argument is passed on to cor. The
argument r serves as an exponent with which the correlations can be
transformed. If signed=TRUE (default), negative correlations are taken
into account, i.e., two samples are maximally dissimilar if they are
negatively correlated. If signed=FALSE, similarities are computed as
absolute values of correlations, i.e., two samples are maximally
similar if they are positively or negatively correlated, and maximally
dissimilar if they are uncorrelated.
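The difference between the signed and unsigned variants can be illustrated with cor in base R. This sketch uses made-up data and covers only the case without an exponent; how r is applied to negative correlations is not spelled out above, so it is left out here.

```r
# Three samples (rows) measured on four variables (columns)
x <- rbind(a = c(1, 2, 3, 4),
           b = c(2, 4, 6, 8),    # perfectly positively correlated with a
           c = c(4, 3, 2, 1))    # perfectly negatively correlated with a

# cor() correlates columns, so transpose to correlate samples
rho <- cor(t(x), method = "pearson")

sSigned   <- rho        # a vs. c: similarity -1 (maximally dissimilar)
sUnsigned <- abs(rho)   # a vs. c: similarity 1 (maximally similar)
```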
Note that the naming of the argument p has been chosen for consistency
with dist and previous versions of the package. When using leveraged AP
in conjunction with the Minkowski distance, this leads to conflicts
with the input preference parameter p of apclusterL. To avoid that, use
the above functions without the x argument to create a custom
similarity measure with fixed parameter p (see example below).
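A sketch of this pattern, assuming the apcluster package is installed, is shown here. The data, the Minkowski exponent 2.5, and the settings frac=0.2, sweeps=3, and input preference -0.2 are arbitrary values chosen for illustration.

```r
# Runs only if the apcluster package is available
if (requireNamespace("apcluster", quietly = TRUE)) {
    library(apcluster)

    # Two illustrative Gaussian clusters in 2-D (made-up data)
    set.seed(1)
    x <- rbind(cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06)),
               cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05)))

    # Called without x, negDistMat returns a similarity function in which
    # the Minkowski exponent p = 2.5 is already fixed
    sim <- negDistMat(method = "minkowski", p = 2.5, r = 2)

    # The p given here is therefore unambiguously the input preference
    apres <- apclusterL(s = sim, x = x, frac = 0.2, sweeps = 3, p = -0.2)
}
```

Because the distance parameter is bound inside sim, the two meanings of p can no longer clash in the call to apclusterL.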