Compute semantic distances (or similarities) between pairs of words based on a scored DSM matrix M, according to any of the distance measures supported by dist.matrix. If one of the words in a pair is not represented in the DSM, the distance is set to Inf (or to -Inf in the case of a similarity measure).
pair.distances(w1, w2, M, …, rank = c("none", "fwd", "bwd", "avg"), transform = NULL,
avg.method = c("arithmetic", "geometric", "harmonic"),
batchsize = 10e6, verbose = FALSE)
w1: a character vector specifying the first word of each pair
w2: a character vector of the same length as w1, specifying the second word of each pair
M: a sparse or dense DSM matrix, suitable for passing to dist.matrix, or an object of class dsm
…: further arguments are passed to dist.matrix and determine the distance or similarity measure to be used (see dist.matrix for details)
rank: whether to return the distance between the two words ("none") or the neighbour rank (see “Details” below)
transform: an optional transformation function applied to the distance, similarity or rank values (e.g. transform=log10 for logarithmic ranks). This option is provided as a convenience for evaluation code that calls pair.distances with user-specified arguments.
avg.method: with rank="avg", whether to compute the arithmetic, geometric or harmonic mean of forward and backward rank
batchsize: maximum number of similarity values to compute per batch. This parameter has an essential influence on efficiency and memory use of the algorithm and has to be tuned carefully for optimal performance.
verbose: if TRUE, display some progress messages indicating how data are split into batches
If rank="none" (the default), a numeric vector of the same length as w1 and w2 specifying the distances or similarities between the word pairs, according to the metric selected with the extra arguments (…).
Otherwise, an integer or numeric vector of the same length as w1 and w2 specifying forward, backward or average neighbour rank for the two words.
In either case, a distance of Inf (or similarity of -Inf) is returned for any word pair not represented in the DSM.
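The handling of unknown words can be illustrated with a small base R sketch. It does not call pair.distances and makes no claim about how the function works internally; the matrix M, the word lists and the choice of Euclidean distance are made-up assumptions that only mimic the documented behaviour of returning Inf for pairs not represented in the model.

```r
# Toy row vectors for three words (hypothetical values, not a real DSM)
M <- rbind(cat = c(1.0, 0.0),
           dog = c(0.9, 0.1),
           car = c(0.0, 1.0))

w1 <- c("cat", "cat",     "dog")
w2 <- c("dog", "unicorn", "car")   # "unicorn" has no row in M

# A pair is usable only if both of its words have a row in M
known <- w1 %in% rownames(M) & w2 %in% rownames(M)

# Start with Inf everywhere, then fill in Euclidean distances for known pairs
res <- rep(Inf, length(w1))
res[known] <- sqrt(rowSums((M[w1[known], , drop = FALSE] -
                            M[w2[known], , drop = FALSE])^2))

# Pairs with an unknown word can be filtered out afterwards
usable <- is.finite(res)
```

When distances are used in an evaluation, is.finite offers a simple way to identify the pairs that were actually covered by the model.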
The rank argument controls whether semantic distance is measured directly by geometric distance (none), by forward neighbour rank (fwd), by backward neighbour rank (bwd), or by the average of forward and backward rank (avg).
Forward neighbour rank is the rank of w2 among the nearest neighbours of w1.
Backward neighbour rank is the rank of w1 among the nearest neighbours of w2.
The average can be computed as an arithmetic, geometric or harmonic mean, depending on avg.method.
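To make the difference between forward and backward rank concrete, here is a minimal base R sketch on made-up vectors. The helper neighbour_rank is hypothetical and not part of the package; it assumes Euclidean distance, excludes the word itself from its own neighbour list, and resolves ties with the minimum rank, none of which is guaranteed to match what pair.distances does.

```r
# Toy 2-dimensional vectors for four words (made-up values, not a real DSM)
M <- rbind(cat   = c(1.0, 0.0),
           dog   = c(0.9, 0.1),
           car   = c(0.0, 1.0),
           truck = c(0.4, 0.6))

# Euclidean distances between all pairs of rows
D <- as.matrix(dist(M))

# Hypothetical helper: rank of word b among the nearest neighbours of word a
# (a itself is excluded; ties receive the smallest applicable rank)
neighbour_rank <- function(D, a, b) {
  d <- D[a, colnames(D) != a]
  unname(rank(d, ties.method = "min")[b])
}

fwd <- neighbour_rank(D, "cat", "truck")   # "truck" among the neighbours of "cat"
bwd <- neighbour_rank(D, "truck", "cat")   # "cat" among the neighbours of "truck"

# The three averages named by avg.method
avg <- c(arithmetic = (fwd + bwd) / 2,
         geometric  = sqrt(fwd * bwd),
         harmonic   = 2 / (1 / fwd + 1 / bwd))
```

In this toy data "truck" is the second-nearest neighbour of "cat", while "cat" is only the third-nearest neighbour of "truck", so neighbour rank is not symmetric even though the underlying distance is.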
Note that a transformation function is applied after averaging. In order to compute the arithmetic mean of log ranks, set transform=log10, rank="avg" and avg.method="geometric".
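The reason this combination works is that the logarithm of a geometric mean equals the arithmetic mean of the logarithms. A short base R check, using made-up forward and backward ranks rather than output from pair.distances:

```r
# Hypothetical forward and backward ranks for three word pairs
fwd <- c(1, 10, 250)
bwd <- c(100, 10, 4)

# Geometric mean first, log10 applied afterwards
# (the order described for transform=log10 with avg.method="geometric")
log_of_geo <- log10(sqrt(fwd * bwd))

# Arithmetic mean of the individual log ranks
mean_of_logs <- (log10(fwd) + log10(bwd)) / 2

# Both routes agree up to floating-point tolerance
same <- isTRUE(all.equal(log_of_geo, mean_of_logs))
```

By contrast, applying log10 to the arithmetic mean of the ranks would generally give a different value, which is why avg.method matters here.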
pair.distances is used as a default callback in several evaluation functions.
dist.matrix, eval.similarity.correlation, eval.multiple.choice
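library(wordspace)

# Add a column "angle" to the RG65 data frame, holding the distance between
# word1 and word2 of each row as computed from the DSM_Vectors matrix
transform(RG65, angle=pair.distances(word1, word2, DSM_Vectors))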