wordspace (version 0.2-0)

pair.distances: Semantic Distances Between Word Pairs (wordspace)

Description

Compute semantic distances (or similarities) between pairs of words based on a scored DSM matrix M, according to any of the distance measures supported by dist.matrix. If one of the words in a pair is not represented in the DSM, the distance is set to Inf (or to -Inf in the case of a similarity measure).

Usage

pair.distances(w1, w2, M, ..., rank = c("none", "fwd", "bwd", "avg"), transform = NULL,
               avg.method = c("arithmetic", "geometric", "harmonic"),
               batchsize = 10e6, verbose = FALSE)

Arguments

w1

a character vector specifying the first word of each pair

w2

a character vector of the same length as w1, specifying the second word of each pair

M

a sparse or dense DSM matrix, suitable for passing to dist.matrix, or an object of class dsm

...

further arguments are passed to dist.matrix and determine the distance or similarity measure to be used (see dist.matrix for details)

rank

whether to return the distance between the two words ("none") or the neighbour rank (see “Details” below)

transform

an optional transformation function applied to the distance, similarity or rank values (e.g. transform=log10 for logarithmic ranks). This option is provided as a convenience for evaluation code that calls pair.distances with user-specified arguments.

avg.method

with rank="avg", whether to compute the arithmetic, geometric or harmonic mean of forward and backward rank

batchsize

maximum number of similarity values to compute per batch. This parameter strongly affects the efficiency and memory use of the algorithm and should be tuned carefully for optimal performance.

verbose

if TRUE, display some progress messages indicating how data are split into batches

Value

If rank="none" (the default), a numeric vector of the same length as w1 and w2 specifying the distances or similarities between the word pairs, according to the metric selected with the extra arguments (...).

Otherwise, an integer or numeric vector of the same length as w1 and w2 specifying forward, backward or average neighbour rank for the two words.

In either case, a distance of Inf (or similarity of -Inf) is returned for any word pair in which at least one of the words is not represented in the DSM.
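Because missing pairs are reported as Inf rather than NA, summary statistics over the result need an explicit filter. The following base-R sketch shows one way to do that; the vector d is made-up data standing in for a pair.distances result, not actual output.

```r
# Made-up stand-in for a pair.distances() result: the second pair is
# assumed to contain a word that is missing from the DSM
d <- c(32.5, Inf, 58.5)

# Keep only the pairs for which a finite distance was obtained
covered <- is.finite(d)

mean_distance <- mean(d[covered])  # average over covered pairs only
coverage <- mean(covered)          # proportion of pairs found in the DSM
```

Note that na.rm=TRUE does not help here, since Inf is not a missing value in R.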

Details

The rank argument controls whether semantic distance is measured directly by geometric distance (none), by forward neighbour rank (fwd), by backward neighbour rank (bwd), or by the average of forward and backward rank (avg). Forward neighbour rank is the rank of w2 among the nearest neighbours of w1. Backward neighbour rank is the rank of w1 among the nearest neighbours of w2. The average can be computed as an arithmetic, geometric or harmonic mean, depending on avg.method.
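The rank definitions above can be illustrated with a small base-R sketch. It does not call wordspace: the toy distance matrix D and the helper neighbour_rank are hypothetical, and pair.distances may treat ties or the word itself differently.

```r
# Toy symmetric distance matrix for four words (made-up values)
words <- c("cat", "dog", "car", "bus")
D <- matrix(c(0, 1, 4, 5,
              1, 0, 3, 6,
              4, 3, 0, 2,
              5, 6, 2, 0),
            nrow = 4, byrow = TRUE, dimnames = list(words, words))

# Rank of `target` among the nearest neighbours of `source`,
# excluding `source` itself (hypothetical helper, not part of wordspace)
neighbour_rank <- function(D, source, target) {
  d <- D[source, colnames(D) != source]
  unname(rank(d, ties.method = "min")[target])
}

w1 <- "cat"
w2 <- "car"
fwd <- neighbour_rank(D, w1, w2)  # rank of w2 among the neighbours of w1
bwd <- neighbour_rank(D, w2, w1)  # rank of w1 among the neighbours of w2

# The three averaging options named by avg.method
avg <- c(arithmetic = (fwd + bwd) / 2,
         geometric  = sqrt(fwd * bwd),
         harmonic   = 2 / (1 / fwd + 1 / bwd))
```

In this toy example the forward and backward ranks differ, which shows that neighbour rank is not symmetric even when the underlying distance is.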

Note that a transformation function is applied after averaging. In order to compute the arithmetic mean of log ranks, set transform=log10, rank="avg" and avg.method="geometric".
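The reason this combination works is that the logarithm of a geometric mean equals the arithmetic mean of the logarithms. A short base-R check, using made-up forward and backward ranks rather than real pair.distances output:

```r
fwd <- c(1, 4, 100)   # made-up forward ranks
bwd <- c(100, 9, 10)  # made-up backward ranks

# Geometric mean first, then the log10 transformation
# (the order described above: the transformation is applied after averaging)
log_of_geometric <- log10(sqrt(fwd * bwd))

# Arithmetic mean of the individual log ranks
mean_of_logs <- (log10(fwd) + log10(bwd)) / 2

identical_up_to_rounding <- isTRUE(all.equal(log_of_geometric, mean_of_logs))
```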

pair.distances is used as a default callback in several evaluation functions.

See Also

dist.matrix, eval.similarity.correlation, eval.multiple.choice

Examples

library(wordspace)

# Add the distance between each RG65 word pair as a new column named angle
transform(RG65, angle=pair.distances(word1, word2, DSM_Vectors))