analogy: Analogy

Description

Implements the vector-offset algorithm for solving analogies of the type king - man + woman = queen.

Usage

analogy(x1,x2,y1=NA,n,tvectors=tvectors)

Value

Returns a list containing a numeric vector and the nearest neighbors to that vector:

In the variant with three input words (x1, x2, and y1), returns:
- y2_vec: The result of x2 - x1 + y1 (all normalized to unit norm) as a numeric vector
- y2_neighbors: A named numeric vector of the n nearest neighbors to y2_vec. The names of the vector are the neighbor words, and the entries are their cosines to y2_vec.
In the variant with two input words (x1 and x2), returns:
- x_diff_vec: The result of x2 - x1 (both normalized to unit norm) as a numeric vector
- x_diff_neighbors: A named numeric vector of the n nearest neighbors to x_diff_vec. The names of the vector are the neighbor words, and the entries are their cosines to x_diff_vec.

Arguments

x1: a character vector specifying the first word of the first pair (man in man : king = woman : ?)
x2: a character vector specifying the second word of the first pair (king in man : king = woman : ?)
y1: a character vector specifying the first word of the second pair (woman in man : king = woman : ?). Optional; if left at its default (NA), the two-word variant is computed
n: the number of neighbors to be computed
tvectors: the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)

Author

Fritz Guenther

Details

The analogy task is a popular benchmark for vector space models of meaning/word embeddings. It is based on the rationale that proportional analogies of the form x1 is to x2 as y1 is to y2, like man : king = woman : ? (correct answer: queen), can be solved via the following operation on the respective word vectors (all normalized to unit norm): king - man + woman = queen. That is, the nearest vector to king - man + woman should be queen (Mikolov et al., 2013).

The analogy() function comes in two variants, taking as input either three words (x1, x2, and y1) or two words (x1 and x2):

The variant with three input words (x1, x2, and y1) implements the standard analogy solving algorithm for analogies of the type x1 : x2 = y1 : ?. It searches for the n nearest neighbors of x2 - x1 + y1 (all normalized to unit norm) as the best-fitting candidates for y2.
The variant with two input words (x1 and x2) only computes the difference x2 - x1 between the two vectors (both normalized to unit norm) and the n nearest neighbors to the resulting difference vector.
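
The computation described for both variants can be sketched in base R on a toy semantic space. This is only an illustration under stated assumptions, not the package's actual implementation: the helper names unit() and analogy_sketch(), the toy matrix and its values, and the generic element names vec and neighbors are all invented here. In particular, the sketch does not filter the input words out of the neighbor list, and whether analogy() does so is not stated above.

```r
# Toy semantic space: every row is a word vector (values invented for illustration).
toy <- rbind(
  man   = c(1, 0, 0),
  woman = c(0, 1, 0),
  king  = c(1, 0, 1),
  queen = c(0, 1, 1),
  apple = c(1, 1, 0)
)

# Scale a vector to unit (Euclidean) norm.
unit <- function(v) v / sqrt(sum(v^2))

# Hypothetical helper, NOT part of the package: mirrors the two variants
# described above (three words if y1 is given, two words otherwise).
analogy_sketch <- function(x1, x2, y1 = NA, n, tvectors) {
  # Difference of the unit-normalized vectors: x2 - x1
  target <- unit(tvectors[x2, ]) - unit(tvectors[x1, ])
  # Three-word variant: add the unit-normalized y1 vector
  if (!is.na(y1)) target <- target + unit(tvectors[y1, ])

  # Cosine between the target vector and every row of the semantic space
  cosines <- as.vector(tvectors %*% target) /
    (sqrt(rowSums(tvectors^2)) * sqrt(sum(target^2)))
  names(cosines) <- rownames(tvectors)

  # Keep the n rows with the highest cosine
  list(vec = target,
       neighbors = sort(cosines, decreasing = TRUE)[seq_len(n)])
}

# man : king = woman : ?
res3 <- analogy_sketch("man", "king", "woman", n = 2, tvectors = toy)
# Difference vector king - man only
res2 <- analogy_sketch("man", "king", n = 2, tvectors = toy)
```

In this toy space, names(res3$neighbors)[1] is "queen", because queen has the highest cosine to king - man + woman.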

References

Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013). Association for Computational Linguistics.

Examples

data(wonderland)

analogy(x1="hatter",x2="mad",y1="cat",n=10,tvectors=wonderland)

analogy(x1="hatter",x2="mad",n=10,tvectors=wonderland)
