Implements the king - man + woman = queen analogy solving algorithm
analogy(x1,x2,y1=NA,n,tvectors=tvectors)
Returns a list containing a numeric vector and the nearest neighbors to that vector:
In the variant with three input words (x1, x2, and y1), returns:
y2_vec
The result of x2 - x1 + y1 (with x1, x2, and y1 each normalized to unit norm) as a numeric vector
y2_neighbors
A named numeric vector of the n nearest neighbors to y2_vec. The neighbors are given as the names of the vector, and their respective cosines to y2_vec as the vector entries.
In the variant with two input words (x1 and x2), returns:
x_diff_vec
The result of x2 - x1 (with x1 and x2 each normalized to unit norm) as a numeric vector
x_diff_neighbors
A named numeric vector of the n nearest neighbors to x_diff_vec. The neighbors are given as the names of the vector, and their respective cosines to x_diff_vec as the vector entries.
x1
a character vector specifying the first word of the first pair (man in man : king = woman : ?)
x2
a character vector specifying the second word of the first pair (king in man : king = woman : ?)
y1
a character vector specifying the first word of the second pair (woman in man : king = woman : ?)
n
the number of neighbors to be computed
tvectors
the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)
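The tvectors argument is a plain numeric matrix. The sketch below builds a tiny space of that kind. The four words and their three dimensions are made up for illustration. It also assumes that words are matched against the matrix's row names. That is an assumption of this sketch, not something the page states.

```r
# Hypothetical toy semantic space: one word vector per row,
# words stored as row names (dimensions are invented for illustration).
tvectors <- matrix(c(0,  1, 0,    # man
                     0, -1, 0,    # woman
                     1,  1, 0,    # king
                     1, -1, 0),   # queen
                   ncol = 3, byrow = TRUE,
                   dimnames = list(c("man", "woman", "king", "queen"), NULL))

# Assumed lookup by row name: the analogy words must appear as rows
stopifnot(all(c("man", "king", "woman") %in% rownames(tvectors)))

# Scale a word vector to unit norm, as is done before any vector arithmetic
king_unit <- tvectors["king", ] / sqrt(sum(tvectors["king", ]^2))
sqrt(sum(king_unit^2))   # 1 (up to floating-point rounding)
```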
Fritz Guenther
The analogy task is a popular benchmark for vector space models of meaning/word embeddings.
It is based on the rationale that proportional analogies of the form x1 is to x2 as y1 is to y2, such as man : king = woman : ? (correct answer: queen), can be solved via the operation king - man + woman = queen on the respective word vectors (all normalized to unit norm). That is, the nearest vector to king - man + woman should be queen (Mikolov et al., 2013).
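A hand-worked toy example makes this rationale concrete. The two-dimensional vectors below are invented for illustration and do not come from any real semantic space. The first axis loosely stands for royalty and the second for maleness. Hats denote vectors scaled to unit norm, and cosines are taken with the unit-norm word vectors.

```latex
\begin{aligned}
&\mathbf{man} = (0, 1), \quad \mathbf{woman} = (0, -1), \quad \mathbf{king} = (1, 1), \quad \mathbf{queen} = (1, -1) \\
&\mathbf{v} = \widehat{\mathbf{king}} - \widehat{\mathbf{man}} + \widehat{\mathbf{woman}}
  = \left(\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}} - 2\right) \approx (0.707,\ -1.293),
  \qquad \lVert \mathbf{v} \rVert = \sqrt{5 - 2\sqrt{2}} \approx 1.474 \\
&\cos(\mathbf{v}, \mathbf{queen}) = \frac{\sqrt{2}}{\sqrt{5 - 2\sqrt{2}}} \approx 0.960, \quad
  \cos(\mathbf{v}, \mathbf{woman}) \approx 0.877, \quad
  \cos(\mathbf{v}, \mathbf{king}) \approx -0.281, \quad
  \cos(\mathbf{v}, \mathbf{man}) \approx -0.877
\end{aligned}
```

In this toy space, queen is the nearest neighbor of king - man + woman, as the analogy predicts.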
The analogy() function comes in two variants, taking as input either three words (x1, x2, and y1) or two words (x1 and x2).
The variant with three input words (x1, x2, and y1) implements the standard analogy-solving algorithm for analogies of the type x1 : x2 = y1 : ?. It takes the n nearest neighbors of x2 - x1 + y1 (all normalized to unit norm) as the best-fitting candidates for y2.
The variant with two input words (x1 and x2) only computes the difference x2 - x1 between the two vectors (both normalized to unit norm) and the n nearest neighbors to the resulting difference vector.
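Both variants can be sketched in base R. This is a minimal illustration, not the package source. The names analogy_sketch(), unit(), nearest(), the argument space, and the toy matrix toy are all hypothetical. Neighbors are ranked by cosine over every row of the space, and the input words are not excluded from the candidates.

```r
# Hypothetical helper: scale a vector to unit norm
unit <- function(v) v / sqrt(sum(v^2))

# Hypothetical helper: cosine of `vec` with every row of `space`,
# sorted in decreasing order, keeping the top n (names = words)
nearest <- function(vec, n, space) {
  cosines <- drop(space %*% vec) / (sqrt(rowSums(space^2)) * sqrt(sum(vec^2)))
  head(sort(cosines, decreasing = TRUE), n)
}

# Hypothetical stand-in for analogy(); `space` plays the role of tvectors
analogy_sketch <- function(x1, x2, y1 = NA, n, space) {
  x1_vec <- unit(space[x1, ])
  x2_vec <- unit(space[x2, ])
  if (is.na(y1)) {
    # Two-word variant: difference of the unit-norm vectors
    x_diff_vec <- x2_vec - x1_vec
    return(list(x_diff_vec = x_diff_vec,
                x_diff_neighbors = nearest(x_diff_vec, n, space)))
  }
  # Three-word variant: x2 - x1 + y1, each normalized first
  y2_vec <- x2_vec - x1_vec + unit(space[y1, ])
  list(y2_vec = y2_vec, y2_neighbors = nearest(y2_vec, n, space))
}

# Hypothetical toy space (dimensions loosely: royalty, maleness, animal)
toy <- matrix(c(0,  1, 0,
                0, -1, 0,
                1,  1, 0,
                1, -1, 0,
                0,  0, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("man", "woman", "king", "queen", "cat"), NULL))

res3 <- analogy_sketch("man", "king", "woman", n = 3, space = toy)
names(res3$y2_neighbors)          # "queen" "woman" "cat"
res2 <- analogy_sketch("woman", "queen", n = 2, space = toy)
names(res2$x_diff_neighbors)[1]   # "king"
```

This sketch does not filter out the input words, so woman appears as the second-best candidate for man : king = woman : ? in the toy space.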
Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013). Association for Computational Linguistics.
neighbors
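library(LSAfun)   # package providing analogy() and the wonderland space
data(wonderland)
# Three-word variant: hatter : mad = cat : ?
analogy(x1="hatter",x2="mad",y1="cat",n=10,tvectors=wonderland)
# Two-word variant: nearest neighbors of mad - hatter
analogy(x1="hatter",x2="mad",n=10,tvectors=wonderland)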