Selects the nearest word to an input out of a set of options
MultipleChoice(x, y, tvectors = tvectors, remove.punctuation = TRUE, stopwords = NULL,
               method = "Add", all.results = FALSE)
If all.results=FALSE (default), the function will only return the best answer as a character string. If all.results=TRUE, it will return a named numeric vector, where the names are the different answer options in y and the numeric values are their respective cosine similarities to x, sorted by decreasing similarity.
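The shape of this return value can be sketched in base R with a toy semantic space (the matrix and the cosine_sim helper below are illustrative, not part of LSAfun):

```r
## Toy semantic space: every row is a word vector
tvectors <- rbind(cat = c(1, 0, 1),
                  dog = c(1, 0.2, 0.8),
                  car = c(0, 1, 0))

## Cosine similarity between two vectors
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

x <- "cat"
y <- c("dog", "car")

## Cosine of x with every answer option, sorted by decreasing similarity
sims <- sort(sapply(y, function(opt) cosine_sim(tvectors[x, ], tvectors[opt, ])),
             decreasing = TRUE)
sims            # named numeric vector, as with all.results=TRUE
names(sims)[1]  # best answer as a character string, as with all.results=FALSE
```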
x: a character vector of length 1 specifying a sentence/document (or a single word)
y: a character vector specifying multiple answer options (each element of the vector being one answer option)
tvectors: the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)
remove.punctuation: removes punctuation from x and y; TRUE by default
stopwords: a character vector defining a list of words that are not used to compute the document/sentence vector for x and y
method: the compositional model used to compute the document vector from its word vectors. The default option method = "Add" computes the document vector as the vector sum of its word vectors. With method = "Multiply", the document vector is computed via element-wise multiplication (see compose and costring). With method = "Analogy", the document vector is computed via vector subtraction; see Details for more information.
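The two simple compositional models can be sketched in base R with a toy two-word document (colSums and apply stand in here for the composition step; the vectors are illustrative):

```r
## Toy word vectors for the document "red apple"
tvectors <- rbind(red   = c(1, 0, 2),
                  apple = c(0, 1, 1))

## method = "Add": document vector as the vector sum of the word vectors
doc_add <- colSums(tvectors[c("red", "apple"), ])

## method = "Multiply": element-wise product of the word vectors
doc_mult <- apply(tvectors[c("red", "apple"), ], 2, prod)
```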
Fritz Guenther
Computes all the cosines between a given sentence/document or word and multiple answer options, then selects the nearest option to the input (the option with the highest cosine). This function relies entirely on the costring function.
A note will be displayed whenever not all words of an answer alternative are found in the semantic space. Caution: In that case, the function will still produce a result by omitting the words not found in the semantic space. Depending on the specific requirements of the task, this may compromise the results. Please check your input when you receive this message.
A warning message will be displayed whenever no word of one answer alternative is found in the semantic space.
Using method="Analogy" requires the input in both x and y to consist only of word pairs (for example x = c("helmet head") and y = c("kneecap knee", "atmosphere earth", "grass field")). In that case, the function will try to identify the best-fitting answer in y by applying the king - man + woman = queen rationale to solve man : king = woman : ? (Mikolov et al., 2013); under this rationale, one should also have king - man = queen - woman. With method="Analogy", the function will compute the difference between the normalized vectors head - helmet, and search for the nearest of the vector differences knee - kneecap, earth - atmosphere, and field - grass.
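This Analogy logic can be sketched in base R with toy vectors (the normalize, cosine_sim, and pair_diff helpers are illustrative, not part of LSAfun):

```r
## Toy semantic space covering two candidate answer pairs
tvectors <- rbind(helmet     = c(1, 0, 0),
                  head       = c(1, 1, 0),
                  kneecap    = c(0.9, 0.1, 0),
                  knee       = c(0.9, 1, 0),
                  atmosphere = c(0, 0, 1),
                  earth      = c(0, 1, 1))

normalize  <- function(v) v / sqrt(sum(v^2))
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

## Difference of the normalized vectors of a pair like "helmet head"
## (second word minus first, i.e. head - helmet)
pair_diff <- function(pair) {
  w <- strsplit(pair, " ")[[1]]
  normalize(tvectors[w[2], ]) - normalize(tvectors[w[1], ])
}

x_diff <- pair_diff("helmet head")
y      <- c("kneecap knee", "atmosphere earth")

## Nearest of the answer-pair differences to head - helmet
sims <- sapply(y, function(p) cosine_sim(x_diff, pair_diff(p)))
names(which.max(sims))  # best-fitting answer pair
```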
Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.
Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013). Association for Computational Linguistics.
cosine, Cosine, costring, multicostring, analogy
library(LSAfun)
data(wonderland)
MultipleChoice("who does the march hare celebrate his unbirthday with?",
               c("mad hatter", "red queen", "caterpillar", "cheshire cat"),
               tvectors = wonderland)