Selects the nearest word to an input out of a set of options
MultipleChoice(x, y, tvectors = tvectors, remove.punctuation = TRUE, stopwords = NULL,
               method = "Add", all.results = FALSE)
If all.results=FALSE (default), the function will only return the best answer as a character string. If all.results=TRUE, it will return a named numeric vector, where the names are the different answer options in y and the numeric values are their respective cosine similarities to x, sorted by decreasing similarity.
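The shape of this return value can be sketched in base R with a toy semantic space (the matrix and the cosine_sim helper below are illustrative, not part of LSAfun):

```r
## Toy semantic space: every row is a word vector
tvectors <- rbind(cat = c(1, 0, 1),
                  dog = c(1, 0.2, 0.8),
                  car = c(0, 1, 0))

## Cosine similarity between two vectors
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

x <- "cat"
y <- c("dog", "car")

## Cosine of x with every answer option, sorted by decreasing similarity
sims <- sort(sapply(y, function(opt) cosine_sim(tvectors[x, ], tvectors[opt, ])),
             decreasing = TRUE)
sims            # named numeric vector, as with all.results=TRUE
names(sims)[1]  # best answer as a character string, as with all.results=FALSE
```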
x: a character vector of length 1 specifying a sentence/document (or a single word)
y: a character vector specifying multiple answer options (each element of the vector being one answer option)
tvectors: the semantic space in which the computation is to be done (a numeric matrix where every row is a word vector)
remove.punctuation: removes punctuation from x and y; TRUE by default
stopwords: a character vector defining a list of words that are not used to compute the document/sentence vector for x and y
method: the compositional model used to compute the document vector from its word vectors. The default option method = "Add" computes the document vector as the vector sum of its word vectors. With method = "Multiply", the document vector is computed via element-wise multiplication (see compose and costring). With method = "Analogy", the document vector is computed via vector subtraction; see Details for more information.
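The two simple compositional models can be sketched in base R with a toy two-word document (colSums and apply stand in here for the composition step; the vectors are illustrative):

```r
## Toy word vectors for the document "red apple"
tvectors <- rbind(red   = c(1, 0, 2),
                  apple = c(0, 1, 1))

## method = "Add": document vector as the vector sum of the word vectors
doc_add <- colSums(tvectors[c("red", "apple"), ])

## method = "Multiply": element-wise product of the word vectors
doc_mult <- apply(tvectors[c("red", "apple"), ], 2, prod)
```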
Fritz Guenther
Computes all the cosines between a given sentence/document or word and multiple answer options, then selects the nearest option to the input (the option with the highest cosine). This function relies entirely on the costring function.
A note will be displayed whenever not all words of an answer alternative are found in the semantic space. Caution: In that case, the function will still produce a result by omitting the words not found in the semantic space. Depending on the specific requirements of the task, this may compromise the results. Please check your input when you receive this message.
A warning message will be displayed whenever no word of one answer alternative is found in the semantic space.
Using method="Analogy" requires the input in both x and y to consist only of word pairs (for example x = c("helmet head") and y = c("kneecap knee", "atmosphere earth", "grass field")). In that case, the function will try to identify the best-fitting answer in y by applying the king - man + woman = queen rationale to solve man : king = woman : ? (Mikolov et al., 2013); under this rationale, one should also have king - man = queen - woman. With method="Analogy", the function will compute the difference between the normalized vectors head - helmet, and search for the nearest of the vector differences knee - kneecap, earth - atmosphere, and field - grass.
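This Analogy logic can be sketched in base R with toy vectors (the normalize, cosine_sim, and pair_diff helpers are illustrative, not part of LSAfun):

```r
## Toy semantic space covering two candidate answer pairs
tvectors <- rbind(helmet     = c(1, 0, 0),
                  head       = c(1, 1, 0),
                  kneecap    = c(0.9, 0.1, 0),
                  knee       = c(0.9, 1, 0),
                  atmosphere = c(0, 0, 1),
                  earth      = c(0, 1, 1))

normalize  <- function(v) v / sqrt(sum(v^2))
cosine_sim <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

## Difference of the normalized vectors of a pair like "helmet head"
## (second word minus first, i.e. head - helmet)
pair_diff <- function(pair) {
  w <- strsplit(pair, " ")[[1]]
  normalize(tvectors[w[2], ]) - normalize(tvectors[w[1], ])
}

x_diff <- pair_diff("helmet head")
y      <- c("kneecap knee", "atmosphere earth")

## Nearest of the answer-pair differences to head - helmet
sims <- sapply(y, function(p) cosine_sim(x_diff, pair_diff(p)))
names(which.max(sims))  # best-fitting answer pair
```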
Landauer, T.K., & Dumais, S.T. (1997). A solution to Plato's problem: The Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104, 211-240.
Mikolov, T., Yih, W. T., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013). Association for Computational Linguistics.
cosine, Cosine, costring, multicostring, analogy
library(LSAfun)
data(wonderland)
MultipleChoice("who does the march hare celebrate his unbirthday with?",
               c("mad hatter", "red queen", "caterpillar", "cheshire cat"),
               tvectors = wonderland)