Evaluates a DSM on a multiple choice task by selecting the answer option closest to the target term in distributional space. A typical example is the TOEFL Synonym Task (Landauer & Dumais 1997).
eval.multiple.choice(task, M, dist.fnc = pair.distances, ...,
details = FALSE, format = NA, taskname = NA,
target.name = "target", correct.name = "correct",
distractor.name = "^distract")
The default short report (details=FALSE) is a data frame with a single row and the columns accuracy (percentage correct), TP (number of correct answers), FP (number of wrong answers) and missing (number of test items for which the distance between target and correct choice was not found in the DSM).
The detailed report (details=TRUE) is a data frame with one row for each task item and the following columns:
the target word (character)
whether the model's choice is correct (logical or NA)
best choice according to the DSM (character)
distance of best choice from target (numeric)
correct answer (numeric)
rank of correct answer among choices (integer)
distance of correct answer from target (numeric)
task: a data frame listing the target word, the correct answer, and one or more additional choices (distractors) for each test item
M: a scored DSM matrix, passed to dist.fnc
dist.fnc: a callback function used to compute distances between term pairs (or similarity scores, which must be marked with an attribute similarity=TRUE). See “Details” below for further information.
...: any further arguments are passed to dist.fnc and can be used e.g. to select a distance measure
details: if TRUE, a detailed report with information on each task item is returned (see “Value” below for details)
format: if the task definition specifies POS-disambiguated lemmas in CWB/Penn format, they can automatically be transformed into another notation convention; see convert.lemma for details
taskname: optional row label for the short report (details=FALSE)
target.name: the name of the column of task containing the target word
correct.name: the name of the column of task containing the correct choice
distractor.name: a regular expression matching columns of task containing the distractors. The regular expression is matched with perl=TRUE.
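As an illustration of the expected task layout, the following sketch builds a small task definition with the default column names and shows which columns a distractor pattern of "^distract" would select. The words and the distractor column names (distract1, distract2) are made up for this example.

```r
# Toy task definition; all words are invented for illustration
task <- data.frame(
  target    = c("big", "fast"),
  correct   = c("large", "quick"),
  distract1 = c("small", "slow"),
  distract2 = c("red", "heavy"),
  stringsAsFactors = FALSE
)

# Select the distractor columns with a Perl-style regular expression
is.distractor <- grepl("^distract", colnames(task), perl = TRUE)
colnames(task)[is.distractor]
```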
Stephanie Evert (https://purl.org/stephanie.evert)
The callback function dist.fnc will be invoked with character vectors containing the components of the term pairs as first and second argument, the DSM matrix M as third argument, plus any additional arguments (...) passed to eval.multiple.choice.
The return value must be a numeric vector of appropriate length. If one of the terms in a pair is not represented in the DSM, the corresponding distance value should be set to Inf (or -Inf in the case of similarity scores).
In most cases, the default callback pair.distances is sufficient if used with suitable parameter settings.
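A custom callback following this contract might look like the sketch below. The function name my.dist, the Euclidean distance and the toy matrix are assumptions chosen for illustration; they are not part of the package.

```r
# Hypothetical callback: Euclidean distance between the row vectors of M
my.dist <- function (w1, w2, M, ...) {
  stopifnot(length(w1) == length(w2))
  known <- w1 %in% rownames(M) & w2 %in% rownames(M)
  res <- rep(Inf, length(w1))   # Inf marks pairs with a term missing from M
  if (any(known)) {
    delta <- M[w1[known], , drop = FALSE] - M[w2[known], , drop = FALSE]
    res[known] <- sqrt(rowSums(delta^2))
  }
  res
}

# Toy DSM matrix with made-up values; row names are the terms
M <- rbind(big = c(1, 0), large = c(1, 1), small = c(4, 4))
my.dist(c("big", "big", "big"), c("large", "small", "unknown"), M)
# 1 5 Inf
```

Such a function could then be supplied as dist.fnc = my.dist. A callback returning similarity scores would instead use -Inf for missing terms and set the attribute similarity=TRUE on its result.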
For each task item, distances between the target word and the possible choices are computed. Then all choices are ranked according to their distances; in the case of a tie, the higher rank is assigned to both words. A task item counts as a TP (true positive, i.e. a successful answer by the DSM) if the correct choice is ranked in first place. Note that if it is tied with another choice, both will be assigned rank 2, so the item does not count as a TP.
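The tie handling described above can be reproduced in base R with rank() and ties.method = "max". This is only a sketch with made-up distance values, and it is an assumption that the package ranks choices in exactly this way.

```r
# Distances from the target to each choice (made-up values)
d.clear <- c(correct = 0.2, distract1 = 0.5, distract2 = 0.9)
d.tied  <- c(correct = 0.2, distract1 = 0.2, distract2 = 0.9)

# ties.method = "max" assigns the higher rank to all tied choices
rank(d.clear, ties.method = "max")  # correct = 1, distract1 = 2, distract2 = 3
rank(d.tied,  ties.method = "max")  # correct = 2, distract1 = 2, distract2 = 3

# An item is a TP only if the correct choice has rank 1
is.TP <- function (d) unname(rank(d, ties.method = "max")["correct"] == 1)
is.TP(d.clear)  # TRUE
is.TP(d.tied)   # FALSE
```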
If either the target word is missing from the DSM or none of the choices is found in the DSM, the result for this item is set to NA, which counts as an FP (false positive) in the accuracy computation.
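The effect of NA items on the score can be sketched as follows. The outcome vector is invented, and the formula 100 * TP / (TP + FP) is an assumption derived from the description of accuracy as "percentage correct".

```r
# Per-item outcomes: TRUE = correct, FALSE = wrong, NA = item could not be scored
correct <- c(TRUE, FALSE, NA, TRUE)

TP <- sum(correct, na.rm = TRUE)   # 2 correct answers
FP <- length(correct) - TP         # 2: one wrong answer plus one NA item
accuracy <- 100 * TP / (TP + FP)   # 50 (percent)
```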
With the default dist.fnc callback, additional arguments method and p can be used to select a distance measure (see dist.matrix for details). It is pointless to specify rank="fwd", as the neighbour ranks produce exactly the same candidate ranking as the distance values.
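A possible call that selects a distance measure through the additional arguments is sketched below. The task and the matrix are toy data invented for this example, and the calls are only executed if the wordspace package is installed.

```r
# Toy task and toy DSM matrix (all values made up for illustration)
task <- data.frame(target = "big", correct = "large",
                   distract1 = "small", distract2 = "red",
                   stringsAsFactors = FALSE)
M <- rbind(big = c(1, 0, 2), large = c(1, 1, 2),
           small = c(4, 4, 0), red = c(0, 3, 1))

if (requireNamespace("wordspace", quietly = TRUE)) {
  # Short report based on cosine distance
  print(wordspace::eval.multiple.choice(task, M, method = "cosine"))
  # Detailed report based on Minkowski distance with p = 1
  print(wordspace::eval.multiple.choice(task, M, method = "minkowski", p = 1,
                                        details = TRUE))
}
```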
Landauer, Thomas K. and Dumais, Susan T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2), 211-240.
Suitable gold standard data sets in this package: TODO
Support functions: pair.distances, convert.lemma