Performs evaluation by comparing the distances (or similarities) computed by a DSM with (typically human) word similarity ratings.
Well-known examples are the noun pair ratings collected by Rubenstein & Goodenough (1965; RG65) and Finkelstein et al. (2002; WordSim353).
The quality of the DSM predictions is measured by Spearman rank correlation \(\rho\).
eval.similarity.correlation(task, M, dist.fnc=pair.distances,
details=FALSE, format=NA, taskname=NA,
word1.name="word1", word2.name="word2", score.name="score",
...)
The default short report (details=FALSE) is a data frame with a single row and the following columns:
rho: (absolute value of) Spearman rank correlation coefficient \(\rho\)
p.value: p-value indicating evidence for a significant correlation
missing: number of pairs not included in the DSM
r: (absolute value of) Pearson correlation coefficient \(r\)
r.lower: lower bound of confidence interval for Pearson correlation
r.upper: upper bound of confidence interval for Pearson correlation
The detailed report (details=TRUE) is a copy of the original task data with two additional columns:
distance: distance calculated by the DSM for each word pair, possibly transformed (numeric)
missing: whether word pair is missing from the DSM (logical)
In addition, the short report is appended to the data frame as an attribute "eval.result", and the optional taskname value as attribute "taskname". The data frame is marked as an object of class eval.similarity.correlation, for which suitable print and plot methods are defined.
task: a data frame containing word pairs (usually in columns word1 and word2) with similarity ratings (usually in column score); any other columns will be ignored
M: a scored DSM matrix, passed to dist.fnc
dist.fnc: a callback function used to compute distances or similarities between word pairs. It will be invoked with character vectors containing the components of the word pairs as first and second argument, the DSM matrix M as third argument, plus any additional arguments (...) passed to eval.similarity.correlation. The return value must be a numeric vector with one entry per word pair. If one of the words in a pair is not represented in the DSM, the corresponding distance value should be set to Inf (or -Inf in the case of similarities).
details: if TRUE, a detailed report with information on each task item is returned (see Value below for details)
format: if the task definition specifies POS-disambiguated lemmas in CWB/Penn format, they can automatically be converted to another notation convention; see convert.lemma for details
taskname: optional row label for the short report (details=FALSE)
...: any further arguments are passed to dist.fnc and can be used e.g. to select a distance measure
word1.name: the name of the column of task containing the first word of each pair
word2.name: the name of the column of task containing the second word of each pair
score.name: the name of the column of task containing the corresponding similarity ratings
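The contract for a dist.fnc callback described above can be illustrated with a minimal sketch in base R. The function name euclid.pairs and the toy matrix M.toy are hypothetical and not part of the package; the sketch only shows the expected argument order (first words, second words, matrix) and the use of Inf for pairs with a word that is not in the matrix.

```r
# Hypothetical callback (not part of the package): Euclidean distance between
# the row vectors of the two words in each pair.
euclid.pairs <- function(w1, w2, M, ...) {
  stopifnot(length(w1) == length(w2))
  known <- w1 %in% rownames(M) & w2 %in% rownames(M)
  res <- rep(Inf, length(w1))   # Inf marks pairs with a word missing from M
  if (any(known)) {
    delta <- M[w1[known], , drop=FALSE] - M[w2[known], , drop=FALSE]
    res[known] <- sqrt(rowSums(delta^2))
  }
  res                           # numeric vector, one entry per word pair
}

# Toy matrix with made-up values, standing in for a scored DSM
M.toy <- matrix(c(1, 0,
                  0, 1,
                  1, 1), ncol=2, byrow=TRUE,
                dimnames=list(c("cat", "dog", "car"), c("f1", "f2")))

d.toy <- euclid.pairs(c("cat", "cat", "cat"), c("dog", "car", "unicorn"), M.toy)
print(d.toy)
```

A function with this shape could presumably be supplied as the dist.fnc argument in place of the default pair.distances; the third pair above receives Inf because "unicorn" has no row in the toy matrix.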
Stephanie Evert (https://purl.org/stephanie.evert)
DSM distances are computed for all word pairs and compared with similarity ratings from the gold standard. As an evaluation criterion, Spearman rank correlation between the DSM and gold standard scores is computed. The function also reports a confidence interval for Pearson correlation; for this interval to be meaningful, the distances may need a suitable transformation that ensures a near-linear relationship with the ratings.
NB: Since the correlation between similarity ratings and DSM distances will usually be negative, the evaluation report omits minus signs on the correlation coefficients.
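The sign convention can be seen in a small base R sketch. The ratings and distances below are made-up values, and the computation uses stats::cor.test directly rather than the package internals, so it only illustrates the idea: higher similarity ratings go with smaller distances, the raw coefficient is negative, and the reported figure is its absolute value.

```r
# Made-up gold-standard ratings and DSM distances for 8 word pairs
score    <- c(3.9, 3.5, 3.1, 2.4, 1.8, 1.2, 0.6, 0.1)
distance <- c(0.21, 0.35, 0.30, 0.52, 0.61, 0.58, 0.83, 0.90)

spearman <- cor.test(score, distance, method="spearman")
pearson  <- cor.test(score, distance, method="pearson", conf.level=0.95)

rho.raw <- unname(spearman$estimate)  # negative: similar pairs have small distances
rho.abs <- abs(rho.raw)               # value without the minus sign
r.ci    <- pearson$conf.int           # confidence interval for Pearson r (raw sign)

print(c(rho.raw=rho.raw, rho.abs=rho.abs))
print(r.ci)
```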
With the default dist.fnc, the distance values can optionally be transformed through an arbitrary function specified in the transform argument (see pair.distances for details). Examples include transform=log (esp. for neighbour rank as a distance measure) and transform=function (x) 1/(1+x) (in order to transform distances into similarities). Note that Spearman rank correlation is not affected by any monotonic transformation, so the main evaluation results will remain unchanged.
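The invariance under monotonic transformations can be checked with a few lines of base R. The values are made up (the distances imitate neighbour ranks), and the transformations are applied by hand rather than through the transform argument. An increasing function such as log leaves Spearman's coefficient unchanged; a decreasing function such as 1/(1+x) only flips its sign, which disappears once absolute values are reported. Pearson correlation, by contrast, generally does change.

```r
# Made-up ratings and neighbour-rank style distances for 8 word pairs
score <- c(3.9, 3.5, 3.1, 2.4, 1.8, 1.2, 0.6, 0.1)
nrank <- c(2, 5, 3, 40, 120, 75, 900, 2500)

rho.raw <- cor(score, nrank, method="spearman")
rho.log <- cor(score, log(nrank), method="spearman")      # increasing transform
rho.sim <- cor(score, 1/(1 + nrank), method="spearman")   # decreasing transform

r.raw <- cor(score, nrank)        # Pearson on raw ranks
r.log <- cor(score, log(nrank))   # Pearson after log transform

print(c(rho.raw=rho.raw, rho.log=rho.log, rho.sim=rho.sim))
print(c(r.raw=r.raw, r.log=r.log))
```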
If one or both words of a pair are not found in the DSM, the distance is set to a fixed value 10% above the maximum of all other DSM distances, or 10% below the minimum in the case of similarity values. This is done in order to avoid numerical and visualization problems with Inf values; the particular value used does not affect the rank correlation coefficient.
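A short base R sketch shows why the choice of surrogate value is harmless for the rank correlation. The data are made up, and reading "10% above the maximum" as 1.1 times the largest finite distance is an assumption; the package may compute the offset differently. Any finite value above the maximum yields the same ranks and hence the same Spearman coefficient.

```r
# Made-up ratings and distances; Inf marks pairs with a word missing from the DSM
score    <- c(3.9, 3.1, 2.4, 1.2, 0.6, 0.3)
distance <- c(0.2, 0.4, Inf, 0.7, 0.9, Inf)

missing <- is.infinite(distance)
max.d   <- max(distance[!missing])

# Assumption: "10% above the maximum" taken as 1.1 * max
filled.a <- replace(distance, missing, 1.1 * max.d)
# A much larger surrogate, for comparison
filled.b <- replace(distance, missing, 100 * max.d)

rho.a <- cor(score, filled.a, method="spearman")
rho.b <- cor(score, filled.b, method="spearman")

print(c(rho.a=rho.a, rho.b=rho.b))
```

The two missing pairs are tied at the top of the distance ranking in both cases, so the coefficients agree.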
With the default dist.fnc callback, additional arguments method and p can be used to select a distance measure (see dist.matrix for details); rank=TRUE can be specified in order to use neighbour rank as a measure of semantic distance.
Finkelstein, Lev, Gabrilovich, Evgeniy, Matias, Yossi, Rivlin, Ehud, Solan, Zach, Wolfman, Gadi, and Ruppin, Eytan (2002). Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1), 116--131.
Rubenstein, Herbert and Goodenough, John B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627--633.
Suitable gold standard data sets in this package: RG65, WordSim353
Support functions: pair.distances, convert.lemma
Plotting and printing evaluation results: plot.eval.similarity.correlation, print.eval.similarity.correlation
eval.similarity.correlation(RG65, DSM_Vectors)
if (FALSE) {
plot(eval.similarity.correlation(RG65, DSM_Vectors, details=TRUE))
}