WordSim353: Similarity Ratings for 351 Noun Pairs (wordspace)

Description

A database of human similarity ratings for 351 English noun pairs, collected by Finkelstein et al. (2002). Agirre et al. (2009) annotated the pairs with semantic relations and split them into similarity and relatedness subsets.

Usage

WordSim353


Format

A data frame with 351 rows and the following 6 columns:

word1: first noun (character)
word2: second noun (character)
score: average similarity rating by human judges on a scale from 0 to 10 (numeric)
relation: semantic relation between the first and second word (factor, see Details below)
similarity: whether the word pair belongs to the similarity subset (logical)
relatedness: whether the word pair belongs to the relatedness subset (logical)

The nouns are given as disambiguated lemmas in the form <headword>_N.
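If plain word forms are needed, for example to match the pairs against a vocabulary without part-of-speech tags, the "_N" suffix can be stripped with a regular expression. The following is a minimal sketch on a small hypothetical vector of lemmas; the variable names are illustrative only.

```r
# Hypothetical sample of lemmas in the <headword>_N form used by WordSim353
lemmas <- c("tiger_N", "cat_N", "money_N", "cash_N")

# Drop the trailing "_N" part-of-speech tag to recover the plain headwords
headwords <- sub("_N$", "", lemmas)
print(headwords)
```

With the wordspace package loaded, the same sub() call could be applied to WordSim353$word1 and WordSim353$word2.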

Details

The data set is known as WordSim353 because it originally consisted of 353 noun pairs. The present version omits two of them: a duplicate entry (money--cash) and the trivial pair tiger--tiger, which may have been included as a control item.

The following semantic relations are distinguished in the relation variable: synonym, antonym, hypernym, hyponym, co-hyponym, holonym, meronym and other (topically related or completely unrelated).

Note that the similarity and relatedness subsets are not disjoint, because they share 103 unrelated noun pairs (semantic relation other and score below 5.0).
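The overlap between the two subsets can be inspected directly by selecting rows where both logical columns are TRUE. This is a sketch that assumes the wordspace package is installed; the counts in the comments restate the figures given above rather than independently verified results.

```r
# Assumes the wordspace package is installed; it ships the WordSim353 data set
data(WordSim353, package = "wordspace")

# Pairs that belong to both the similarity and the relatedness subset
both <- subset(WordSim353, similarity & relatedness)

# According to the Details above, this should be 103
print(nrow(both))

# Per the Details, these pairs should have relation "other" and a score below 5.0
print(table(both$relation))
print(range(both$score))
```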

References

Agirre, Eneko, Alfonseca, Enrique, Hall, Keith, Kravalova, Jana, Pasca, Marius, and Soroa, Aitor (2009). A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2009), pages 19--27, Boulder, Colorado.

Finkelstein, Lev, Gabrilovich, Evgeniy, Matias, Yossi, Rivlin, Ehud, Solan, Zach, Wolfman, Gadi, and Ruppin, Eytan (2002). Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1), 116--131.

Examples


library(wordspace) # provides the WordSim353 data set

head(WordSim353, 20)

table(WordSim353$relation) # semantic relations

# cross-tabulate membership in the "similarity" and "relatedness" subsets
xtabs(~ similarity + relatedness, data=WordSim353)
