ndl (version 0.2.18)

serbianLex: Serbian lexicon with 1187 prime-target pairs.

Description

The 1187 prime-target pairs and their lexical properties used in the simulation study of Experiment 1 of Baayen et al. (2011).

Usage

data(serbianLex)
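
As a quick orientation, the data set can be loaded and inspected as sketched below. The sketch assumes the ndl package is installed; the guard with `requireNamespace` is an addition for illustration and is not part of the package documentation.

```r
# Load and inspect the data set only if the ndl package is available.
if (requireNamespace("ndl", quietly = TRUE)) {
  data(serbianLex, package = "ndl")

  # Overview of the columns and their types
  str(serbianLex)

  # Number of prime-target pairs per prime condition
  print(table(serbianLex$PrimeCondition))
}
```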

Format

A data frame with 1187 observations on the following 15 variables:

Target

A factor specifying the target noun form

Prime

A factor specifying the prime noun form

PrimeLemma

A factor specifying the lemma of the prime

TargetLemma

A factor specifying the target lemma

Length

A numeric vector with the length in letters of the target

WeightedRE

A numeric vector with the weighted relative entropy of the prime and target inflectional paradigms

NormLevenshteinDist

A numeric vector with the normalized Levenshtein distance of prime and target forms

TargetLemmaFreq

A numeric vector with log frequency of the target lemma

PrimeSurfFreq

A numeric vector with log frequency of the prime form

PrimeCondition

A factor with prime conditions, levels: DD, DSSD, SS

CosineSim

A numeric vector with the cosine similarity of prime and target vector space semantics

IsMasc

A vector of logicals, TRUE if the noun is masculine.

TargetGender

A factor with the gender of the target, levels: f, m, and n

TargetCase

A factor specifying the case of the target noun, levels: acc, dat, nom

MeanLogObsRT

A numeric vector with the mean log-transformed observed reaction time
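
To make two of the form-based and semantic predictors above more concrete, here is a minimal base-R sketch of a normalized Levenshtein distance and a cosine similarity. The source does not say how `NormLevenshteinDist` was normalized or which vectors underlie `CosineSim`. Dividing by the length of the longer form is an assumption, and the helper names `norm_lev` and `cosine_sim` and the example strings are hypothetical.

```r
# Illustrative only: the source does not state how these predictors were computed.

# Levenshtein distance divided by the length of the longer form
# (assumed normalization, yields a value between 0 and 1).
norm_lev <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]
  d / max(nchar(a), nchar(b))
}

# Cosine similarity between two numeric vectors of equal length.
cosine_sim <- function(x, y) {
  sum(x * y) / sqrt(sum(x^2) * sum(y^2))
}

# Two forms that differ in a single letter
norm_lev("knjiga", "knjigu")

# Two vectors pointing in the same direction
cosine_sim(c(1, 2, 3), c(2, 4, 6))
```

Under this normalization, identical forms score 0 and forms with no letters in common score 1, which makes the measure comparable across words of different length.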

References

Baayen, R. H., Milin, P., Filipovic Durdevic, D., Hendrix, P. and Marelli, M. (2011), An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118, 438-482.

Examples

# NOT RUN {
# calculate the weight matrix for the full set of Serbian nouns
data(serbian)
serbian$Cues <- orthoCoding(serbian$WordForm, grams=2)
serbian$Outcomes <- serbian$LemmaCase
sw <- estimateWeights(cuesOutcomes=serbian)

# calculate the meaning activations for all unique word forms

desiredItems <- unique(serbian["Cues"])
desiredItems$Outcomes <- ""
activations <- estimateActivations(desiredItems, sw)$activationMatrix
rownames(activations) <- unique(serbian[["WordForm"]])
activations <- activations + abs(min(activations))
activations[1:5,1:6]

# calculate simulated latencies for the experimental materials

data(serbianLex)
syntax <- c("acc", "dat", "gen", "ins", "loc", "nom", "Pl", "Sg")
we <- 0.4 # compound cue weight
strengths <- rep(0, nrow(serbianLex))
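for(i in seq_len(nrow(serbianLex))) {
   # Target and Prime are factors: convert to character so that rows of
   # 'activations' are selected by name rather than by integer level code
   target <- as.character(serbianLex$Target[i])
   prime <- as.character(serbianLex$Prime[i])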
   targetLemma <- as.character(serbianLex$TargetLemma[i])
   primeLemma <- as.character(serbianLex$PrimeLemma[i])
   targetOutcomes <- c(targetLemma, primeLemma, syntax)
   primeOutcomes <- c(targetLemma, primeLemma, syntax)
   p <- activations[target, targetOutcomes]
   q <- activations[prime, primeOutcomes]
   strengths[i] <- sum((q^we)*(p^(1-we)))
}
serbianLex$SimRT <- -strengths
lengthPenalty <- 0.3
serbianLex$SimRT2 <- serbianLex$SimRT + 
  (lengthPenalty * (serbianLex$Length>5))

cor.test(serbianLex$SimRT, serbianLex$MeanLogObsRT)
cor.test(serbianLex$SimRT2, serbianLex$MeanLogObsRT)

serbianLex.lm <- lm(SimRT2 ~ Length +  WeightedRE*IsMasc + 
      NormLevenshteinDist + TargetLemmaFreq + 
      PrimeSurfFreq + PrimeCondition, data=serbianLex)
summary(serbianLex.lm)
# }
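
The strength computation inside the loop above can be tried in isolation on toy data. The sketch below uses an invented activation matrix with placeholder names, so it runs without the ndl package. The helper `compound_strength` is hypothetical and simply wraps the expression `sum((q^we)*(p^(1-we)))` from the example.

```r
# Toy activation matrix: rows are word forms, columns are outcomes.
# All names and values are invented for illustration.
activations <- matrix(
  c(0.9, 0.1, 0.5,
    0.2, 0.8, 0.4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("targetForm", "primeForm"),
                  c("lemmaA", "lemmaB", "nom"))
)

# Combine prime (q) and target (p) activations; 'we' weights the prime.
compound_strength <- function(p, q, we) {
  sum((q^we) * (p^(1 - we)))
}

outcomes <- c("lemmaA", "lemmaB", "nom")
p <- activations["targetForm", outcomes]
q <- activations["primeForm", outcomes]

strength <- compound_strength(p, q, we = 0.4)

# As in the example, the simulated latency is the negated strength,
# so stronger activation corresponds to a shorter simulated latency.
simRT <- -strength
simRT
```

With `we = 0` the prime is ignored and the strength reduces to `sum(p)`, while `we = 1` gives `sum(q)`. Intermediate values interpolate between the two on a multiplicative scale.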