quanteda (version 0.9.7-17)

applyDictionary: apply a dictionary or thesaurus to an object

Description

Convert features into equivalence classes defined by values of a dictionary object.

Usage

applyDictionary(x, dictionary, ...)
applyDictionary(x, dictionary, exclusive = TRUE, valuetype = c("glob", "regex", "fixed"), case_insensitive = TRUE, capkeys = !exclusive, verbose = TRUE, ...)

Arguments

x
object to which the dictionary or thesaurus will be applied
dictionary
the dictionary-class object that will be applied to x
...
not used
exclusive
if TRUE, remove all features not in dictionary; otherwise, replace features that match dictionary values with the corresponding keys, leaving other features unaffected
valuetype
how to interpret dictionary values: "glob" for "glob"-style wildcard expressions (the format used in Wordstat and LIWC formatted dictionary values); "regex" for regular expressions; or "fixed" for exact matching (entire words, for instance)
case_insensitive
ignore the case of dictionary values if TRUE
capkeys
if TRUE, convert dictionary keys to uppercase to distinguish them from other features
verbose
print status messages if TRUE
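
The three valuetype settings can be illustrated with base R alone. The sketch below is not quanteda's internal code; the feature vector and the helper names glob_hits, regex_hits, and fixed_hits are assumptions made for illustration. It assumes that a glob pattern must match the whole feature, while a regex may match anywhere inside it.

```r
# Base R sketch of the three matching modes (illustrative only, not quanteda internals)
features <- c("christmas", "tax", "taxation", "united_states", "sweden")

# "glob": translate the wildcard pattern into an anchored regex, then match
glob_hits <- features[grepl(utils::glob2rx("tax*"), features, ignore.case = TRUE)]

# "regex": the same string read as a regular expression means "ta" followed by
# zero or more "x", matched anywhere in the feature
regex_hits <- features[grepl("tax*", features, ignore.case = TRUE)]

# "fixed": whole-feature equality, here with case folded on both sides
fixed_hits <- features[tolower(features) == tolower("tax")]

print(glob_hits)
print(regex_hits)
print(fixed_hits)
```

Under these assumptions the regex reading also picks up "united_states", because "states" contains "ta", whereas the glob reading matches only features that begin with "tax".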

Value

an object of the same type as x, in which features matching dictionary values are replaced by the corresponding dictionary keys
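
The effect of exclusive and capkeys on the returned counts can be sketched with a plain named vector. This is a rough base R analogue, not the package's implementation; counts, dict, key_of, exclusive_counts, and thesaurus_counts are hypothetical names, and only exact (fixed) matching is modelled.

```r
# Hypothetical feature counts for one document (not produced by quanteda)
counts <- c(christmas = 1, tax = 1, taxation = 2, plan = 1)
# Hypothetical dictionary as a plain named list of fixed, lowercase values
dict <- list(christmas = c("christmas", "santa"), taxes = c("tax", "taxation"))

# Map each feature to its dictionary key, or NA when no value matches
key_of <- vapply(names(counts), function(f) {
  hit <- names(dict)[vapply(dict, function(v) f %in% v, logical(1))]
  if (length(hit)) hit[1] else NA_character_
}, character(1))

# exclusive = TRUE analogue: drop unmatched features, sum the rest by key
matched <- !is.na(key_of)
exclusive_counts <- tapply(counts[matched], key_of[matched], sum)

# exclusive = FALSE analogue: matched features take the uppercase key (capkeys),
# unmatched features keep their own names
labels <- ifelse(matched, toupper(key_of), names(counts))
thesaurus_counts <- tapply(counts, labels, sum)

print(exclusive_counts)
print(thesaurus_counts)
```

In the exclusive case "plan" disappears and the two tax features collapse into one "taxes" count; in the non-exclusive case "plan" survives alongside the uppercase keys.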

Examples

myDict <- dictionary(list(christmas = c("Christmas", "Santa", "holiday"),
                          opposition = c("Opposition", "reject", "notincorpus"),
                          taxglob = "tax*",
                          taxregex = "tax.+$",
                          country = c("United_States", "Sweden")))
myDfm <- dfm(c("My Christmas was ruined by your opposition tax plan.", 
               "Does the United_States or Sweden have more progressive taxation?"),
             ignoredFeatures = stopwords("english"), verbose = FALSE)
myDfm

# glob format
applyDictionary(myDfm, myDict, valuetype = "glob")
applyDictionary(myDfm, myDict, valuetype = "glob", case_insensitive = FALSE)

# regex vs. glob format: note that "united_states" is a regex match for "tax*",
# because as a regex "tax*" means "ta" followed by zero or more "x"
applyDictionary(myDfm, myDict, valuetype = "glob")
applyDictionary(myDfm, myDict, valuetype = "regex", case_insensitive = TRUE)

# fixed format: no pattern matching
applyDictionary(myDfm, myDict, valuetype = "fixed")
applyDictionary(myDfm, myDict, valuetype = "fixed", case_insensitive = FALSE)
