
quanteda (version 0.9.7-17)

dictionary: create a dictionary

Description

Create a quanteda dictionary, either from a named list or by importing a file in a foreign format. Currently supported input file formats are the Wordstat and LIWC formats. LIWC import works with all dictionary files currently supplied with the LIWC 2001, 2007, and 2015 software (see References).
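To make the LIWC layout more concrete, here is a minimal base-R sketch that reads a LIWC-style .dic file into a named list. It assumes the common layout of such files: a header block delimited by lines containing only "%" that maps category numbers to category names, followed by one tab-separated line per word giving the word and its category numbers. The helper parse_liwc_sketch() is hypothetical and simplified (it does not cover every LIWC variant); it is not quanteda's importer.

```r
# Hypothetical, simplified LIWC .dic reader (not quanteda's importer).
# Assumed layout: a "%" line, "id<TAB>category" lines, a "%" line,
# then "word<TAB>id<TAB>id..." lines.
parse_liwc_sketch <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]            # drop blank lines
  delims <- which(lines == "%")
  if (length(delims) < 2) stop("expected a header block delimited by '%' lines")
  pos <- seq_along(lines)

  # header: map category ids to category names
  header <- strsplit(lines[pos > delims[1] & pos < delims[2]], "\t")
  cat_ids <- vapply(header, `[`, character(1), 1)
  cat_names <- vapply(header, `[`, character(1), 2)

  # start with an empty character vector for every category
  dict <- setNames(rep(list(character(0)), length(cat_ids)), cat_names)

  # body: append each word to every category whose id it lists
  for (fields in strsplit(lines[pos > delims[2]], "\t")) {
    word <- fields[1]
    for (id in fields[-1]) {
      key <- cat_names[match(id, cat_ids)]
      if (!is.na(key)) dict[[key]] <- c(dict[[key]], word)
    }
  }
  dict
}

liwc_lines <- c("%", "1\tposemo", "2\tnegemo", "%",
                "happy\t1", "sad\t2", "bittersweet\t1\t2")
parse_liwc_sketch(liwc_lines)
```

With a real file the input would come from readLines(), for example parse_liwc_sketch(readLines("mydict.dic")), where "mydict.dic" is a placeholder name.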

Usage

dictionary(x = NULL, file = NULL, format = NULL, concatenator = " ", toLower = TRUE, encoding = "")

Arguments

x
a named list of character vectors, one element per dictionary key, each holding that key's entries; entries may include pattern-matching expressions such as "tax*" (see Examples)
file
path or URL of a foreign dictionary file to import
format
character identifier for the format of the foreign dictionary. Available options are:
"wordstat"
format used by Provalis Research's Wordstat software
"LIWC"
format used by the Linguistic Inquiry and Word Count software
concatenator
the character placed between the words of multi-word dictionary values. This defaults to "_", except for LIWC-formatted files, for which it defaults to a single space " ".
toLower
if TRUE, convert all dictionary values to lowercase
encoding
optional encoding of the imported dictionary file, given as an iconv encoding label. See the "Encoding" section of the help for file.
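As an illustration of what the toLower and concatenator arguments describe, the following base-R sketch lowercases each entry and rejoins the words of multi-word values with a chosen concatenator. normalize_entries_sketch() is a hypothetical helper working on a plain named list; it is not quanteda's internal code, and quanteda may normalize entries differently.

```r
# Hypothetical helper (not quanteda code) illustrating the toLower and
# concatenator arguments on a plain named list of entries.
normalize_entries_sketch <- function(dict, concatenator, toLower = TRUE) {
  lapply(dict, function(values) {
    if (toLower) values <- tolower(values)
    # split each value on whitespace, then rejoin the words with the concatenator
    words <- strsplit(trimws(values), "\\s+")
    vapply(words, paste, character(1), collapse = concatenator)
  })
}

entries <- list(country = "United States", taxing = "Taxing")
normalize_entries_sketch(entries, concatenator = "_")
# $country is "united_states"; $taxing is "taxing"
```

Passing concatenator = " " instead keeps multi-word values space-separated, which corresponds to the LIWC default described above.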

Value

A dictionary class object: essentially a named list of character vectors carrying a special class.
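The shape of such an object can be sketched in base R as a validated named list of character vectors with an extra class attribute and a small print method. The constructor new_toy_dictionary() and the class name "toy_dictionary" are hypothetical and only mirror the description above; quanteda defines its own dictionary class and methods.

```r
# Hypothetical constructor mirroring the description above: a named list of
# character vectors with an extra class attribute (not quanteda's class).
new_toy_dictionary <- function(x) {
  stopifnot(is.list(x),
            !is.null(names(x)), all(nzchar(names(x))),   # every key is named
            all(vapply(x, is.character, logical(1))))    # every value is character
  structure(x, class = c("toy_dictionary", "list"))
}

# print one line per key, listing its entries
print.toy_dictionary <- function(x, ...) {
  for (key in names(x)) {
    cat(key, ": ", paste(x[[key]], collapse = ", "), "\n", sep = "")
  }
  invisible(x)
}

d <- new_toy_dictionary(list(christmas = c("christmas", "santa", "holiday"),
                             taxing = "taxing"))
d
```

Because the object is still a list underneath, ordinary list operations such as names(d) and d$taxing keep working.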

References

Wordstat dictionaries page, from Provalis Research: http://provalisresearch.com/products/content-analysis-software/wordstat-dictionary/.

Pennebaker, J.W., Chung, C.K., Ireland, M., Gonzales, A., & Booth, R.J. (2007). The development and psychometric properties of LIWC2007. [Software manual]. Austin, TX (www.liwc.net).

See Also

dfm

Examples

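library(quanteda)
# inaugural addresses delivered after 1900
mycorpus <- subset(inaugCorpus, Year > 1900)
# each named element is a dictionary key; its values are the entries,
# which may include patterns such as "tax*" or multi-word values
mydict <- dictionary(list(christmas = c("Christmas", "Santa", "holiday"),
                          opposition = c("Opposition", "reject", "notincorpus"),
                          taxing = "taxing",
                          taxation = "taxation",
                          taxregex = "tax*",
                          country = "united states"))
# count dictionary matches in each document and show the first rows
head(dfm(mycorpus, dictionary = mydict))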
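The call to dfm() above applies the dictionary to each document. As a rough illustration of that idea only, the following base-R sketch counts, per text, how many tokens match each key, treating entries as glob patterns via utils::glob2rx(). count_dictionary_sketch() is a hypothetical helper: quanteda's own tokenization and matching rules may differ, and dfm() returns a document-feature matrix rather than the plain matrix built here.

```r
# Hypothetical sketch (not quanteda's dfm): count, for each text, the tokens
# that match any entry of each dictionary key. Entries are treated as glob
# patterns, so "tax*" matches "taxes", "taxation", "taxing", ...
# Multi-word entries such as "united states" are not supported here, because
# the texts are split into single-word tokens.
count_dictionary_sketch <- function(texts, dict) {
  # lowercase and split on any run of characters that are not letters or digits
  tokens <- lapply(strsplit(tolower(texts), "[^[:alnum:]]+"),
                   function(tok) tok[nzchar(tok)])
  # one anchored regular expression per key, e.g. "(^christmas$)|(^santa$)|(^holiday$)"
  key_regex <- vapply(dict, function(entries) {
    paste0("(", utils::glob2rx(tolower(entries)), ")", collapse = "|")
  }, character(1))
  counts <- matrix(0L, nrow = length(texts), ncol = length(dict),
                   dimnames = list(names(texts), names(dict)))
  for (i in seq_along(tokens)) {
    for (j in seq_along(key_regex)) {
      counts[i, j] <- sum(grepl(key_regex[[j]], tokens[[i]]))
    }
  }
  counts
}

texts <- c(d1 = "Taxes and taxation: we reject new taxing.",
           d2 = "Santa brings Christmas cheer.")
count_dictionary_sketch(texts, list(christmas = c("Christmas", "Santa", "holiday"),
                                    taxing = "taxing",
                                    taxregex = "tax*"))
```

Because the converted patterns are anchored, the entry "taxing" matches only the whole token taxing, while "tax*" matches any token that starts with tax.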

## Not run: 
# # import the Laver-Garry dictionary from http://bit.ly/1FH2nvf
# lgdict <- dictionary(file = "http://www.kenbenoit.net/courses/essex2014qta/LaverGarry.cat",
#                      format = "wordstat")
# head(dfm(inaugTexts, dictionary = lgdict))
# 
# # import a LIWC-formatted dictionary from http://www.moralfoundations.org
# mfdict <- dictionary(file = "http://ow.ly/VMRkL", format = "LIWC")
# head(dfm(inaugTexts, dictionary = mfdict))
## End(Not run)
