dictionary: create a dictionary

Description

Create a quanteda dictionary, either from a list or by importing from a foreign format. The supported input file formats are Wordstat and LIWC. LIWC import works with all dictionary files supplied with the LIWC 2001, 2007, and 2015 software (see References).

Usage

dictionary(x = NULL, file = NULL, format = NULL, concatenator = " ", toLower = TRUE, encoding = "")

Arguments

x

a named list of character vectors, one per dictionary key; entries may include regular expressions (see examples)

file

file identifier for a foreign dictionary

format

character identifier for the format of the foreign dictionary. Available options are:

"wordstat": format used by Provalis Research's Wordstat software
"LIWC": format used by the Linguistic Inquiry and Word Count software

concatenator

the character between the words of multi-word dictionary values. This defaults to "_", except for LIWC-formatted files, where it defaults to a single space " ".

toLower

if TRUE, convert all dictionary values to lowercase

encoding

additional optional encoding value for reading in imported dictionaries. This uses the iconv labels for encoding. See the "Encoding" section of the help for file.
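The encoding labels mentioned above are the same ones base R's iconv() accepts. The following is a minimal base R sketch of what such a label means, not part of quanteda; the byte string and the "latin1" label are illustrative assumptions. On most platforms, iconvlist() lists the labels available.

```r
# Illustrative only: a Latin-1 encoded string, where \xe9 is "e" with an
# acute accent in Latin-1. This is a made-up value, not from a real file.
raw_value <- "caf\xe9"

# Convert from the assumed source encoding to UTF-8 using iconv labels.
# The same kind of label ("latin1", "UTF-8", ...) is what `encoding` expects.
converted <- iconv(raw_value, from = "latin1", to = "UTF-8")

print(converted)
print(nchar(converted))
```

If a dictionary file shows garbled accented characters after import, trying a different label for encoding is a reasonable first step.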

Value

A dictionary class object, essentially a specially classed named list of character vectors.
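Because the returned object is described as a named list, it can presumably be inspected with the usual list accessors. The sketch below assumes the quanteda package is installed and that names() and [[ behave on the object as they do on a plain named list; the key names are taken from the Examples section.

```r
library(quanteda)

# Build a small dictionary from a named list of character vectors
mydict <- dictionary(list(christmas = c("Christmas", "Santa", "holiday"),
                          taxregex  = "tax*"))

# Inspect it like a named list (assumed behaviour, see lead-in)
names(mydict)            # the dictionary keys
mydict[["christmas"]]    # the entries stored under one key
class(mydict)            # the class of the returned object
```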

References

Wordstat dictionaries page, from Provalis Research http://provalisresearch.com/products/content-analysis-software/wordstat-dictionary/.

Pennebaker, J.W., Chung, C.K., Ireland, M., Gonzales, A., & Booth, R.J. (2007). The development and psychometric properties of LIWC2007. [Software manual]. Austin, TX (www.liwc.net).

Examples

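# keep only the inaugural addresses delivered after 1900
mycorpus <- subset(inaugCorpus, Year > 1900)

# define a dictionary from a named list: each name is a key,
# each character vector holds the values matched for that key
mydict <- dictionary(list(christmas = c("Christmas", "Santa", "holiday"),
                          opposition = c("Opposition", "reject", "notincorpus"),
                          taxing = "taxing",
                          taxation = "taxation",
                          taxregex = "tax*",
                          country = "united states"))

# apply the dictionary when building the document-feature matrix
head(dfm(mycorpus, dictionary = mydict))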

## Not run: 
# import the Laver-Garry dictionary from http://bit.ly/1FH2nvf
lgdict <- dictionary(file = "http://www.kenbenoit.net/courses/essex2014qta/LaverGarry.cat",
                     format = "wordstat")
head(dfm(inaugTexts, dictionary = lgdict))

# import a LIWC formatted dictionary from http://www.moralfoundations.org
mfdict <- dictionary(file = "http://ow.ly/VMRkL", format = "LIWC")
head(dfm(inaugTexts, dictionary = mfdict))
## End(Not run)