sentometrics (version 0.2)

setup_lexicons: Set up lexicons (and valence word list) for use in sentiment analysis

Description

Structures the provided lexicons and optionally integrates valence words. One can also provide (part of) the built-in lexicons from data("lexicons"), or a valence word list from data("valence"), as an argument. Makes use of the as_key() function from the sentimentr package to render the output coherent and to check for duplicates.

Usage

setup_lexicons(lexiconsIn, valenceIn = NULL, do.split = FALSE)

Arguments

lexiconsIn

a list of (raw) lexicons, each element being a data.table or a data.frame with, respectively, a words column and a polarity score column. Name the lexicons appropriately, as these names carry over to the subsequently obtained sentiment measures. Alternatively, a subset of the already formatted built-in lexicons accessible via lexicons can be declared as part of the same list input. If you only want to use (some of) the package's built-in lexicons (with no valence shifters), you can supply lexicons[c(...)] directly as an argument to sento_measures or compute_sentiment. In any case, however, it is strongly recommended to pass all lexicons (and a valence word list) through this function first.
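As a sketch of the expected input structure (the lexicon name and column names below are illustrative; the description above only requires a words column followed by a score column):

```r
library("data.table")

# a hypothetical raw lexicon list with one self-made lexicon;
# the first column holds the words, the second the polarity scores
lexiconsIn <- list(
  myLexicon = data.table(w = c("good", "bad"), s = c(1, -1))
)
```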

valenceIn

a single valence word list as a data.table or a data.frame with, respectively, a words column, a type column (1 for negators, 2 for amplifiers/intensifiers, and 3 for deamplifiers/downtoners) and a score column. The suggested scores are -1, 2, and 0.5 respectively; the score should be identical within each type. This argument can also be one of the already formatted built-in valence word lists accessible via valence. If NULL, no valence word list is part of this function's output, nor will one be applied in the sentiment analysis.
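A minimal sketch of a conforming valence word list, using the suggested scores (the column names here are illustrative, since the description above only specifies the column order):

```r
library("data.table")

# first column: words; second: type (1 = negator, 2 = amplifier,
# 3 = deamplifier); third: the score applied within each type
valenceIn <- data.table(
  w = c("not", "very", "hardly"),
  t = c(1, 2, 3),
  y = c(-1, 2, 0.5)
)
```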

do.split

a logical that, if TRUE, splits every lexicon into a separate positive polarity and negative polarity lexicon.

Value

A list with each lexicon as a separate data.table element, named accordingly, and optionally an element named valence that comprises the valence words. In every element, the x column contains the words and the y column contains the polarity scores; for the valence word list, the t column contains the word type. If a valence word list is provided, every lexicon is expanded by copying it and modifying the words and scores according to the valence word type: "NOT_" is prefixed for negators, "VERY_" for amplifiers and "HARDLY_" for deamplifiers. The lexicon scores are multiplied by -1, 2 and 0.5 respectively by default, or by the first score value found per type in the valence word list.
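To illustrate the expansion with the default multipliers, a single lexicon entry ("nice", 2) yields three additional valence-shifted entries:

```r
# original entry: word = "nice", score = 2
# generated entries under the default valence scores:
#   "NOT_nice"    -> 2 * -1  = -2   (negator)
#   "VERY_nice"   -> 2 *  2  =  4   (amplifier)
#   "HARDLY_nice" -> 2 * 0.5 =  1   (deamplifier)
```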

See Also

as_key

Examples

# NOT RUN {
library("sentometrics")
library("data.table")

data("lexicons")
data("valence")

# sets up the output list straight from built-in word lists, including valence words
l1 <- setup_lexicons(lexiconsIn = lexicons[c("LM_eng", "HENRY_eng")],
                     valenceIn = valence[["valence_eng"]])

# including a self-made lexicon, with and without valence shifters
lexIn <- c(list(myLexicon = data.table(w = c("nice", "boring"), s = c(2, -1))),
           lexicons[c("GI_eng")])
valIn <- valence[["valence_eng"]]
l2 <- setup_lexicons(lexIn)
l3 <- setup_lexicons(lexIn, valIn)
l4 <- setup_lexicons(lexIn, valIn, do.split = TRUE)

# }