sento_lexicons: Set up lexicons (and valence word list) for use in sentiment analysis

Description

Structures the provided lexicon(s) and, optionally, a valence word list. One can, for example, combine (part of) the built-in lexicons from data("list_lexicons") with other lexicons, and add one of the built-in valence word lists from data("list_valence_shifters"). This function makes the output coherent by converting all words to lowercase and checking for duplicates. All entries consisting of more than one word are discarded, as required for bag-of-words sentiment analysis.
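
The clean-up steps described above can be sketched in base R. This is only an illustration, not the package's implementation: clean_lexicon is a hypothetical helper, and keeping the first occurrence of a duplicated word is an assumption, since the description only says that duplicates are checked.

```r
# Hypothetical helper (not part of sentometrics): mimics the described clean-up
# on a two-column lexicon (words first, polarity scores second).
clean_lexicon <- function(lex) {
  out <- data.frame(x = trimws(tolower(as.character(lex[[1]]))),
                    y = as.numeric(lex[[2]]),
                    stringsAsFactors = FALSE)
  # discard entries consisting of more than one word
  out <- out[!grepl("\\s", out$x), , drop = FALSE]
  # assumption: keep only the first occurrence of each word
  out <- out[!duplicated(out$x), , drop = FALSE]
  rownames(out) <- NULL
  out
}

raw <- data.frame(w = c("Nice", "nice", "not good", "boring"),
                  s = c(2, 1, -1, -1),
                  stringsAsFactors = FALSE)
cleaned <- clean_lexicon(raw)
print(cleaned)
```

In this sketch "Nice" and "nice" collapse into one lowercase entry and "not good" is dropped because it contains two words.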

Usage

sento_lexicons(lexiconsIn, valenceIn = NULL, do.split = FALSE)

Arguments

lexiconsIn

a named list of (raw) lexicons, each element a data.table or a data.frame whose first column is a character column (the words) and whose second column is a numeric column (the polarity scores). This argument can be (a subset of) the built-in lexicons accessible via list_lexicons.

valenceIn

a single valence word list as a data.table or a data.frame with an "x" column and either a "y" or a "t" column. The first column ("x") has the words, "y" has the values for bigram shifting, and "t" has the types of the valence shifter for a clustered approach to sentiment calculation (supported types: 1 = negators, 2 = amplifiers, 3 = deamplifiers). If three columns are provided, only the first two are considered. This argument can be one of the built-in valence word lists accessible via list_valence_shifters. A word that appears in both a lexicon and the valence word list is treated as a lexicon entry during sentiment calculation. If NULL, valence shifting is not applied in the sentiment analysis.
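
As a rough illustration of the two accepted layouts, the base R sketch below builds a small valence word list in each form and checks for words that also occur in a lexicon. The words, shifting values, and object names are made up for the example and are not taken from list_valence_shifters.

```r
# Illustrative valence word lists; words and values are invented for this sketch.
val_bigram <- data.frame(x = c("not", "very", "hardly"),
                         y = c(-1, 2, 0.5),      # values for bigram shifting
                         stringsAsFactors = FALSE)
val_cluster <- data.frame(x = c("not", "very", "hardly"),
                          t = c(1, 2, 3),         # 1 = negator, 2 = amplifier, 3 = deamplifier
                          stringsAsFactors = FALSE)

# Map the type codes to readable labels (labels follow the supported types above).
type_labels <- c("negator", "amplifier", "deamplifier")
labels <- type_labels[val_cluster$t]

# Words present in both a lexicon and the valence list; per the description,
# such words are handled as lexicon entries during sentiment calculation.
lex <- data.frame(x = c("nice", "boring", "hardly"),
                  y = c(2, -1, -0.5),
                  stringsAsFactors = FALSE)
overlap <- intersect(lex$x, val_cluster$x)
print(overlap)
```

Checking the overlap before calling the function makes it easier to spot valence shifters that will not act as shifters because they are also lexicon words.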

do.split

a logical; if TRUE, every lexicon is split into a separate positive polarity lexicon and a separate negative polarity lexicon.
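
The effect of splitting can be sketched in base R as follows. split_lexicon is a hypothetical helper, not the package's code; the _POS and _NEG suffixes follow the element names used in the examples (such as GI_en_POS), and dropping zero-score words is an assumption of this sketch.

```r
# Hypothetical helper (not part of sentometrics): splits one lexicon by the
# sign of its polarity scores and names the parts with _POS and _NEG suffixes.
split_lexicon <- function(lex, name) {
  pos <- lex[lex$y > 0, , drop = FALSE]
  neg <- lex[lex$y < 0, , drop = FALSE]  # assumption: zero scores go to neither part
  out <- list(pos, neg)
  names(out) <- paste0(name, c("_POS", "_NEG"))
  out
}

lex <- data.frame(x = c("nice", "boring", "great", "awful"),
                  y = c(2, -1, 3, -2),
                  stringsAsFactors = FALSE)
parts <- split_lexicon(lex, "myLexicon")
print(names(parts))
```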

Value

A list of class sentolexicons in which each lexicon is a separate data.table element under its own name, optionally with an additional element named valence that comprises the valence words. Every "x" column contains the words, every "y" column contains the polarity scores. The "t" column for valence shifters contains the different types.

Examples

# NOT RUN {
data("list_lexicons", package = "sentometrics")
data("list_valence_shifters", package = "sentometrics")

# lexicons straight from built-in word lists
l1 <- sento_lexicons(list_lexicons[c("LM_en", "HENRY_en")])

# including a self-made lexicon, with and without valence shifters
# (data.table:: prefix so the example runs without attaching the data.table package)
lexIn <- c(list(myLexicon = data.table::data.table(w = c("nice", "boring"), s = c(2, -1))),
           list_lexicons[c("GI_en")])
valIn <- list_valence_shifters[["en"]]
l2 <- sento_lexicons(lexIn)
l3 <- sento_lexicons(lexIn, valIn)
l4 <- sento_lexicons(lexIn, valIn[, c("x", "y")], do.split = TRUE)
l5 <- sento_lexicons(lexIn, valIn[, c("x", "t")], do.split = TRUE)
l6 <- l5[c("GI_en_POS", "valence")] # preserves sentolexicons class

# }
# NOT RUN {
# include lexicons from lexicon package
lexIn2 <- list(hul = lexicon::hash_sentiment_huliu, joc = lexicon::hash_sentiment_jockers)
l7 <- sento_lexicons(c(lexIn, lexIn2), valIn)
# }
# NOT RUN {
# faulty extractions and disallowed replacements (these calls are expected to fail)
l5["valence"]
l2[0]
l3[22]
l4[1] <- l2[1]
l4[[1]] <- l2[[1]]
l4$GI_en_NEG <- l2$myLexicon
# }
