sentometrics (version 0.2)

compute_sentiment: Compute document-level sentiment across features and lexicons

Description

Given a corpus of texts, this function computes sentiment per document using a bag-of-words approach, based on the lexicons provided and a chosen way of aggregating across the words of each document. It relies partly on the quanteda package. The scores computed are net sentiment scores (the sum of positive word scores minus the sum of negative word scores).
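The idea of a net, bag-of-words sentiment score can be illustrated with a minimal base R sketch. The toy lexicon and token vector below are purely hypothetical and only mimic the word matching; they are not the package's internal implementation.

# toy lexicon with unit positive and negative word scores (hypothetical)
toyLexicon <- data.frame(word  = c("good", "strong", "bad", "weak"),
                         score = c(1, 1, -1, -1))
docTokens <- c("the", "economy", "looks", "good", "but", "weak", "abroad")
matchedScores <- toyLexicon$score[match(docTokens, toyLexicon$word)]
sum(matchedScores, na.rm = TRUE)  # net sentiment: 1 + (-1) = 0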

Usage

compute_sentiment(sentocorpus, lexicons, how = get_hows()$words, dfm = NULL)

Arguments

sentocorpus

a sentocorpus object created with sento_corpus.

lexicons

output from a setup_lexicons call.

how

a single character value defining how aggregation across words within documents should be performed. See get_hows()$words for the currently available options.
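For instance, the available options can be inspected directly (the exact set depends on the installed package version):

library("sentometrics")
get_hows()$words  # within-document aggregation options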

dfm

optional; the output of a quanteda dfm call, such that users can specify their own tokenization scheme (via tokens) as well as other parameters related to the construction of the document-feature matrix (dfm). By default, a dfm is created based on a tokenization that removes punctuation, numbers, symbols and separators. We suggest sticking to unigrams, as the remainder of the sentiment computation and the built-in lexicons assume unigrams.
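As a sketch, a dfm mirroring these defaults could be built explicitly with quanteda and passed on; myCorpus and myLexicons are placeholders for a sentocorpus object and a setup_lexicons output, as in the Examples below:

myTokens <- quanteda::tokens(myCorpus, remove_punct = TRUE, remove_numbers = TRUE,
                             remove_symbols = TRUE, remove_separators = TRUE)
myDfm <- quanteda::dfm(myTokens, verbose = FALSE)
sent <- compute_sentiment(myCorpus, myLexicons, how = "counts", dfm = myDfm)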

Value

A list containing:

corpus

the supplied sentocorpus object; the texts are altered if valence shifters are part of the lexicons.

sentiment

the sentiment scores as a data.table with a "date" column and one sentiment score column per lexicon--feature combination; see the access sketch after this list.

features

a character vector of the different features.

lexicons

a character vector of the different lexicons used.

howWithin

a single character value indicating how sentiment was aggregated within documents.
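A brief access sketch, assuming sent holds the output of a compute_sentiment() call as in the Examples below:

head(sent$sentiment)  # data.table with "date" and lexicon--feature score columns
sent$features         # feature names
sent$lexicons         # lexicon names
sent$howWithin        # within-document aggregation that was applied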

Details

For a separate calculation of positive (resp. negative) sentiment, one has to provide distinct positive (resp. negative) lexicons. This can be done using the do.split option of the setup_lexicons function, which splits each lexicon into a positive and a negative polarity counterpart. NA values are converted to 0, under the assumption that this is equivalent to no sentiment.
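A hedged sketch of such a split, reusing the built-in data from the Examples below; the exact naming of the resulting positive and negative counterparts depends on the package version:

data("lexicons")
data("valence")
lexSplit <- setup_lexicons(lexicons[c("LM_eng")], valence[["valence_eng"]],
                           do.split = TRUE)
# compute_sentiment() then returns separate positive and negative score columns
# per lexicon--feature combination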

See Also

dfm, tokens

Examples

# NOT RUN {
data("usnews")
data("lexicons")
data("valence")

# sentiment computation based on raw frequency counts
corpus <- sento_corpus(corpusdf = usnews)
corpusSample <- quanteda::corpus_sample(corpus, size = 1000)
l <- setup_lexicons(lexicons[c("LM_eng", "HENRY_eng")], valence[["valence_eng"]])
sent <- compute_sentiment(corpusSample, l, how = "counts")

# same sentiment computation based on a user-supplied dfm with default settings
dfm <- quanteda::dfm(quanteda::tokens(corpusSample), verbose = FALSE)
sent <- compute_sentiment(corpusSample, l, how = "counts", dfm = dfm)
# }
