Four lexicons for sentiment analysis are combined here in a tidy data frame. The lexicons are the NRC Emotion Lexicon from Saif Mohammad and Peter Turney, the sentiment lexicon from Bing Liu and collaborators, the AFINN lexicon of Finn Arup Nielsen, and the lexicon of Tim Loughran and Bill McDonald. Words with non-ASCII characters were removed from the lexicons.
sentiments
A data frame with 27,314 rows and 4 variables:
An English word
A sentiment whose possible values depend on the lexicon. The AFINN lexicon has no sentiment category (all values are NA), and each of the others can be "positive" or "negative". The NRC lexicon can also be "anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise", or "trust", and the Loughran lexicon can also be "litigious", "uncertainty", "constraining", or "superfluous".
The source of the sentiment for the word. One of "nrc", "bing", "AFINN", or "loughran".
A numerical score for the sentiment. This value is NA for the Bing, NRC, and Loughran lexicons, and runs between -5 and 5 for the AFINN lexicon.
Note that the Loughran lexicon is best suited for financial text (e.g., where words like "share" are not necessarily positive and "liability" is not necessarily negative).
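Because the lexicons are stacked in a single data frame, a typical first step is to pull out the rows for one lexicon and keep only the column that carries information for it. The sketch below illustrates this in base R on a small toy data frame rather than on the real data. The column names (word, sentiment, lexicon, score) are assumptions inferred from the variable descriptions above, and every row, including the AFINN score, is made up for illustration.

```r
# Toy stand-in for the combined data frame. The column names and all rows
# are illustrative assumptions, not values copied from the real data.
toy_sentiments <- data.frame(
  word      = c("happy", "happy", "happy", "liability", "sad"),
  sentiment = c("joy", "positive", NA, "negative", "negative"),
  lexicon   = c("nrc", "bing", "AFINN", "loughran", "bing"),
  score     = c(NA, NA, 3, NA, NA),
  stringsAsFactors = FALSE
)

# Categorical lexicon: keep the Bing rows and drop the score column,
# which is NA for every lexicon except AFINN.
bing_rows <- toy_sentiments[toy_sentiments$lexicon == "bing",
                            c("word", "sentiment")]

# AFINN: keep the numeric score and drop the sentiment column,
# which is NA for every AFINN row.
afinn_rows <- toy_sentiments[toy_sentiments$lexicon == "AFINN",
                             c("word", "score")]

print(bing_rows)
print(afinn_rows)
```

The same word can appear once per lexicon (as "happy" does in the toy rows), so filtering on the lexicon column before joining against text avoids counting a word several times.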