
qdap (version 0.2.5)

polarity: Polarity Score (Sentiment Analysis)

Description

Approximate the sentiment (polarity) of text by grouping variable(s).

Usage

polarity(text.var, grouping.var = NULL,
    positive.list = positive.words,
    negative.list = negative.words,
    negation.list = negation.words,
    amplification.list = increase.amplification.words,
    rm.incomplete = FALSE, digits = 3, ...)

Arguments

text.var
The text variable.
grouping.var
The grouping variables. Default NULL treats all text as a single group. Also takes a single grouping variable or a list of 1 or more grouping variables.
positive.list
A character vector of terms indicating positive reaction.
negative.list
A character vector of terms indicating negative reaction.
negation.list
A character vector of terms reversing the intent of a positive or negative word.
amplification.list
A character vector of terms that increase the intensity of a positive or negative word.
rm.incomplete
Logical. If TRUE, text rows ending with qdap's incomplete sentence end mark (|) are removed from the analysis.
digits
Integer; number of decimal places to round when printing.
...
Other arguments supplied to end_inc.

Value

Returns a list of:

  • all - A dataframe of scores per row with:
    • group.var - the grouping variable
    • text.var - the text variable
    • wc - word count
    • polarity - sentence polarity score
    • raw - raw polarity score (considering only positive and negative words)
    • negation.adj.raw - raw adjusted for negation words
    • amplification.adj.raw - raw adjusted for amplification words
    • pos.words - words considered positive
    • neg.words - words considered negative
  • group - A dataframe with the average polarity score by grouping variable.
  • digits - Integer value of the number of digits to display; mostly for internal use.
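
To illustrate how the all and group elements relate, here is a minimal base R sketch. The rows are made up for illustration (they are not qdap output), and the sketch assumes the group score is a plain unweighted mean of the sentence scores, which the description above suggests but does not spell out. The column name ave.polarity follows the one used in the Examples.

```r
# Made-up rows shaped like the `all` element (illustrative values only).
all_df <- data.frame(
  group.var = c("sam", "sam", "greg", "greg"),
  wc        = c(4, 3, 5, 2),
  polarity  = c(0.25, -0.333, 0, 0.5),
  stringsAsFactors = FALSE
)

# Assumption: the `group` element is the unweighted mean polarity per group.
group_df <- aggregate(polarity ~ group.var, data = all_df, FUN = mean)
names(group_df)[2] <- "ave.polarity"
group_df
```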

Details

The algorithm assigns a polarity value to each sentence as follows. It first uses the sentiment dictionary (Hu and Liu, 2004) to tag each word as positive ($x_i^{+}$), negative ($x_i^{-}$), neutral ($x_i^{0}$), negator ($x_i^{\neg}$), or amplifier ($x_i^{\uparrow}$). Neutral words hold no value in the equation but do affect word count ($n$).

Each positive ($x_i^{+}$) and negative ($x_i^{-}$) word is then weighted by the amplifiers ($x_i^{\uparrow}$) directly preceding the positive or negative word. Next, I consider the amplification value, adding the assigned value $1/(n-1)$ to increase the polarity relative to sentence length while ensuring that the polarity scores remain between -1 and 1. This weighted value for each polarized word is then multiplied by -1 to the power of the number of negators ($x_i^{\neg}$) directly preceding the positive or negative word. Last, these values are summed and divided by the word count ($n$), yielding a polarity score ($\delta$) between -1 and 1.

$$\delta=\frac{\sum(x_i^{0},\quad x_i^{\uparrow} + x_i^{+}\cdot(-1)^{\sum(x_i^{\neg})},\quad x_i^{\uparrow} + x_i^{-}\cdot(-1)^{\sum(x_i^{\neg})})}{n}$$

Where:

$$x_i^{\uparrow}=\frac{1}{n-1}$$
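
The steps described above can be sketched in base R. This is a rough illustration under stated assumptions, not qdap's actual implementation: the word lists are tiny hypothetical stand-ins for the package dictionaries, toy_polarity is a made-up helper name, "directly preceding" is read as the unbroken run of negators and amplifiers immediately before a polarized word, each amplifier pushes the score by 1/(n - 1) in the direction of the word's sign, and negation flips the whole weighted value (following the prose rather than a strict reading of the equation).

```r
# Hypothetical toy dictionaries; qdap ships much larger lists.
pos_words  <- c("good", "great", "happy")
neg_words  <- c("bad", "awful", "sad")
negators   <- c("not", "never")
amplifiers <- c("very", "really")

toy_polarity <- function(sentence) {
  # Lower-case, strip punctuation, and split on spaces.
  words <- strsplit(tolower(gsub("[^A-Za-z' ]", "", sentence)), " +")[[1]]
  words <- words[nzchar(words)]
  n <- length(words)
  if (n < 2) return(NA_real_)  # 1/(n - 1) is undefined for one word

  amp_value <- 1 / (n - 1)
  total <- 0
  for (i in seq_along(words)) {
    base <- if (words[i] %in% pos_words) 1 else if (words[i] %in% neg_words) -1 else 0
    if (base == 0) next  # neutral words only contribute to n

    # Assumption: count the contiguous modifiers immediately before word i.
    n_amp <- 0
    n_neg <- 0
    j <- i - 1
    while (j >= 1 && words[j] %in% c(negators, amplifiers)) {
      if (words[j] %in% amplifiers) n_amp <- n_amp + 1 else n_neg <- n_neg + 1
      j <- j - 1
    }

    # Assumption: amplify in the direction of the sign, then flip per negator.
    weighted <- base + sign(base) * n_amp * amp_value
    total <- total + weighted * (-1)^n_neg
  }
  total / n
}

toy_polarity("I am very happy")
toy_polarity("I am not happy")
```

Under these assumptions, "I am very happy" has n = 4, so the amplifier adds 1/3 and the score is (1 + 1/3)/4 = 1/3, while "I am not happy" scores (1 * -1)/4 = -0.25.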

References

Hu, M., & Liu, B. (2004). Mining opinion features in customer reviews. National Conference on Artificial Intelligence.

http://www.slideshare.net/jeffreybreen/r-by-example-mining-twitter-for

See Also

https://github.com/trestletech/Sermon-Sentiment-Analysis

Examples

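library(qdap)

# Polarity by a single grouping variable; outer parentheses print the result
(poldat <- with(DATA, polarity(state, person)))
# Polarity by a list of grouping variables
with(DATA, polarity(state, list(sex, adult)))
# Inspect the returned list: per-row scores and group averages
names(poldat)
truncdf(poldat$all, 8)
poldat$group
poldat2 <- with(mraja1spl, polarity(dialogue,
    list(sex, fam.aff, died)))
# Split the combined grouping column back into separate columns
colsplit2df(poldat2$group)
plot(poldat)

# Label outliers in the group averages and in the per-row scores
poldat3 <- with(rajSPLIT, polarity(dialogue, person))
poldat3[["group"]][, "OL"] <- outlier.labeler(poldat3[["group"]][,
    "ave.polarity"])
poldat3[["all"]][, "OL"] <- outlier.labeler(poldat3[["all"]][,
    "polarity"])
head(poldat3[["group"]], 10)
htruncdf(poldat3[["all"]], 15, 8)
plot(poldat3)
plot(poldat3, nrow=4)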
