polarity: Polarity Score (Sentiment Analysis)

Description

Approximate the sentiment (polarity) of text by grouping variable(s).

Usage

polarity(text.var, grouping.var = NULL,
    positive.list = positive.words,
    negative.list = negative.words,
    negation.list = negation.words,
    amplification.list = increase.amplification.words,
    rm.incomplete = FALSE, digits = 3, ...)

Arguments

text.var

The text variable.

grouping.var

The grouping variables. Default NULL treats all text as a single group. Also takes a single grouping variable or a list of 1 or more grouping variables.

positive.list

A character vector of terms indicating positive reaction.

negative.list

A character vector of terms indicating negative reaction.

negation.list

A character vector of terms reversing the intent of a positive or negative word.

amplification.list

A character vector of terms that increase the intensity of a positive or negative word.

rm.incomplete

logical. If TRUE, text rows ending with qdap's incomplete sentence end mark (|) will be removed from the analysis.

digits

Integer; number of decimal places to round when printing.

...

Other arguments supplied to end_inc.
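The rm.incomplete filter can be approximated in base R as below. This is a hypothetical helper (drop_incomplete is not a qdap function), assuming a row counts as incomplete when "|" is its last non-space character. polarity() itself hands this work to end_inc and also drops the matching grouping values, so its behaviour may differ in detail.

```r
# Hypothetical helper, not part of qdap: drop rows whose last non-space
# character is "|", qdap's end mark for an incomplete sentence
drop_incomplete <- function(text) {
  text[!grepl("\\|\\s*$", text)]
}

text <- c("I like it.", "Well, I was going to|", "That is bad!")
drop_incomplete(text)  # "I like it." "That is bad!"
```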

Value

Returns a list of:
all
A dataframe of scores per row with:
- group.var - the grouping variable
- text.var - the text variable
- wc - word count
- polarity - sentence polarity score
- raw - raw polarity score (considering only positive and negative words)
- negation.adj.raw - raw adjusted for negation words
- amplification.adj.raw - raw adjusted for amplification words
- pos.words - words considered positive
- neg.words - words considered negative
group
A dataframe with the average polarity score by grouping variable.
digits
integer value of number of digits to display; mostly internal use
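The group data frame summarises the row-level scores in all. The sketch below assumes a plain unweighted mean of the row polarity within each group and uses made-up data; qdap may weight or summarise differently, and its group table can carry more columns. Only the column name ave.polarity is taken from the examples on this page.

```r
# Hypothetical per-row scores shaped like two of the columns listed under `all`
all_rows <- data.frame(
  group.var = c("sam", "sam", "greg", "greg", "greg"),
  polarity  = c(0.25, -0.10, 0, 0.40, -0.50)
)

# Assumed summary: unweighted mean of row polarity within each group
group_tab <- aggregate(polarity ~ group.var, data = all_rows, FUN = mean)
names(group_tab)[names(group_tab) == "polarity"] <- "ave.polarity"
group_tab  # greg -0.03333333, sam 0.075
```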

Details

The equation used by the algorithm to assign a polarity value to each sentence first uses the sentiment dictionary (Hu and Liu, 2004), along with the negation and amplification lists, to tag each word as positive ($x_i^{+}$), negative ($x_i^{-}$), neutral ($x_i^{0}$), negator ($x_i\neg$), or amplifier ($x_i^{\uparrow}$). Neutral words hold no value in the equation but do affect the word count ($n$).

Each positive ($x_i^{+}$) and negative ($x_i^{-}$) word is then weighted by the amplifiers ($x_i^{\uparrow}$) directly preceding it. Each amplifier is assigned the value $1/(n-1)$, which increases the polarity relative to sentence length while ensuring that the polarity scores remain between -1 and 1. This weighted value for each polarized word is then multiplied by -1 raised to the number of negators ($x_i\neg$) directly preceding the positive or negative word. Finally, these values are summed and divided by the word count ($n$), yielding a polarity score ($\delta$) between -1 and 1:

$$\delta=\frac{\sum(x_i^{0},\quad x_i^{\uparrow} + x_i^{+}\cdot(-1)^{\sum(x_i\neg)},\quad x_i^{\uparrow} + x_i^{-}\cdot(-1)^{\sum(x_i\neg)})}{n}$$

where

$$x_i^{\uparrow}=\frac{1}{n-1}$$
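A minimal base-R sketch of the formula above, for checking the arithmetic by hand. It is not qdap's implementation: the function polarity_sketch and the toy word lists are hypothetical, "directly preceding" is assumed to mean the unbroken run of negators and amplifiers just before a polarized word, and each term follows the displayed equation literally ($x_i^{\uparrow}$ is added, and only $x_i^{\pm}$ is multiplied by $(-1)^{\sum(x_i\neg)}$).

```r
# Hypothetical re-implementation of the Details formula; not qdap's code.
polarity_sketch <- function(sentence, pos, neg, negators, amplifiers) {
  # Lower-case, then split on runs of characters that are not letters or apostrophes
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  n <- length(words)
  if (n == 0) return(NA_real_)
  # Value of one amplifier, 1/(n - 1); zero when n = 1 to avoid 0/0
  amp_value <- if (n > 1) 1 / (n - 1) else 0
  total <- 0
  for (i in seq_len(n)) {
    s <- if (words[i] %in% pos) 1 else if (words[i] %in% neg) -1 else 0
    if (s == 0) next  # neutral words (and modifiers) only count toward n
    k <- 0            # negators directly preceding word i
    a <- 0            # amplifiers directly preceding word i
    j <- i - 1
    while (j >= 1 && words[j] %in% c(negators, amplifiers)) {
      if (words[j] %in% negators) k <- k + 1 else a <- a + 1
      j <- j - 1
    }
    # Literal reading of the term: x_up + x_pol * (-1)^(number of negators)
    total <- total + a * amp_value + s * (-1)^k
  }
  total / n
}

# Toy word lists (assumptions, not qdap's defaults)
pos <- c("good", "happy")
neg <- c("bad", "sad")
negators <- c("not", "never")
amplifiers <- c("very", "really")

polarity_sketch("I am happy", pos, neg, negators, amplifiers)       # 1/3
polarity_sketch("I am not happy", pos, neg, negators, amplifiers)   # -0.25
polarity_sketch("I am very happy", pos, neg, negators, amplifiers)  # (1/3 + 1)/4 = 1/3
```

Under this literal reading an amplifier in front of a negative word pulls the score toward zero ("This is very bad" gives (1/3 - 1)/4 = -1/6), so compare the sketch against polarity() itself before relying on it.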

References

Hu, M., & Liu, B. (2004). Mining opinion features in customer reviews. National Conference on Artificial Intelligence. http://www.slideshare.net/jeffreybreen/r-by-example-mining-twitter-for

Examples


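library(qdap)

## Score each row of DATA, grouped by person
(poldat <- with(DATA, polarity(state, person)))
## Group by two variables at once
with(DATA, polarity(state, list(sex, adult)))
## Inspect the returned list: per-row scores (all) and group averages (group)
names(poldat)
truncdf(poldat$all, 8)
poldat$group
## Three grouping variables; colsplit2df splits the combined group column apart
poldat2 <- with(mraja1spl, polarity(dialogue,
    list(sex, fam.aff, died)))
colsplit2df(poldat2$group)
plot(poldat)

## Label outlying polarity scores for each group and for each row
poldat3 <- with(rajSPLIT, polarity(dialogue, person))
poldat3[["group"]][, "OL"] <- outlier.labeler(poldat3[["group"]][,
    "ave.polarity"])
poldat3[["all"]][, "OL"] <- outlier.labeler(poldat3[["all"]][,
    "polarity"])
head(poldat3[["group"]], 10)
htruncdf(poldat3[["all"]], 15, 8)
plot(poldat3)
plot(poldat3, nrow=4)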
