freq_terms: Find Frequent Terms

Description

Find the most frequently occurring terms in a text vector.

Usage

freq_terms(text.var, top = 20, at.least = 1, stopwords = NULL,
  extend = TRUE, ...)

Arguments

text.var

The text variable.

top

The number of most frequent terms to show.

at.least

An integer giving the minimum number of letters a word must have to be included in the output.

stopwords

A character vector of words to remove from the text. qdap provides several data sets that can be used as stop words, including Top200Words, Top100Words, and Top25Words. For the tm package's traditional Engli

extend

logical. If TRUE, the top argument is extended to include any word that has the same frequency as the word at the top-th position, so words tied at the cut-off are not dropped.

...

Other arguments passed to all_words.
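The interaction of top, at.least, stopwords, and extend can be illustrated with a small base-R sketch. This is not qdap's implementation: the function name freq_terms_sketch and its simple tokenizer (lowercase, split on anything that is not an ASCII letter or apostrophe) are hypothetical assumptions chosen only to show how the filtering and tie handling might fit together.

```r
## Hypothetical sketch: NOT qdap's freq_terms(). It only illustrates how
## top, at.least, stopwords and extend could combine.
## Assumed tokenizer: lowercase, then split on anything that is not an
## ASCII letter or apostrophe.
freq_terms_sketch <- function(text.var, top = 20, at.least = 1,
                              stopwords = NULL, extend = TRUE) {
    words <- unlist(strsplit(tolower(text.var), "[^a-z']+"))
    ## drop empty tokens and words shorter than at.least letters
    words <- words[words != "" & nchar(words) >= at.least]
    ## drop stop words (compared in lowercase)
    if (!is.null(stopwords)) words <- words[!words %in% tolower(stopwords)]
    if (length(words) == 0) {
        return(data.frame(WORD = character(0), FREQ = integer(0)))
    }
    tab <- table(words)
    out <- data.frame(WORD = names(tab), FREQ = as.integer(tab),
                      stringsAsFactors = FALSE)
    ## sort by descending frequency, then alphabetically to break ties
    out <- out[order(-out$FREQ, out$WORD), ]
    rownames(out) <- NULL
    top <- min(top, nrow(out))
    if (extend) {
        ## keep every word whose frequency ties the word at position top
        out[out$FREQ >= out$FREQ[top], ]
    } else {
        ## hard cut-off: exactly top rows
        head(out, top)
    }
}

txt <- c("the cat sat on the mat", "the dog sat on a log")
freq_terms_sketch(txt, top = 2)
freq_terms_sketch(txt, top = 2, extend = FALSE)
```

With these two toy sentences, top = 2 returns three rows ("the", "on", "sat") because "on" and "sat" tie at the cut-off; with extend = FALSE only the first two rows are kept.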

Value

Returns a data frame of the most frequently occurring words and their frequencies.

Examples


freq_terms(DATA$state, 5)
freq_terms(DATA$state)
freq_terms(DATA$state, extend = FALSE)
freq_terms(DATA$state, at.least = 4)
(out <- freq_terms(pres_debates2012$dialogue, stopwords = Top200Words))
plot(out)

## All words by sentence (row)
x <- raj$dialogue
list_df2df(setNames(lapply(x, freq_terms, top=Inf), seq_along(x)), "row")
list_df2df(setNames(lapply(x, freq_terms, top=10, stopwords = Dolch),
    seq_along(x)), "Title")


## All words by person
FUN <- function(x, n=Inf) freq_terms(paste(x, collapse=" "), top=n)
list_df2df(lapply(split(x, raj$person), FUN), "person")

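## Plot it (ggtitle() comes from ggplot2; ggplot objects created inside
## lapply() are not auto-printed, so print() them to draw into the PDF)
library(ggplot2)
out <- lapply(split(x, raj$person), FUN, n=10)
pdf("Freq Terms by Person.pdf", width=13)
invisible(lapply(seq_along(out), function(i) {
    ## dev.new()
    print(plot(out[[i]], plot=FALSE) + ggtitle(names(out)[i]))
}))
dev.off()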
