Learn R Programming

qdap (version 2.2.1)

chunker: Break Text Into Ordered Word Chunks

Description

Some visualizations and algorithms require text to be broken into chunks of ordered words. chunker breaks text, optionally by grouping variables, into equal chunks. The chunk size can be specified by giving number of words to be in each chunk or the number of chunks.

Usage

chunker(text.var, grouping.var = NULL, n.words, n.chunks, as.string = TRUE,
  rm.unequal = FALSE)

Arguments

text.var
The text variable
grouping.var
The grouping variables. Default NULL generates one word list for all text. Also takes a single grouping variable or a list of 1 or more grouping variables.
n.words
An integer specifying the number of words in each chunk (must specify n.chunks or n.words).
n.chunks
An integer specifying the number of chunks (must specify n.chunks or n.words).
as.string
logical. If TRUE the chunks are returned as a single string. If FALSE the chunks are returned as a vector of single words.
rm.unequal
logical. If TRUE final chunks that are unequal in length to the other chunks are removed.

Value

  • Returns a list of text chunks.

Examples

Run this code
with(DATA, chunker(state, n.chunks = 10))
with(DATA, chunker(state, n.words = 10))
with(DATA, chunker(state, n.chunks = 10, as.string=FALSE))
with(DATA, chunker(state, n.chunks = 10, rm.unequal=TRUE))
with(DATA, chunker(state, person, n.chunks = 10))
with(DATA, chunker(state, list(sex, adult), n.words = 10))
with(DATA, chunker(state, person, n.words = 10, rm.unequal=TRUE))

## Bigger data
with(hamlet, chunker(dialogue, person, n.chunks = 10))
with(hamlet, chunker(dialogue, person, n.words = 300))

## with polarity hedonmetrics
dat <- with(pres_debates2012[pres_debates2012$person %in% qcv(OBAMA, ROMNEY), ],
    chunker(dialogue, list(person, time), n.words = 300))

dat2 <- colsplit2df(list2df(dat, "dialogue", "person&time")[, 2:1])

dat3 <- split(dat2[, -2], dat2$time)
ltruncdf(dat3, 10, 50)

poldat <- lapply(dat3, function(x) with(x, polarity(dialogue, person, constrain = TRUE)))


m <- lapply(poldat, function(x) plot(cumulative(x)))
m <- Map(function(w, x, y, z) {
        w + ggtitle(x) + xlab(y) + ylab(z)
    },
        m,
        paste("Debate", 1:3),
        list(NULL, NULL, "Duration (300 Word Segment)"),
        list(NULL, "Cumulative Average Polarity", NULL)
)

library(gridExtra)
do.call(grid.arrange, m)

## By person
## By person
poldat2 <- Map(function(x, x2){

    scores <- with(counts(x), split(polarity, person))
    setNames(lapply(scores, function(y) {
        y <- list(cumulative_average_polarity = y)
        attributes(y)[["constrained"]] <- TRUE
        qdap:::plot.cumulative_polarity(y) + xlab(NULL) + ylab(x2)
    }), names(scores))

}, poldat, paste("Debate", 1:3))

poldat2 <- lapply(poldat2, function(x) {
    x[[2]] <- x[[2]] + ylab(NULL)
    x
})

poldat2[[1]] <- Map(function(x, y) {
        x + ggtitle(y)
    },
        poldat2[[1]], qcv(Obama, Romney)
)

library(gridExtra)
do.call(grid.arrange, unlist(poldat2, recursive=FALSE))

Run the code above in your browser using DataLab