q_dtm: Quick DocumentTermMatrix

Description

Make a DocumentTermMatrix from a vector of text and an optional vector of document names. To stem the terms as well, use q_dtm_stem, a version of q_dtm that stems with SnowballC's wordStem.

Usage

q_dtm(text, docs = seq_along(text), to = "tm", keep.hyphen = FALSE,
  ngrams = NULL, ...)
q_dtm_stem(text, docs = seq_along(text), to = "tm", keep.hyphen = FALSE,
  ngrams = NULL, ...)

Arguments

text

A vector of strings.

docs

A vector of document names.

to

Target conversion format, consisting of the name of the package into whose document-term matrix representation the dfm will be converted:

"lda": a list with components "documents" and "vocab" as needed by lda.collapsed.gibbs.sampler from the lda package
"tm": a DocumentTermMatrix from the tm package
"stm": the format for the stm package
"austin": the wfm format from the austin package
"topicmodels": the "dtm" format as used by the topicmodels package

keep.hyphen

logical. If TRUE hyphens are retained in the terms (e.g., "math-like" is kept as "math-like"), otherwise they become a split for terms (e.g., "math-like" is converted to "math" & "like").
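
The effect of keep.hyphen can be illustrated with a small base R sketch. The function tokenize_sketch below is hypothetical and written only for this illustration; it is not the tokenizer q_dtm uses internally, and it merely mimics the splitting behavior described above.

```r
# Illustrative only: a hypothetical base R tokenizer, NOT the q_dtm implementation.
tokenize_sketch <- function(x, keep.hyphen = FALSE) {
    x <- tolower(x)
    # Hyphens count as term characters only when keep.hyphen is TRUE;
    # otherwise they act as separators just like spaces and punctuation.
    pattern <- if (keep.hyphen) "[^a-z'-]+" else "[^a-z']+"
    tokens <- strsplit(x, pattern)[[1]]
    # Drop empty strings left by leading separators
    tokens[nzchar(tokens)]
}

tokenize_sketch("A math-like problem", keep.hyphen = TRUE)
tokenize_sketch("A math-like problem", keep.hyphen = FALSE)
```

With keep.hyphen = TRUE the sketch keeps "math-like" as one term; with keep.hyphen = FALSE it yields "math" and "like" as separate terms.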

ngrams

A vector of ngrams (multiple words separated by spaces). The ngrams supplied here are retained as single terms in the matrix.
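
One way to picture what retaining an ngram means is the base R sketch below. It is only an assumption about the general idea (join each listed ngram before splitting on spaces, then restore the space); the helper protect_ngrams is hypothetical and is not how q_dtm is implemented.

```r
# Illustrative only: hypothetical helper, NOT the q_dtm implementation.
# Joins each listed ngram with a placeholder so it survives whitespace splitting.
protect_ngrams <- function(x, ngrams, sep = "_") {
    for (ng in ngrams) {
        x <- gsub(ng, gsub(" ", sep, ng, fixed = TRUE), x, fixed = TRUE)
    }
    x
}

txt <- "the middle class and wall street"
joined <- protect_ngrams(txt, c("middle class", "wall street"))

# Split on spaces, then turn the placeholder back into a space
tokens <- strsplit(joined, " ", fixed = TRUE)[[1]]
terms <- gsub("_", " ", tokens, fixed = TRUE)
terms
```

In this sketch "middle class" and "wall street" each come out as a single term containing a space, which is the kind of term that grep(" ", ...) on the matrix's term names picks up when ngrams is supplied.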

…

Additional arguments passed to dfm.

Value

Returns a DocumentTermMatrix.

Examples

# NOT RUN {
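# Build a DocumentTermMatrix, using time and tot pasted together as document names
(x <- with(presidential_debates_2012, q_dtm(dialogue, paste(time, tot, sep = "_"))))
# Apply tm's tf-idf weighting to the result
tm::weightTfIdf(x)

# Same matrix, but with stemmed terms
(x2 <- with(presidential_debates_2012, q_dtm_stem(dialogue, paste(time, tot, sep = "_"))))
remove_stopwords(x2, stem = TRUE)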

bigrams <- c('make sure', 'governor romney', 'mister president',
    'united states', 'middle class', 'middle east', 'health care',
    'american people', 'dodd frank', 'wall street', 'small business')

grep(" ", x$dimnames$Terms, value = TRUE) # no ngrams: x was built without the ngrams argument

(x3 <- with(presidential_debates_2012,
    q_dtm(dialogue, paste(time, tot, sep = "_"), ngrams = bigrams)
))

grep(" ", x3$dimnames$Terms, value = TRUE) # ngrams are retained as terms
# }
