q_tdm: Quick TermDocumentMatrix

Description

Make a TermDocumentMatrix from a vector of text and an optional vector of documents. To stem the terms as well, use q_tdm_stem, the stemming version of q_tdm, which uses SnowballC's wordStem.
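For readers new to the structure, the following base R sketch shows what a term-document matrix holds: one row per term, one column per document, and counts in the cells. It is only an illustration under simplifying assumptions (whitespace tokenisation, made-up sample text) and is not how q_tdm builds its result.

```r
# Made-up sample input; `docs` mirrors the default shown in Usage.
text <- c("the cat sat", "the dog sat down")
docs <- seq_along(text)

# Naive tokenisation on whitespace (an assumption for this sketch only).
tokens <- strsplit(tolower(text), " +")
terms  <- unlist(tokens)
doc_id <- rep(docs, lengths(tokens))

# Rows are terms, columns are documents, cells are counts.
tdm <- table(term = terms, doc = doc_id)
tdm
```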

Usage

q_tdm(text, docs = seq_along(text), to = "tm", keep.hyphen = FALSE,
  ngrams = NULL, ...)
q_tdm_stem(text, docs = seq_along(text), to = "tm", keep.hyphen = FALSE,
  ngrams = NULL, ...)

Arguments

text

A vector of strings.

docs

A vector of document names.

to

Target conversion format: the name of the package into whose document-term matrix representation the dfm will be converted:

"lda": a list with components "documents" and "vocab" as needed by lda.collapsed.gibbs.sampler from the lda package
"tm": a DocumentTermMatrix from the tm package
"stm": the format for the stm package
"austin": the wfm format from the austin package
"topicmodels": the "dtm" format as used by the topicmodels package

keep.hyphen

logical. If TRUE hyphens are retained in the terms (e.g., "math-like" is kept as "math-like"), otherwise they become a split for terms (e.g., "math-like" is converted to "math" & "like").
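The effect of keep.hyphen on a single string can be sketched in base R. The helper split_terms below is hypothetical and written only for this illustration; it is not part of the package and makes no claim about how q_tdm tokenises internally.

```r
# Hypothetical helper, not part of the package: tokenise one string.
split_terms <- function(x, keep.hyphen = FALSE) {
  x <- tolower(x)
  # Keep letters, digits and apostrophes (plus hyphens when requested);
  # every other run of characters becomes a single space.
  pattern <- if (keep.hyphen) "[^a-z0-9'-]+" else "[^a-z0-9']+"
  x <- gsub(pattern, " ", x)
  tokens <- strsplit(trimws(x), " +")[[1]]
  tokens[nzchar(tokens)]
}

split_terms("A math-like approach", keep.hyphen = TRUE)
# "a" "math-like" "approach"
split_terms("A math-like approach", keep.hyphen = FALSE)
# "a" "math" "like" "approach"
```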

ngrams

A vector of ngrams (multiple words separated by spaces). The ngrams supplied here are retained as terms in the matrix.

…

Additional arguments passed to dfm.

Examples

# NOT RUN {
(x <- with(presidential_debates_2012, q_tdm(dialogue, paste(time, tot, sep = "_"))))
tm::weightTfIdf(x)

(x2 <- with(presidential_debates_2012, q_tdm_stem(dialogue, paste(time, tot, sep = "_"))))
remove_stopwords(x2, stem=TRUE)
# }
