convert
convert provides easy conversion from a dfm to the document-term
representations used in all other text analysis packages for which
conversions are defined. See also convert-wrappers for convenience functions
for specific package converters.

convert(x, to = c("lda", "tm", "stm", "austin", "topicmodels", "lsa",
  "matrix", "data.frame"), docvars = NULL, ...)

to: the target conversion format, including:
"lda"
"tm"
"stm"
"austin": the wfm format from the austin package
"topicmodels"
"lsa"

docvars: document variables used as the meta information in conversion to
the STM package format. This aids in selecting only the document variables
that correspond to the documents with non-zero counts.

The returned object is determined by the value of to (see above).
See the documentation of the conversion target package for more detailed
descriptions of the return formats.

mycorpus <- corpus_subset(data_corpus_inaugural, Year > 1970)
quantdfm <- dfm(mycorpus, verbose = FALSE)
# austin's wfm format
identical(dim(quantdfm), dim(convert(quantdfm, to = "austin")))
# stm package format
stmdfm <- convert(quantdfm, to = "stm")
str(stmdfm)
# illustrate what happens with zero-length documents
quantdfm2 <- dfm(c(punctOnly = "!!!", mycorpus[-1]), verbose = FALSE)
rowSums(quantdfm2)
stmdfm2 <- convert(quantdfm2, to = "stm", docvars = docvars(mycorpus))
str(stmdfm2)
## Not run: ------------------------------------
# # tm's DocumentTermMatrix format
# tmdfm <- convert(quantdfm, to = "tm")
# str(tmdfm)
#
# # topicmodels package format
# str(convert(quantdfm, to = "topicmodels"))
#
# # lda package format
# ldadfm <- convert(quantdfm, to = "lda")
# str(ldadfm)
## ---------------------------------------------
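
The zero-length-document example above depends on the document variables staying aligned with the documents that still have counts. The following is a minimal base R sketch of that idea only. It uses hypothetical toy data (`counts`, `meta`) and is not quanteda's actual implementation of the "stm" conversion.

```r
# Hypothetical toy document-term counts; "punctOnly" has no features.
counts <- matrix(
  c(0, 0,
    2, 1,
    1, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("punctOnly", "doc2", "doc3"), c("peace", "nation"))
)

# Hypothetical document variables, one row per document.
meta <- data.frame(Year = c(1973, 1977, 1981), row.names = rownames(counts))

# Keep only documents with non-zero counts, applying the same
# logical index to the counts and to the document variables.
keep <- rowSums(counts) > 0
counts_nonzero <- counts[keep, , drop = FALSE]
meta_nonzero <- meta[keep, , drop = FALSE]

# Rows of the two objects still refer to the same documents.
stopifnot(identical(rownames(counts_nonzero), rownames(meta_nonzero)))
```

Using one logical index for both objects is what keeps the metadata rows matched to the remaining documents. Filtering the counts alone would leave a metadata table with one row too many.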