ngramrr (version 0.2.0)

dtmwrappers: Wrappers to DocumentTermMatrix and TermDocumentMatrix to use n-gram tokenization

Description

Wrappers to DocumentTermMatrix and TermDocumentMatrix that use the n-gram tokenization provided by ngramrr.

Usage

dtm2(x, char = FALSE, ngmin = 1, ngmax = 2, rmEOL = TRUE, ...)
tdm2(x, char = FALSE, ngmin = 1, ngmax = 2, rmEOL = TRUE, ...)

Arguments

x
character vector, Source or Corpus to be converted
char
logical, whether to use character n-grams. char = FALSE (the default) uses word n-grams.
ngmin
integer, minimum order of n-gram
ngmax
integer, maximum order of n-gram
rmEOL
logical, remove n-grams with EOL character
...
Additional options for DocumentTermMatrix or TermDocumentMatrix
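
To make the roles of ngmin and ngmax concrete, here is a rough base R sketch of word n-gram extraction for a single string. The helper word_ngrams is hypothetical and is not part of ngramrr; it only illustrates the idea of collecting every n-gram of order ngmin through ngmax, assuming whitespace-separated words and ngmin <= ngmax.

```r
# Hypothetical helper (not ngramrr's implementation): collect all word
# n-grams of order ngmin..ngmax from one string, split on whitespace.
word_ngrams <- function(text, ngmin = 1, ngmax = 2) {
  words <- strsplit(text, "\\s+")[[1]]
  words <- words[nzchar(words)]
  out <- character(0)
  for (n in ngmin:ngmax) {
    # Skip orders longer than the text itself.
    if (n > length(words)) next
    starts <- seq_len(length(words) - n + 1)
    grams <- vapply(
      starts,
      function(i) paste(words[i:(i + n - 1)], collapse = " "),
      character(1)
    )
    out <- c(out, grams)
  }
  out
}

word_ngrams("here we are now", ngmin = 1, ngmax = 2)
# "here" "we" "are" "now" "here we" "we are" "are now"
```

With ngmin = 1 and ngmax = 2, the result contains the four unigrams followed by the three bigrams; raising ngmax to 3 would add the trigrams as well.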

Value

DocumentTermMatrix (from dtm2) or TermDocumentMatrix (from tdm2)
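
The two return types differ only in orientation: a document-term matrix has one row per document and one column per term, and a term-document matrix is its transpose. The following base R sketch builds plain count matrices to show that relationship. It uses ordinary dense matrices and unigram tokens for illustration only, not the sparse classes from tm, and the variable names are assumptions.

```r
# Illustration only: dense count matrices in base R, not tm's sparse classes.
docs <- c("here we are now", "entertain us", "here we are now")
tokens <- strsplit(docs, " ", fixed = TRUE)
terms <- sort(unique(unlist(tokens)))

# Count each term per document; vapply yields terms x documents, so transpose.
dtm <- t(vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = terms))),
  integer(length(terms))
))
dimnames(dtm) <- list(Docs = seq_along(docs), Terms = terms)

# The term-document layout is the transpose of the document-term layout.
tdm <- t(dtm)

dim(dtm)  # 3 documents by 6 terms
dim(tdm)  # 6 terms by 3 documents
```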

See Also

ngramrr, DocumentTermMatrix, TermDocumentMatrix

Examples

library(ngramrr)

# A small corpus: one lyric line per document
nirvana <- c("hello hello hello how low", "hello hello hello how low",
             "hello hello hello how low", "hello hello hello",
             "with the lights out", "it's less dangerous", "here we are now", "entertain us",
             "i feel stupid", "and contagious", "here we are now", "entertain us",
             "a mulatto", "an albino", "a mosquito", "my libido", "yeah", "hey yay")

# Word n-grams of order 1 to 3; removePunctuation is passed on via `...`
dtm2(nirvana, ngmax = 3, removePunctuation = TRUE)
