Extract words and phrases from a corpus of documents.
getvocab(
  corpus,
  mincount = 5,
  minphrasecount = NULL,
  ngram = 1,
  lang = "en",
  stopwords = lang,
  ...
)
The vocabulary used in the corpus of documents.
corpus: The corpus of documents (a character vector).
mincount: Minimum number of occurrences for a word to be considered frequent.
minphrasecount: Minimum number of occurrences for a collocation (phrase) to be considered frequent.
ngram: Maximum size of n-grams.
lang: The language of the documents, used for stemming (NULL for no stemming).
stopwords: Stopwords, or the language of the documents. NULL if stop words should not be removed.
...: Other parameters.
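To make the roles of the minimum count, the stop word list and the n-gram size more concrete, here is a minimal base R sketch of the general idea. It is not the implementation of getvocab: the toy corpus, the hand-written stop word list and all variable names are assumptions made for illustration, and no stemming is performed.

```r
# Hypothetical toy corpus (not taken from the package).
docs <- c("the cat sat on the mat",
          "the dog sat on the log",
          "a cat and a dog")

# Lowercase each document and split it on whitespace.
words_by_doc <- strsplit(tolower(docs), "\\s+")

# Count single words over the whole corpus.
counts <- table(unlist(words_by_doc))

# Keep words seen at least `mincount` times (illustrative threshold).
mincount <- 2
frequent <- counts[counts >= mincount]

# Drop a small hand-written stop word list (an assumption for this sketch).
stops <- c("the", "a", "on", "and")
vocab_terms <- sort(names(frequent)[!(names(frequent) %in% stops)])
print(vocab_terms)

# Build bigrams inside each document, then apply the same count threshold.
bigrams <- unlist(lapply(words_by_doc, function(w) {
  if (length(w) < 2) return(character(0))
  paste(head(w, -1), tail(w, -1))
}))
bigram_counts <- table(bigrams)
frequent_bigrams <- sort(names(bigram_counts)[bigram_counts >= mincount])
print(frequent_bigrams)
```

On this toy corpus, raising the threshold shrinks the vocabulary, and removing stop words drops the most frequent but least informative terms. The function itself may count, filter and stem differently.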
plotzipf, stopwords, create_vocabulary
if (FALSE) {
  text = loadtext("http://mattmahoney.net/dc/text8.zip")
  vocab1 = getvocab(text)               # With stemming
  nrow(vocab1)
  vocab2 = getvocab(text, lang = NULL)  # Without stemming
  nrow(vocab2)
}