the weighting mode for the term-document
matrix. Possible settings are:
tf: Term frequency
tf-idf: Term frequency-inverse document frequency
bin: Binary frequency
logical
stemming
if set, stems words before making the term-document matrix.
minWordLength
words with fewer characters than this number are discarded
from the term-document matrix.
minDocFreq
words that appear less often in documents than this
number are discarded for the term-document matrix.
stopwords
either a plain text file containing all stopwords or a
Boolean value. In the latter case, the default stopwords for
the documents' language are used.
dictionary
a character vector holding terms to be used as the
columns for the term-document matrix. No other terms from
object will be counted.
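
As a rough illustration of what the weighting, minWordLength and dictionary arguments describe, the following base R sketch builds a small documents-by-terms count matrix by hand and derives binary and tf-idf weights from it. It does not call the package. The toy corpus, the variable names, and the tf-idf definition (idf = log2(N / document frequency)) are assumptions made for this example, and the package's exact formula may differ.

```r
# Toy corpus; base R only, no text-mining package required.
docs <- c(d1 = "the cat sat near the other cat",
          d2 = "the dog sat",
          d3 = "one dog saw another dog")

# Tokenise on spaces and drop words shorter than a minimum length
# (the idea behind minWordLength; the threshold here is illustrative).
min_word_length <- 3
tokens <- lapply(strsplit(docs, " ", fixed = TRUE),
                 function(w) w[nchar(w) >= min_word_length])

# Count only the dictionary terms, one column per term (weighting "tf").
dictionary <- c("cat", "dog", "sat")
tf <- t(sapply(tokens, function(w) table(factor(w, levels = dictionary))))

# Binary weighting: 1 if the term occurs in the document, 0 otherwise.
bin <- (tf > 0) * 1

# tf-idf under the ASSUMED definition idf = log2(N / document frequency).
idf <- log2(nrow(tf) / colSums(tf > 0))
tfidf <- sweep(tf, 2, idf, `*`)
```

Words outside the dictionary (here "the", "near", "other", and so on) never become columns, which mirrors the statement that no other terms are counted.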
Value
An S4 object of class TermDocMatrix containing a sparse term-document
matrix. The following slots contain useful information:
Data: The sparse Matrix
Weighting: The weighting mode applied to the term-document matrix
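
Slots of an S4 object are read with the @ operator or with slot(). The sketch below shows this on a hypothetical stand-in class, ToyTermDocMatrix, which only borrows the slot names Data and Weighting from the description above. It is not the package's class definition, and it stores a dense base R matrix rather than a sparse Matrix.

```r
library(methods)

# Hypothetical stand-in class; NOT the package's TermDocMatrix definition.
setClass("ToyTermDocMatrix",
         representation(Data = "matrix", Weighting = "character"))

# Dense documents-by-terms counts (the real object holds a sparse Matrix).
counts <- matrix(c(2, 0, 1,
                   0, 1, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("d1", "d2"), c("cat", "dog", "sat")))

tdm <- new("ToyTermDocMatrix", Data = counts, Weighting = "tf")

# Read the slots with @ (slot(tdm, "Data") is equivalent).
weighting_mode <- tdm@Weighting
matrix_dims <- dim(tdm@Data)
slot_names <- slotNames(tdm)
```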