
tm (version 0.2-3.7)

TermDocMatrix: Term-document matrix

Description

Constructs a term-document matrix.

Usage

## S3 method for class 'TextDocCol':
TermDocMatrix(object, weighting = "tf", stemming = FALSE,
              minWordLength = 3, minDocFreq = 1, stopwords = NULL,
              dictionary = NULL)

Arguments

object
a text document collection
weighting
the weighting mode for the term-document matrix. Possible settings are:
  • tf: term frequency
  • tf-idf: term frequency-inverse document frequency
  • bin: binary frequency
  • logical
stemming
if TRUE, words are stemmed before the term-document matrix is built.
minWordLength
words shorter than this number of characters are discarded from the term-document matrix.
minDocFreq
words that occur less often in the documents than this number are discarded from the term-document matrix.
stopwords
either a plain text file listing all stopwords or a Boolean value. In the latter case (TRUE), the default stopwords for the documents' language are used.
dictionary
a character vector of terms to use as the columns of the term-document matrix. Terms in object that are not in this vector are not counted.
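The filtering arguments above (stopwords, minWordLength, minDocFreq, dictionary) can be illustrated with a minimal base-R sketch on hypothetical toy documents. It does not use tm and is not the package's implementation. Several details are assumptions: the order in which the filters are applied, the tokenizer (lower-casing and splitting on non-letters), the stopword list, and reading minDocFreq as "number of documents containing the term". Stemming and the real language-specific stopword lists are not reproduced.

```r
# Hypothetical toy documents (not from the package)
docs <- c(d1 = "crude oil prices rose as oil supply fell",
          d2 = "the price of crude fell",
          d3 = "oil output and oil exports")

stop_list       <- c("the", "of", "as", "and")  # hypothetical stopword list
min_word_length <- 3     # plays the role of minWordLength
min_doc_freq    <- 2     # plays the role of minDocFreq (assumed: number of documents)
dictionary      <- NULL  # plays the role of dictionary; NULL keeps all terms

# Tokenize: lower-case, split on non-letters, drop short words and stopwords
tokens <- lapply(docs, function(d) {
  w <- unlist(strsplit(tolower(d), "[^a-z]+"))
  w <- w[nchar(w) >= min_word_length]
  w[!(w %in% stop_list)]
})

# Columns are either the dictionary terms or every remaining term
terms <- if (is.null(dictionary)) sort(unique(unlist(tokens))) else dictionary

# Count matrix: one row per document, one column per term;
# words outside 'terms' become NA in factor() and are not counted by table()
counts <- t(sapply(tokens, function(w) table(factor(w, levels = terms))))

# Drop terms that occur in fewer than min_doc_freq documents
counts <- counts[, colSums(counts > 0) >= min_doc_freq, drop = FALSE]
print(counts)
```

With these settings only "crude", "fell" and "oil" survive, because every other term appears in a single document. Setting dictionary to, say, c("oil", "price") restricts the columns to those terms before the document-frequency filter is applied.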

Value

An S4 object of class TermDocMatrix containing a sparse term-document matrix. The following slots contain useful information:
  • Data: the sparse matrix
  • Weighting: the weighting mode applied to the term-document matrix
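The tf, bin and tf-idf settings of the weighting argument can be sketched in base R on a small hypothetical count matrix. This is an illustration rather than tm's code. The tf-idf variant used here (count times log2 of the number of documents over the term's document frequency) is an assumption and may differ from what tm 0.2-3.7 computes. The logical mode is omitted because its definition is not given on this page.

```r
# Hypothetical count matrix: rows are documents, columns are terms
counts <- matrix(c(2, 0, 1,
                   0, 3, 1,
                   1, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("d1", "d2", "d3"),
                                 c("oil", "price", "crude")))

# "tf": the raw term counts
tf <- counts

# "bin": 1 if the term occurs in the document, 0 otherwise
bin <- (counts > 0) * 1

# "tf-idf": count times log2(N / document frequency); assumed variant
n_docs   <- nrow(counts)
doc_freq <- colSums(counts > 0)
tfidf    <- sweep(counts, 2, log2(n_docs / doc_freq), `*`)

print(tfidf)
```

Because "crude" occurs in every document, its inverse document frequency is log2(3/3) = 0, so tf-idf gives it zero weight everywhere. This down-weighting of ubiquitous terms is the main difference from plain tf.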

Examples

library("tm")
data("crude")
(tdm <- TermDocMatrix(crude, weighting = "tf-idf", stopwords = TRUE))
