weightTfIdf(m, normalize = TRUE)
A TermDocumentMatrix in term frequency format.
A WeightingFunction with the additional attributes Name and Acronym.
Term frequency $\mathit{tf}_{i,j}$ counts the number of occurrences $n_{i,j}$ of a term $t_i$ in a document $d_j$. In the case of normalization, the term frequency $\mathit{tf}_{i,j}$ is divided by $\sum_k n_{k,j}$.
Inverse document frequency for a term $t_i$ is defined as $$\mathit{idf}_i = \log \frac{|D|}{|\{d \mid t_i \in d\}|}$$ where $|D|$ denotes the total number of documents and where $|\{d \mid t_i \in d\}|$ is the number of documents where the term $t_i$ appears.
Term frequency-inverse document frequency is then defined as the product $\mathit{tf}_{i,j} \cdot \mathit{idf}_i$.
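The definitions above can be checked by hand on a small count matrix. The following is a minimal sketch in base R, not the package's implementation: the helper `tf_idf_sketch`, the toy matrix, and the choice of logarithm base 2 are all assumptions made for illustration (the formula above leaves the base unspecified). It also assumes every document contains at least one term and every term occurs in at least one document, so no division by zero arises.

```r
# Hypothetical helper (not part of tm): tf-idf computed directly from the
# definitions above. `counts` is a plain matrix of term counts n_{i,j},
# with terms in rows and documents in columns.
tf_idf_sketch <- function(counts, normalize = TRUE, base = 2) {
  tf <- counts
  if (normalize) {
    # Divide each column j by sum_k n_{k,j}, the total term count of document j.
    tf <- sweep(counts, 2, colSums(counts), "/")
  }
  # |D| is the number of documents.
  n_docs <- ncol(counts)
  # Number of documents in which each term appears at least once.
  df <- rowSums(counts > 0)
  # Logarithm base is an assumption of this sketch.
  idf <- log(n_docs / df, base = base)
  # R recycles `idf` down the rows, so row i is scaled by idf_i.
  tf * idf
}

# Toy data, invented for illustration: three terms in three documents.
counts <- matrix(
  c(2, 0, 1,
    1, 1, 1,
    0, 3, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(Terms = c("oil", "price", "market"),
                  Docs = c("d1", "d2", "d3"))
)

weights <- tf_idf_sketch(counts)
print(round(weights, 3))
```

In this toy matrix the term "price" occurs in every document, so its inverse document frequency is $\log(3/3) = 0$ and its weight is zero everywhere, while "market", which occurs in only one document, receives the largest weight. With the tm package loaded, the corresponding call on a real matrix would be `weightTfIdf(m, normalize = TRUE)`, where `m` is a TermDocumentMatrix in term frequency format.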