tf: compute (weighted) term frequency from a dfm

Description

Apply one of several term frequency weighting schemes to a dfm (document-feature matrix).

Usage

tf(x, scheme = c("count", "prop", "propmax", "boolean", "log", "augmented",
  "logave"), base = 10, K = 0.5)

Arguments

x

object for which the term frequency will be computed (a document-feature matrix)

scheme

weighting scheme applied to the feature frequencies within each document. Valid types include:

count: the default; feature counts are left unchanged, equivalent to dividing by 1
prop: feature proportions within document, equivalent to dividing each feature count by the total count of features in the document
propmax: feature proportions relative to the most frequent feature of the document, equivalent to dividing each feature count by the largest count in the document
boolean: recode all non-zero counts as 1
log: take the logarithm of (1 + each count), using the logarithm base given by base
augmented: equivalent to K + (1 - K) * tf(x, "propmax")
logave: (1 + the log of the counts) / (1 + the log of the average count within document)
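
The schemes listed above can be sketched by hand in base R on a small dense matrix. This is only an illustration of the formulas as described here, not the package's implementation: the toy matrix counts and all tf_* variable names are made up for this example, and two choices are assumptions rather than documented behaviour: zero counts are left at 0 under "logave", and the "average count within document" is taken over the features that actually occur in the document.

```r
# Toy document-feature matrix (hypothetical data): 2 documents x 3 features.
counts <- matrix(c(2, 1, 0,
                   0, 1, 3),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("d1", "d2"), c("a", "b", "c")))

base <- 10   # logarithm base, mirroring the documented default
K <- 0.5     # augmentation constant, mirroring the documented default

# count: the matrix is returned unchanged.
tf_count <- counts

# prop: divide each row by the total count of features in that document.
tf_prop <- counts / rowSums(counts)

# propmax: divide each row by the count of its most frequent feature.
tf_propmax <- counts / apply(counts, 1, max)

# boolean: recode all non-zero counts as 1.
tf_boolean <- (counts > 0) * 1

# log: logarithm of (1 + count), so zero counts stay at 0.
tf_log <- log(1 + counts, base = base)

# augmented: K + (1 - K) * propmax. Applied literally to a dense matrix,
# cells with a zero count end up equal to K.
tf_augmented <- K + (1 - K) * tf_propmax

# logave: (1 + log(count)) / (1 + log(average count in the document)).
# Assumption: the average is taken over features present in the document,
# and cells with a zero count are reset to 0 afterwards.
avg_nonzero <- rowSums(counts) / rowSums(counts > 0)
tf_logave <- (1 + log(counts, base = base)) / (1 + log(avg_nonzero, base = base))
tf_logave[counts == 0] <- 0
```

Each row of tf_prop sums to 1, and the most frequent feature of each document has a value of 1 under both propmax and augmented.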

base

base for the logarithm when scheme is "log" or "logave"

K

the constant K used in the augmentation when scheme = "augmented"

Value

A document-feature matrix to which the weighting scheme has been applied.

tf(x, scheme = "prop") is equivalent to dfm_weight(x, "relFreq").

References

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.