Apply various term frequency weighting schemes to a dfm.
tf(x, scheme = c("count", "prop", "propmax", "boolean", "log", "augmented",
"logave"), base = 10, K = 0.5)
x
a document-feature matrix (dfm) whose feature frequencies will be weighted
scheme
the weighting scheme applied to feature frequencies within each document. Valid types are:
count
(default) each feature count is left unchanged, equivalent to dividing by 1
prop
feature proportions within the document, equivalent to dividing each term count by the total count of features in the document.
propmax
feature proportions relative to the most frequent term of the document, equivalent to dividing term counts by the frequency of the most frequent term in the document.
boolean
recode all non-zero counts as 1
log
take the logarithm of 1 + each count, in the base given by base
augmented
equivalent to K + (1 - K) * tf(x, "propmax")
logave
(1 + the log of the counts) / (1 + the log of the average count within the document)
base
base for the logarithm when scheme is "log" or "logave"
K
the value of K used for the augmentation when scheme = "augmented"
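The schemes and the base and K arguments described above can be sketched in base R. This is not quanteda's implementation: tf_sketch() is a hypothetical helper that works on an ordinary numeric matrix (documents in rows, features in columns) rather than a dfm, applies the formulas exactly as documented, and assumes that the "logave" average is taken over the features that actually occur in the document, with absent features kept at weight 0.

```r
# tf_sketch() is a hypothetical helper, NOT quanteda's tf(): it applies the
# documented formulas to a plain numeric matrix (documents in rows,
# features in columns) instead of a dfm.
tf_sketch <- function(x, scheme = c("count", "prop", "propmax", "boolean",
                                    "log", "augmented", "logave"),
                      base = 10, K = 0.5) {
  scheme <- match.arg(scheme)  # first listed value, "count", is the default
  stopifnot(is.matrix(x), is.numeric(x), K >= 0, K <= 1)
  switch(scheme,
    # counts unchanged, i.e. divided by 1
    count = x,
    # each count divided by its document's total; the length-nrow vector
    # recycles so that row i is divided by its own total
    prop = x / rowSums(x),
    # each count divided by its document's largest count
    propmax = x / apply(x, 1, max),
    # every non-zero count recoded as 1
    boolean = (x > 0) * 1,
    # logarithm of 1 + count in the requested base
    log = log(1 + x, base = base),
    # K + (1 - K) * propmax, applied literally, so zero counts become K
    augmented = K + (1 - K) * tf_sketch(x, "propmax"),
    logave = {
      # assumption: the average is over features that occur in the document
      ave <- rowSums(x) / rowSums(x > 0)
      w <- (1 + log(x, base = base)) / (1 + log(ave, base = base))
      w[x == 0] <- 0  # assumption: absent features keep weight 0
      w
    }
  )
}

# Illustrative toy counts
x <- matrix(c(2, 0, 1,
              4, 4, 0), nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))
tf_sketch(x, "prop")  # doc1: 2/3, 0, 1/3; doc2: 0.5, 0.5, 0
tf_sketch(x, "augmented", K = 0.4)
tf_sketch(x, "log", base = 2)
```

In this sketch an empty document (an all-zero row) yields NaN under "prop" and "propmax"; the text above does not say how tf() treats that case.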
A document-feature matrix to which the weighting scheme has been applied.
tf(x, scheme = "prop") is equivalent to dfm_weight(x, "relFreq").
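Numerically, "prop" weighting is just the relative frequency of each feature within its document. The base R sketch below illustrates this on a made-up two-document count matrix (not a dfm); it uses base R's prop.table() for the relative frequencies and does not call any quanteda function.

```r
# Toy count matrix: documents in rows, features in columns (illustrative data)
x <- matrix(c(2, 0, 1,
              4, 4, 0), nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1", "doc2"), c("a", "b", "c")))

# "prop" as documented: each count divided by its document's total
prop_weights <- x / rowSums(x)

# Relative frequencies via base R's prop.table over rows (margin = 1)
rel_freq <- prop.table(x, margin = 1)

all.equal(prop_weights, rel_freq)  # TRUE
rowSums(rel_freq)                  # each document sums to 1
```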
Manning, C. D., Raghavan, P., & Schutze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.