weight: weight the feature frequencies in a dfm

Description

Returns a document-feature matrix with the feature frequencies weighted according to one of several common methods.

Usage

weight(x, type, ...)
"weight"(x, type = c("frequency", "relFreq", "relMaxFreq", "logFreq", "tfidf"), ...)
"weight"(x, type, ...)
smoother(x, smoothing = 1)

Arguments

x

document-feature matrix created by dfm

type

a label of the weight type, or a named numeric vector of values to apply to the dfm. The label is one of "frequency", "relFreq", "relMaxFreq", "logFreq", or "tfidf".

...

not currently used. For finer grained control, consider calling tf or tfidf directly.

smoothing

constant added to the dfm cells for smoothing, default is 1
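
A minimal base R sketch of additive smoothing, assuming the constant is simply added to every cell. The matrix `counts` is hypothetical toy data, not a real dfm, and the code does not call smoother itself.

```r
# Toy document-by-feature counts (hypothetical data)
counts <- matrix(c(2, 0,
                   1, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"), c("apple", "banana")))

smoothing <- 1                    # same value as the documented default
smoothed <- counts + smoothing    # every cell, including zeros, is shifted

smoothed
```

After smoothing with a positive constant, no cell is zero.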

Value

The dfm with weighted values.
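
As a rough illustration of what some of the weight types might compute, the following base R sketch applies them to a small dense matrix. The formulas are assumptions based on the type names and on common definitions in Manning et al.; the function's actual definitions (for example the idf variant) may differ, and `counts` is hypothetical toy data.

```r
# Toy document-by-feature count matrix (hypothetical data, not a real dfm)
counts <- matrix(c(2, 1, 0,
                   0, 3, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"),
                                 c("apple", "banana", "much")))

# "relFreq" (assumed): each count divided by its document's total count
rel_freq <- counts / rowSums(counts)

# "relMaxFreq" (assumed): each count divided by its document's largest count
rel_max_freq <- counts / apply(counts, 1, max)

# "tfidf" (assumed variant): count * log10(number of documents / document frequency)
doc_freq <- colSums(counts > 0)
idf <- log10(nrow(counts) / doc_freq)
tf_idf <- sweep(counts, 2, idf, `*`)
```

Under these assumptions, each row of `rel_freq` sums to 1, and a feature that occurs in every document (here `banana`) gets a tf-idf weight of 0.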

Details

This converts the matrix from sparse to dense format, so it may exceed available memory, depending on the size of your input matrix.
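
To gauge whether a dense copy would fit in memory, a back-of-the-envelope estimate can help. The helper `dense_bytes` below is hypothetical (not part of the package) and assumes the dense matrix stores each cell as an 8-byte double.

```r
# Approximate size in bytes of a dense numeric matrix: 8 bytes per cell
dense_bytes <- function(n_docs, n_features) {
  n_docs * n_features * 8
}

# Example: 10,000 documents by 50,000 features, expressed in GiB
dense_bytes(10000, 50000) / 2^30
```

For these example dimensions the estimate is roughly 3.7 GiB, regardless of how few cells are nonzero.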

References

Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Vol. 1. Cambridge: Cambridge University Press, 2008.

Examples

dtm <- dfm(inaugCorpus)
x <- apply(dtm, 1, function(tf) tf/max(tf))
topfeatures(dtm)
normDtm <- weight(dtm, "relFreq")
topfeatures(normDtm)
maxTfDtm <- weight(dtm, type="relMaxFreq")
topfeatures(maxTfDtm)
logTfDtm <- weight(dtm, type="logFreq")
topfeatures(logTfDtm)
tfidfDtm <- weight(dtm, type="tfidf")
topfeatures(tfidfDtm)

# combine these methods for more complex weightings, e.g. as in Section 6.4
# of Introduction to Information Retrieval
head(logTfDtm <- weight(dtm, type="logFreq"))
head(tfidf(logTfDtm, normalize = FALSE))

# apply numeric weights
str <- c("apple is better than banana", "banana banana apple much better")
weights <- c(apple = 5, banana = 3, much = 0.5)
(mydfm <- dfm(str, ignoredFeatures = stopwords("english"), verbose = FALSE))
weight(mydfm, weights)
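
The effect of a named numeric weight vector can be sketched in base R on a plain matrix. This assumes that each named feature's column is multiplied by its weight and that features without a weight are left unchanged, which this page does not state; `counts` is hypothetical toy data rather than the output of dfm.

```r
# Toy counts resembling the two example texts (hypothetical data)
counts <- matrix(c(1, 1, 1, 0,
                   1, 1, 2, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("text1", "text2"),
                                 c("apple", "better", "banana", "much")))
weights <- c(apple = 5, banana = 3, much = 0.5)

# Multiply only the columns named in `weights`; other columns stay as they are
weighted <- counts
weighted[, names(weights)] <- sweep(counts[, names(weights), drop = FALSE],
                                    2, weights, `*`)
weighted
```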
