textir (version 2.0-5)

tfidf: tf-idf

Description

Computes term frequency-inverse document frequency (tf-idf) weights from a matrix of counts.

Usage

tfidf(x,normalize=TRUE)

Arguments

x

A dgCMatrix or matrix of counts, with documents in rows and tokens in columns.

normalize

Logical; whether to normalize term frequencies by document totals. Defaults to TRUE.

Value

A matrix of the same type as x, with each entry replaced by its tf-idf weight $$ f_{ij} \times \log[n/(d_j+1)], $$ where \(f_{ij}\) is \(x_{ij}/m_i\) if normalize is TRUE and \(x_{ij}\) otherwise, \(m_i\) is the total count in document \(i\), \(n\) is the number of documents, and \(d_j\) is the number of documents containing token \(j\).
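To make the formula concrete, here is a minimal base R sketch that applies it to a small dense count matrix. The helper name tfidf_sketch and the toy data are hypothetical and are not part of textir; the sketch assumes documents in rows, tokens in columns, and that the number of documents is nrow(x). It illustrates the documented formula only and is not the package's implementation (it does not handle dgCMatrix input, and a document with no counts would produce NaN when normalize is TRUE).

```r
# Hypothetical helper, not part of textir: applies the documented formula
# f_ij * log(n / (d_j + 1)) to a dense count matrix.
# Assumes documents are rows and tokens are columns.
tfidf_sketch <- function(x, normalize = TRUE) {
  n <- nrow(x)                                   # number of documents
  d <- colSums(x > 0)                            # d_j: documents containing token j
  tf <- if (normalize) x / rowSums(x) else x     # f_ij: x_ij / m_i or x_ij
  idf <- log(n / (d + 1))                        # one weight per token
  sweep(tf, 2, idf, `*`)                         # scale column j by its idf weight
}

# Toy counts: 4 documents, 3 tokens (made-up data for illustration).
counts <- matrix(c(2, 0, 1,
                   0, 3, 1,
                   1, 1, 0,
                   0, 0, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:4), c("a", "b", "c")))

w <- tfidf_sketch(counts)
print(w)
```

In this toy matrix, token "c" appears in 3 of the 4 documents, so its weight is log(4/(3+1)) = 0 and its whole column is zero. This shows how the +1 in the denominator pushes very common tokens to zero or negative weight.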

See Also

pls, we8there

Examples

# NOT RUN {
library(textir)
data(we8there)
## 20 high-variance tf-idf terms
colnames(we8thereCounts)[
	order(-sdev(tfidf(we8thereCounts)))[1:20]]
# }
