SpeedReader (version 0.9.1)

tfidf: A function to calculate TF-IDF and other related statistics on a set of documents.

Description

Calculates term frequency-inverse document frequency (TF-IDF) scores and other related statistics for a set of documents represented as a document-term matrix.
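For reference, one common definition of TF-IDF is shown below. It is given only as an assumption for orientation: SpeedReader may use a different term-frequency weighting, logarithm base, or normalization. Here $n_{ij}$ is entry $[i, j]$ of document_term_matrix, $D$ is the number of documents, and $\mathrm{df}_j$ is the number of documents in which word $j$ appears at least once.

```latex
\mathrm{tf}_{ij} = n_{ij}, \qquad
\mathrm{idf}_j = \log\frac{D}{\mathrm{df}_j}, \qquad
\mathrm{tfidf}_{ij} = \mathrm{tf}_{ij}\,\mathrm{idf}_j
```

Under this definition a word that occurs in every document gets $\mathrm{idf}_j = 0$, so it scores zero everywhere regardless of how often it occurs.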

Usage

tfidf(document_term_matrix, vocabulary,
  remove_documents_with_no_terms = FALSE,
  only_calculate_corpus_level_statistics = TRUE, display_rankings = TRUE,
  top_words_to_display = 40)

Arguments

document_term_matrix

A numeric matrix or data.frame with one row per document and one column per vocabulary word (dimensions: number of documents x vocabulary length), where entry [i, j] is the count of word j in document i.
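As an illustration of the expected layout, the sketch below builds a small document_term_matrix from raw text using base R only. The toy documents and the whitespace tokenization are assumptions made for the example; they are not part of SpeedReader.

```r
# Hypothetical example: build a documents x words count matrix from raw text.
# Entry [i, j] is the number of times vocabulary[j] occurs in document i.
docs <- c("the cat sat", "the dog sat on the mat")

# Naive tokenization: lower-case and split on runs of whitespace
tokens <- strsplit(tolower(docs), "\\s+")
vocabulary <- sort(unique(unlist(tokens)))

# Count each document's tokens against the full vocabulary (same column order)
document_term_matrix <- t(vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocabulary))),
  numeric(length(vocabulary))
))
colnames(document_term_matrix) <- vocabulary
document_term_matrix
```

Building the vocabulary first and counting against its levels keeps column j of the matrix aligned with vocabulary[j], which is the correspondence the vocabulary argument requires.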

vocabulary

A character vector containing every word in the vocabulary. It must have exactly one entry per column of document_term_matrix, and the word counted in column j of document_term_matrix must be the j'th entry of vocabulary.
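A quick pre-flight check can catch a misaligned vocabulary before calling tfidf(). The helper check_vocabulary() below is hypothetical (it is not part of SpeedReader) and assumes that, when the matrix has column names, they should equal vocabulary in the same order.

```r
# Hypothetical helper: verify that vocabulary lines up with the matrix columns.
check_vocabulary <- function(document_term_matrix, vocabulary) {
  # one vocabulary entry per column
  if (length(vocabulary) != ncol(document_term_matrix)) {
    stop("vocabulary must have one entry per column of document_term_matrix")
  }
  # if the matrix carries column names, they must match vocabulary in order
  cn <- colnames(document_term_matrix)
  if (!is.null(cn) && !identical(cn, vocabulary)) {
    stop("column j of document_term_matrix must correspond to vocabulary[j]")
  }
  invisible(TRUE)
}

m <- matrix(c(1, 0, 2, 3), nrow = 2, dimnames = list(NULL, c("cat", "dog")))
check_vocabulary(m, c("cat", "dog"))    # passes silently
# check_vocabulary(m, c("dog", "cat"))  # would stop(): order does not match
```

If the columns are named but in a different order, subsetting with m[, vocabulary] reorders them to match before the call.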

remove_documents_with_no_terms

Defaults to FALSE. If TRUE, all words in the vocabulary that appear zero times in the selected set of documents are removed.
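The filtering described for this argument can be sketched in base R as follows. drop_zero_count_words() is a hypothetical helper written for illustration, not SpeedReader's own code; it drops zero-count columns and the matching vocabulary entries together so the two stay aligned.

```r
# Hypothetical sketch: remove words whose total count across the selected
# documents is zero, keeping matrix columns and vocabulary aligned.
drop_zero_count_words <- function(document_term_matrix, vocabulary) {
  keep <- colSums(document_term_matrix) > 0
  list(
    document_term_matrix = document_term_matrix[, keep, drop = FALSE],
    vocabulary = vocabulary[keep]
  )
}

dtm <- matrix(c(1, 0, 2,
                3, 0, 0), nrow = 2, byrow = TRUE)
filtered <- drop_zero_count_words(dtm, c("alpha", "beta", "gamma"))
filtered$vocabulary
```

One reason such filtering matters: with an idf of the form log(D / df), a word with document frequency 0 gives log(Inf) = Inf, and 0 * Inf is NaN in R.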

only_calculate_corpus_level_statistics

Defaults to TRUE. If FALSE, TF-IDF scores are calculated for every token in every document.
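To show the difference in output shape between corpus-level and document-level scores, the sketch below computes both under assumed definitions: idf_j = log(D / df_j), a document-level score n_ij * idf_j, and a corpus-level score equal to the total count of word j times idf_j. The function tfidf_levels() is hypothetical, and SpeedReader's actual statistics and list elements may differ.

```r
# Hypothetical sketch: corpus-level (one score per word) versus
# document-level (one score per word per document) TF-IDF.
tfidf_levels <- function(document_term_matrix) {
  dtm <- as.matrix(document_term_matrix)
  doc_freq <- colSums(dtm > 0)        # number of documents containing each word
  idf <- log(nrow(dtm) / doc_freq)    # 0 for words present in every document
  list(
    corpus_level = colSums(dtm) * idf,          # named vector, one entry per word
    document_level = sweep(dtm, 2, idf, `*`)    # scale each word column by its idf
  )
}

dtm <- matrix(c(2, 0, 1,
                0, 3, 1), nrow = 2, byrow = TRUE,
              dimnames = list(c("doc1", "doc2"), c("apple", "banana", "cherry")))
res <- tfidf_levels(dtm)
round(res$corpus_level, 3)
```

The document-level matrix has the same dimensions as document_term_matrix, which is why computing it for every token in every document can be much more expensive on a large corpus than the single vector of corpus-level scores.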

display_rankings

Defaults to TRUE. If TRUE, the function prints the top_words_to_display highest-ranked words by TF-IDF.

top_words_to_display

The number of top-ranked words to print when display_rankings = TRUE. Defaults to 40.
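The ranking printout controlled by display_rankings and top_words_to_display can be approximated as below. print_top_words() is a hypothetical helper that assumes the scores arrive as a named numeric vector; the format SpeedReader actually prints may look different.

```r
# Hypothetical helper: print the highest-scoring words from a named vector.
print_top_words <- function(scores, top_words_to_display = 40) {
  ranked <- sort(scores, decreasing = TRUE)
  top <- head(ranked, top_words_to_display)   # returns all if fewer remain
  for (i in seq_along(top)) {
    cat(sprintf("%d. %s (%.3f)\n", i, names(top)[i], top[i]))
  }
  invisible(top)
}

scores <- c(apple = 1.386, banana = 2.079, cherry = 0)
print_top_words(scores, top_words_to_display = 2)
```

Returning the ranked vector invisibly lets the caller reuse the top words without printing them a second time.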

Value

A list object.