A function to calculate TF-IDF and other related statistics on a set of documents.
tfidf(document_term_matrix, vocabulary,
remove_documents_with_no_terms = FALSE,
only_calculate_corpus_level_statistics = TRUE, display_rankings = TRUE,
top_words_to_display = 40)
document_term_matrix A numeric matrix or data.frame with one row per document and one column per vocabulary word (number of documents X vocabulary length), where entry (i, j) is the count of word j in document i.
vocabulary A string vector containing all words in the vocabulary. The vocabulary vector must have the same number of entries as the number of columns in the document_term_matrix, and the word counted in the j'th column of document_term_matrix must correspond to the j'th entry in vocabulary.
remove_documents_with_no_terms Defaults to FALSE. If TRUE, all words in the vocabulary that appear zero times in the selected set of documents will be removed.
only_calculate_corpus_level_statistics Defaults to TRUE. If FALSE, TF-IDF scores will be calculated for every token in every document.
display_rankings Defaults to TRUE. If TRUE, the function prints the top_words_to_display highest-ranked words by TF-IDF.
top_words_to_display Defaults to 40. The number of top-ranked words to print if display_rankings == TRUE.
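The requirements on the first two arguments can be illustrated with a minimal base R sketch. The toy documents and the helper object names (docs, tokens) are assumptions made for illustration only; any tokenization that yields a count matrix whose columns line up with the vocabulary would serve equally well.

```r
# Three toy documents (hypothetical data for illustration).
docs <- c("the cat sat", "the dog sat down", "the cat chased the dog")

# Split each document into tokens on single spaces.
tokens <- strsplit(docs, " ", fixed = TRUE)

# The vocabulary is the sorted set of unique tokens.
vocabulary <- sort(unique(unlist(tokens)))

# Count each vocabulary word in each document: rows are documents,
# columns are vocabulary entries, in the same order as `vocabulary`.
document_term_matrix <- t(sapply(tokens, function(doc) {
  as.numeric(table(factor(doc, levels = vocabulary)))
}))
colnames(document_term_matrix) <- vocabulary

# The vocabulary length must equal the number of columns.
stopifnot(ncol(document_term_matrix) == length(vocabulary))
```

Objects built this way could then be passed as tfidf(document_term_matrix, vocabulary), leaving the remaining arguments at the defaults shown in the usage.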
A list object.
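For orientation, the sketch below computes TF-IDF in base R under one common textbook definition: term frequency is the raw count, and inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the term. This is an assumption made for illustration. The weighting that tfidf() actually applies and the names of the elements in the list it returns are not stated here, and the helper name tfidf_sketch is hypothetical.

```r
# Hypothetical helper: a textbook TF-IDF, not necessarily what tfidf() computes.
tfidf_sketch <- function(dtm) {
  dtm <- as.matrix(dtm)
  n_docs <- nrow(dtm)
  # Document frequency: number of documents in which each term appears.
  doc_freq <- colSums(dtm > 0)
  # Inverse document frequency; terms that appear in no document get 0.
  idf <- ifelse(doc_freq > 0, log(n_docs / doc_freq), 0)
  # Multiply every row (document) of counts by the per-term IDF.
  sweep(dtm, 2, idf, `*`)
}

# Toy counts: 2 documents by 3 terms.
dtm <- matrix(c(1, 0, 2,
                1, 3, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(NULL, c("a", "b", "c")))
scores <- tfidf_sketch(dtm)
```

Under this definition a term that occurs in every document, such as "a" in the toy matrix, scores zero everywhere, while terms concentrated in a single document score highest.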