topic_coherence: A function to calculate topic coherence for a given topic using the formulation in "Optimizing Semantic Coherence in Topic Models" available here: <http://dirichlet.net/pdf/mimno11optimizing.pdf>

Description

Calculates the coherence score of a single topic from its top words and a document-term matrix, using the formulation in "Optimizing Semantic Coherence in Topic Models" (Mimno et al., 2011), available here: <http://dirichlet.net/pdf/mimno11optimizing.pdf>
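
For reference, the coherence score defined in the cited paper can be written as below. Here v_1, ..., v_M are the top words of the topic ordered from most to least probable, D(v) is the number of documents containing word v at least once, and D(v, v') is the number of documents containing both v and v'. Treating M as the argument K of this function is an assumption based on the argument descriptions, not something this page states.

```latex
C\left(t; V^{(t)}\right) = \sum_{m=2}^{M} \sum_{l=1}^{m-1}
  \log \frac{D\left(v_m^{(t)}, v_l^{(t)}\right) + 1}{D\left(v_l^{(t)}\right)}
```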

Usage

topic_coherence(top_words, document_term_matrix, vocabulary = NULL,
  numeric_top_words = FALSE, K = length(top_words))

Arguments

top_words

A string vector of the top words associated with a topic. If numeric_top_words == TRUE, a numeric vector of word indices instead.

document_term_matrix

A numeric matrix or data.frame with one row per document and one column per vocabulary term, where entry (i, j) is the count of word j in document i.

vocabulary

A string vector containing all words in the vocabulary. The vocabulary vector must have the same number of entries as document_term_matrix has columns, and the i'th entry in vocabulary must be the word counted in the i'th column of document_term_matrix. Not required if numeric_top_words == TRUE.

numeric_top_words

Defaults to FALSE. If TRUE, the function expects a vector of word indices instead of a string vector of actual words.

K

The number of top words to use in calculating the topic coherence. Defaults to the length of top_words. Common values are in the range of 10-20.

Value

The coherence score for the given topic.
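
Examples

The following is a minimal sketch of how a score of this kind could be computed from the arguments described above, following the formulation in the cited paper. The helper coherence_sketch and the toy data are hypothetical and written only for illustration; this is not the package's implementation, and the real topic_coherence may differ in input checks and edge-case handling.

```r
# Hypothetical helper, NOT the package's implementation: a sketch of the
# Mimno et al. (2011) coherence score for a single topic.
coherence_sketch <- function(top_words, document_term_matrix, vocabulary = NULL,
                             numeric_top_words = FALSE, K = length(top_words)) {
  dtm <- as.matrix(document_term_matrix)

  # Map words to column indices unless indices were supplied directly.
  idx <- if (numeric_top_words) top_words else match(top_words, vocabulary)
  if (anyNA(idx)) stop("some top words are not in the vocabulary")

  # Keep only the first K top words.
  idx <- idx[seq_len(min(K, length(idx)))]

  # 0/1 indicator: does the word occur at least once in the document?
  present <- (dtm[, idx, drop = FALSE] > 0) * 1
  doc_freq <- colSums(present)        # D(v): documents containing each word
  co_doc_freq <- crossprod(present)   # D(v, v'): documents containing both

  # Sum over ordered pairs (m, l) with l ranked above m.
  # Note: a word that appears in no document gives a zero denominator.
  score <- 0
  for (m in seq_along(idx)[-1]) {
    for (l in seq_len(m - 1)) {
      score <- score + log((co_doc_freq[m, l] + 1) / doc_freq[l])
    }
  }
  score
}

# Toy data (made up for illustration): 4 documents, 4 vocabulary terms.
vocab <- c("cat", "dog", "fish", "tree")
dtm <- matrix(c(2, 1, 0, 0,
                1, 0, 0, 3,
                0, 1, 1, 0,
                1, 1, 0, 0),
              nrow = 4, byrow = TRUE, dimnames = list(NULL, vocab))

# "cat" is in 3 documents and co-occurs with "dog" in 2: log((2 + 1) / 3) = 0
coherence_sketch(c("cat", "dog"), dtm, vocabulary = vocab)

# The same topic given as column indices; no vocabulary needed.
coherence_sketch(c(1, 2), dtm, numeric_top_words = TRUE)
```

Scores are sums of log ratios, so they are typically zero or negative, with values closer to zero indicating top words that co-occur in more documents.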