A primarily internal function for calculating words according to the lift metric.
We expect most users will use labelTopics
instead.
calclift(logbeta, wordcounts)
logbeta: a K by V matrix containing the log probabilities of seeing word v conditional on topic k.
wordcounts: a V length vector indicating the number of times each word appears in the corpus.
Lift is calculated by dividing the topic-word distribution by the empirical word count probability distribution. In other words, the lift for word v in topic k can be calculated as:
$$\mathrm{Lift}_{k,v} = \frac{\beta_{k,v}}{w_v / \sum_{v'} w_{v'}}$$
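As a rough illustration of the formula, the following base R sketch computes lift on a made-up toy example. It is not the package's implementation: the input values and the names `emp_prob`, `lift` and `top_words` are assumptions chosen for illustration, and the final ranking step is only one plausible way to use the scores. Because `logbeta` holds log probabilities, it is exponentiated before dividing.

```r
# Toy inputs (invented for illustration): K = 2 topics, V = 4 words.
beta <- matrix(c(0.50, 0.30, 0.15, 0.05,
                 0.10, 0.10, 0.40, 0.40),
               nrow = 2, byrow = TRUE)
logbeta <- log(beta)               # K by V matrix of log probabilities
wordcounts <- c(40, 30, 20, 10)    # V length vector of corpus word counts

# Empirical word probabilities: w_v / sum of all word counts
emp_prob <- wordcounts / sum(wordcounts)

# Lift: divide column v of beta by the empirical probability of word v
lift <- sweep(exp(logbeta), 2, emp_prob, "/")

# Word indices within each topic, ordered from highest to lowest lift
top_words <- t(apply(lift, 1, order, decreasing = TRUE))
```

In this toy corpus, word 4 is rare overall (10 of 100 tokens) but relatively likely under topic 2, so it receives the highest lift in that topic. This is the behaviour the metric is designed to surface: words that are distinctive for a topic rather than merely frequent.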
We include this after seeing it used effectively in Matt Taddy's work, including his excellent maptpx package. Definitions are given in Taddy (2012).
Taddy, Matthew. 2012. "On Estimation and Selection for Topic Models." AISTATS, JMLR W&CP 22.
labelTopics