Calculate an exclusivity metric for an STM model.
exclusivity(model, M = 10, frexw = 0.7)
model: the STM object
M: the number of top words to consider per topic
frexw: the frex weight
a numeric vector containing the exclusivity score for each topic
In Roberts et al. (2014) we proposed using the semantic coherence metric of Mimno et al. (2011) (semanticCoherence) to help with topic model selection. We found that semantic coherence alone is relatively easy to achieve by having only a couple of topics that are all dominated by the most common words. We therefore also proposed an exclusivity measure.
Our exclusivity measure also incorporates some information on word frequency. It is based on the FREX labeling metric (calcfrex), with the weight set to 0.7 in favor of exclusivity by default.
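The following is a minimal sketch of how a FREX-based exclusivity score of this kind can be computed; it is an illustration, not the package source. It assumes the input is a K x V matrix of log word probabilities (one row per topic), and the function name frex_exclusivity is hypothetical. For each topic, a word's exclusivity (its share of probability mass across topics) and its frequency are converted to within-topic ranks, combined as a weighted harmonic mean with weight frexw on exclusivity, and summed over the topic's M most probable words.

```r
# Illustrative sketch only; frex_exclusivity is a hypothetical helper,
# not a function exported by stm.
# logbeta: K x V matrix of log word probabilities (rows are topics).
frex_exclusivity <- function(logbeta, M = 10, frexw = 0.7) {
  tbeta <- t(exp(logbeta))             # V x K word-by-topic probabilities
  V <- nrow(tbeta)
  # Exclusivity: share of each word's total mass held by each topic
  excl <- tbeta / rowSums(tbeta)       # each row sums to 1
  # Within-topic empirical CDF values obtained from ranks
  ex_rank <- apply(excl, 2, rank) / V
  fr_rank <- apply(tbeta, 2, rank) / V
  # Weighted harmonic mean of the exclusivity and frequency ranks
  frex <- 1 / (frexw / ex_rank + (1 - frexw) / fr_rank)
  # Sum FREX over each topic's M most probable words
  M <- min(M, V)
  vapply(seq_len(ncol(tbeta)), function(k) {
    top <- order(tbeta[, k], decreasing = TRUE)[seq_len(M)]
    sum(frex[top, k])
  }, numeric(1))
}

# Toy input: 3 topics over a 20-word vocabulary (simulated, not real data)
set.seed(1)
K <- 3
V <- 20
beta <- matrix(rexp(K * V), nrow = K)
beta <- beta / rowSums(beta)           # normalize each topic to sum to 1
scores <- frex_exclusivity(log(beta), M = 5)
```

Each per-word FREX value lies between 0 and 1, so a topic's score is bounded above by M. Setting frexw closer to 1 puts more weight on exclusivity relative to frequency.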
This function is currently marked with the keyword internal because it does not have much error checking.
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011, July). "Optimizing semantic coherence in topic models." In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 262-272). Association for Computational Linguistics.
Bischof, J., & Airoldi, E. (2012). "Summarizing topical content with word frequency and exclusivity." In Proceedings of the International Conference on Machine Learning.
Roberts, M., Stewart, B., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S., Albertson, B., et al. (2014). "Structural topic models for open ended survey responses." American Journal of Political Science, 58(4), 1064-1082. http://goo.gl/0x0tHJ
# NOT RUN {
exclusivity(gadarianFit)
# }