An R interface for the Java Machine Learning for Language Toolkit (mallet) <http://mallet.cs.umass.edu/> to estimate probabilistic topic models, such as Latent Dirichlet Allocation. We can use the R package to read textual data into mallet from R objects, run the Java implementation of mallet directly in R, and extract results as R objects. The Mallet toolkit has many functions, this wrapper focuses on the topic modeling sub-package written by David Mimno. The package uses the rJava package to connect to a JVM.
The model, Latent Dirichlet allocation (LDA): David M Blei, Andrew Ng, Michael Jordan. Latent Dirichlet Allocation. J. of Machine Learning Research, 2003.
The Java toolkit: Andrew Kachites McCallum. The Mallet Toolkit. 2002.
Details of the fast sparse Gibbs sampling algorithm: Limin Yao, David Mimno, Andrew McCallum. Streaming Inference for Latent Dirichlet Allocation. KDD, 2009.
Hyperparameter optimization: Hanna Wallach, David Mimno, Andrew McCallum. Rethinking LDA: Why Priors Matter. NIPS, 2010.