Fit a Latent Dirichlet Allocation topic model using collapsed Gibbs sampling.
FitLdaModel(dtm, k, iterations = NULL, burnin = -1, alpha = 0.1,
            beta = 0.05, optimize_alpha = FALSE, calc_likelihood = FALSE,
            calc_coherence = TRUE, calc_r2 = FALSE, ...)
dtm: A document term matrix or term co-occurrence matrix of class dgCMatrix.
k: Integer number of topics.
iterations: Integer number of iterations for the Gibbs sampler to run. A future version may include automatic stopping criteria.
burnin: Integer number of burn-in iterations. If burnin is greater than -1, the resulting "phi" and "theta" matrices are an average over all iterations greater than burnin. With the default of -1, no averaging is done.
alpha: Vector of length k for an asymmetric prior, or a single number for a symmetric prior. This is the prior for topics over documents.
beta: Vector of length ncol(dtm) for an asymmetric prior, or a single number for a symmetric prior. This is the prior for words over topics.
optimize_alpha: Logical. Should alpha be optimized every 10 Gibbs iterations? Defaults to FALSE.
calc_likelihood: Logical. Should the likelihood be calculated every 10 Gibbs iterations? Useful for assessing convergence. Defaults to FALSE.
calc_coherence: Logical. Should the probabilistic coherence of each topic be calculated after the model is trained? Defaults to TRUE.
calc_r2: Logical. Should R-squared be calculated after the model is trained? Defaults to FALSE.
...: Other arguments to be passed to TmParallelApply.
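The coherence requested by calc_coherence can be illustrated with a small base-R sketch. It assumes the common definition of probabilistic coherence: for each pair of a topic's top M words, where word a ranks above word b, compute P(b | a) - P(b) from document co-occurrence, then average over all pairs. The helper prob_coherence and its arguments are hypothetical, not the package's API, and the package's exact computation may differ.

```r
# Hypothetical helper `prob_coherence` (assumed definition, not package code).
# phi_k: named vector of P(word | topic) for a single topic
# dtm:   dense document x term count matrix with matching column names
# M:     number of top words to score
prob_coherence <- function(phi_k, dtm, M = 5) {
  top <- names(sort(phi_k, decreasing = TRUE))[seq_len(M)]
  # binary occurrence: does each document contain each top word?
  occ <- dtm[, top, drop = FALSE] > 0
  p_word <- colMeans(occ)                  # P(w): share of documents with w
  scores <- numeric(0)
  for (i in seq_len(M - 1)) {
    for (j in (i + 1):M) {
      n_i <- sum(occ[, i])
      # P(w_j | w_i): share of documents with w_i that also contain w_j
      p_cond <- if (n_i > 0) sum(occ[, i] & occ[, j]) / n_i else 0
      scores <- c(scores, p_cond - p_word[[j]])
    }
  }
  mean(scores)
}

# toy corpus: 4 documents, 3 words
dtm <- matrix(c(1, 1, 0,
                2, 1, 0,
                0, 0, 3,
                1, 0, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("a", "b", "c")))
phi_k <- c(a = 0.5, b = 0.3, c = 0.2)
coh <- prob_coherence(phi_k, dtm, M = 3)
print(coh)  # -1/6
```

Positive values mean the top words co-occur more often than their base rates would suggest; here "c" rarely appears with "a" or "b", which pulls the score below zero.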
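Returns an S3 object of class c("LDA", "TopicModel") containing the fitted "phi" (topics over words) and "theta" (documents over topics) matrices, along with any diagnostics requested through calc_likelihood, calc_coherence, and calc_r2.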
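The R-squared requested by calc_r2 can be sketched under one assumed definition: each document's expected word counts are its length n_d times theta[d, ] %*% phi, and R-squared is 1 - SSE/SST, with the mean document vector as the baseline. The helper topic_r2 is hypothetical and the package's own computation may differ in detail.

```r
# Hypothetical helper `topic_r2` (assumed definition, not package code).
# dtm: D x V observed counts; theta: D x k; phi: k x V
topic_r2 <- function(dtm, theta, phi) {
  n_d <- rowSums(dtm)                    # tokens per document
  yhat <- n_d * (theta %*% phi)          # expected counts; row d scaled by n_d[d]
  ybar <- matrix(colMeans(dtm), nrow(dtm), ncol(dtm), byrow = TRUE)
  sse <- sum((dtm - yhat)^2)             # residual sum of squares
  sst <- sum((dtm - ybar)^2)             # total sum of squares around the mean document
  1 - sse / sst
}

dtm <- matrix(c(2, 0,
                0, 3), nrow = 2, byrow = TRUE)
# a model that reproduces the counts exactly
r2_perfect <- topic_r2(dtm, theta = diag(2), phi = diag(2))  # 1
# a model that spreads every document evenly over both topics
r2_flat <- topic_r2(dtm, theta = matrix(0.5, 2, 2), phi = diag(2))  # 0
```

Under this definition, R-squared reaches 1 only when the model reproduces every count, and it can fall below 0 for a model that fits worse than the mean document.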
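Collapsed Gibbs sampling integrates out phi and theta and repeatedly resamples each token's topic from its full conditional, p(z_i = t | rest) proportional to (n_dt + alpha_t) (n_tw + beta_w) / (n_t + sum(beta)), where the counts exclude token i. The base-R sketch below illustrates the technique, including symmetric or asymmetric priors and the burnin averaging described for the burnin argument. The helper lda_gibbs is hypothetical and is not the package's implementation; it assumes a small dense count matrix rather than a dgCMatrix.

```r
# Hypothetical helper `lda_gibbs`: a plain collapsed Gibbs sampler for LDA,
# written for readability, not speed. Not the package's implementation.
# dtm:   dense document x term count matrix
# alpha: number (symmetric) or length-k vector (asymmetric)
# beta:  number (symmetric) or length-ncol(dtm) vector (asymmetric)
# burnin = -1 returns estimates from the final iteration only
lda_gibbs <- function(dtm, k, iterations, burnin = -1,
                      alpha = 0.1, beta = 0.05) {
  D <- nrow(dtm); V <- ncol(dtm)
  alpha <- rep_len(alpha, k)
  beta <- rep_len(beta, V)
  beta_sum <- sum(beta)

  # one entry per token: its document index and word (column) index
  counts <- as.vector(dtm)
  doc <- rep(rep(seq_len(D), times = V), times = counts)
  word <- rep(rep(seq_len(V), each = D), times = counts)

  # random initial topic assignments and the counts they imply
  z <- sample.int(k, length(doc), replace = TRUE)
  n_dk <- matrix(0, D, k)
  n_kw <- matrix(0, k, V)
  for (i in seq_along(doc)) {
    n_dk[doc[i], z[i]] <- n_dk[doc[i], z[i]] + 1
    n_kw[z[i], word[i]] <- n_kw[z[i], word[i]] + 1
  }
  n_k <- rowSums(n_kw)

  # point estimates of theta (D x k) and phi (k x V) from the current counts
  estimates <- function() list(
    theta = sweep(n_dk, 2, alpha, "+") / (rowSums(n_dk) + sum(alpha)),
    phi = sweep(n_kw, 2, beta, "+") / (n_k + beta_sum)
  )

  theta_sum <- matrix(0, D, k)
  phi_sum <- matrix(0, k, V)
  kept <- 0
  for (it in seq_len(iterations)) {
    for (i in seq_along(doc)) {
      d <- doc[i]; w <- word[i]; old_t <- z[i]
      # remove token i from the counts
      n_dk[d, old_t] <- n_dk[d, old_t] - 1
      n_kw[old_t, w] <- n_kw[old_t, w] - 1
      n_k[old_t] <- n_k[old_t] - 1
      # unnormalized full conditional over the k topics
      p <- (n_dk[d, ] + alpha) * (n_kw[, w] + beta[w]) / (n_k + beta_sum)
      new_t <- sample.int(k, 1, prob = p)
      # add token i back under its newly sampled topic
      n_dk[d, new_t] <- n_dk[d, new_t] + 1
      n_kw[new_t, w] <- n_kw[new_t, w] + 1
      n_k[new_t] <- n_k[new_t] + 1
      z[i] <- new_t
    }
    # after burn-in, accumulate estimates so they can be averaged
    if (burnin > -1 && it > burnin) {
      est <- estimates()
      theta_sum <- theta_sum + est$theta
      phi_sum <- phi_sum + est$phi
      kept <- kept + 1
    }
  }
  out <- if (kept > 0) {
    list(theta = theta_sum / kept, phi = phi_sum / kept)
  } else {
    estimates()
  }
  colnames(out$phi) <- colnames(dtm)
  out
}

set.seed(1)
toy <- rbind(c(5, 4, 0, 0),
             c(4, 5, 0, 0),
             c(0, 0, 5, 4),
             c(0, 0, 4, 5))
colnames(toy) <- c("apple", "pear", "bolt", "nut")
fit <- lda_gibbs(toy, k = 2, iterations = 50, burnin = 25)
round(fit$phi, 2)
```

Because each kept draw yields rows that are probability distributions, their average is one too. Looping token by token in R is slow and is used here only to keep the update rule visible.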
# load the package and its sample data
library(textmineR)
data(nih_sample_dtm)

# fit a model on the first 20 documents
set.seed(12345)
m <- FitLdaModel(dtm = nih_sample_dtm[1:20, ], k = 5,
                 iterations = 200, burnin = 175)
str(m)

# predict on held-out documents using Gibbs sampling "fold in"
p1 <- predict(m, nih_sample_dtm[21:100, ], method = "gibbs",
              iterations = 200, burnin = 175)

# predict on held-out documents using the dot product method
p2 <- predict(m, nih_sample_dtm[21:100, ], method = "dot")

# compare both methods' topic distributions for the first held-out document
barplot(rbind(p1[1, ], p2[1, ]), beside = TRUE, col = c("red", "blue"))
legend("topright", legend = c("gibbs", "dot"), fill = c("red", "blue"))
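The example contrasts Gibbs "fold in" with method = "dot". A minimal sketch of one way a dot product prediction can work, assuming (not confirmed here) that each new document's word counts are multiplied by P(topic | word), obtained from phi and topic prevalence by Bayes' rule, and the rows are then normalized. The helper predict_dot and its p_topic argument are hypothetical; the package's predict method may compute this differently.

```r
# Hypothetical helper `predict_dot` (assumed method, not package code).
# new_dtm: D x V counts; phi: k x V, P(word | topic); p_topic: length-k P(topic)
predict_dot <- function(new_dtm, phi, p_topic) {
  joint <- phi * p_topic                        # P(w | t) P(t); row t scaled by p_topic[t]
  gamma <- sweep(joint, 2, colSums(joint), "/") # P(t | w) by Bayes' rule
  scores <- new_dtm %*% t(gamma)                # D x k unnormalized topic scores
  scores / rowSums(scores)                      # each row now sums to 1
}

phi <- rbind(c(0.5, 0.5, 0, 0),
             c(0, 0, 0.5, 0.5))
new_docs <- rbind(c(2, 1, 0, 1),
                  c(0, 0, 1, 1))
pred <- predict_dot(new_docs, phi, p_topic = c(0.5, 0.5))
pred  # row 1: 0.75 0.25; row 2: 0 1
```

Unlike fold-in sampling, this is a single matrix product with no iterations, which makes it fast and deterministic; the trade-off is that it ignores how the other words in a document shift each word's topic assignment.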