text2vec (version 0.4.0)

LatentDirichletAllocation: Creates a Latent Dirichlet Allocation model.

Description

Creates a Latent Dirichlet Allocation model.

Usage

LatentDirichletAllocation

LDA

Format

R6Class object.

Fields

verbose

logical = TRUE. Whether to display training information.

Usage

For usage details see the Methods, Arguments and Examples sections.

lda = LatentDirichletAllocation$new(n_topics, vocabulary,
              doc_topic_prior = 1 / n_topics, topic_word_prior = 1 / n_topics)
lda$fit(x, n_iter, convergence_tol = -1, check_convergence_every_n = 0)
lda$fit_transform(x, n_iter, convergence_tol = -1, check_convergence_every_n = 0)
lda$get_word_vectors()

Methods

$new(n_topics, vocabulary, doc_topic_prior = 1 / n_topics, topic_word_prior = 1 / n_topics)

Constructor for the LDA model. For a description of the arguments see the Arguments section.

$fit(x, n_iter, convergence_tol = -1, check_convergence_every_n = 0)

fits the LDA model to the input document-term matrix x

$fit_transform(x, n_iter, convergence_tol = -1, check_convergence_every_n = 0)

fits the LDA model to the input document-term matrix x and transforms the input documents to the topic space

$transform(x, n_iter = 100, convergence_tol = 0.005, check_convergence_every_n = 1)

transforms new documents to the topic space of the fitted model
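
A minimal sketch of transforming unseen documents with an already fitted model. It assumes new_dtm is a document-term matrix built with the same vocabulary and the same 'lda_c' format as the training data (see the Examples section for how such a matrix is created):

new_doc_topic_distr = lda$transform(new_dtm, n_iter = 100,
                                    convergence_tol = 0.005,
                                    check_convergence_every_n = 1)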

$get_word_vectors()

returns the word-topic distribution
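
As an illustration, the word-topic matrix can be used to look up the terms most associated with a given topic. This sketch assumes the returned matrix has one row per vocabulary term (with row names set to the terms) and one column per topic:

# sketch, assuming rows = vocabulary terms, columns = topics
word_topic = lda$get_word_vectors()
# indices of the ten terms with the highest weight in topic 1
top = head(order(word_topic[, 1], decreasing = TRUE), 10)
rownames(word_topic)[top]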

$plot(...)

plots the LDA model using the LDAvis package (https://cran.r-project.org/package=LDAvis). Arguments in ... will be passed to the LDAvis::createJSON and LDAvis::serVis functions

Arguments

lda

An LDA object

x

An input document-term matrix.

n_topics

integer desired number of latent topics. Also known as K

vocabulary

vocabulary, in the form of a character vector or a text2vec_vocab object

doc_topic_prior

numeric prior for the document-topic multinomial distribution. Also known as alpha

topic_word_prior

numeric prior for the topic-word multinomial distribution. Also known as eta

n_iter

integer number of Gibbs iterations

convergence_tol

numeric = -1. Defines the early stopping strategy. Fitting stops when one of the two following conditions is satisfied: (a) all iterations have been used, or (b) perplexity_previous_iter / perplexity_current_iter - 1 < convergence_tol. By default all iterations are performed.
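
To make the stopping rule concrete, here is a small numeric illustration (the perplexity values are made up):

perplexity_previous_iter = 1005
perplexity_current_iter  = 1000
convergence_tol          = 0.01
perplexity_previous_iter / perplexity_current_iter - 1  # 0.005
# 0.005 < 0.01, so fitting would stop early at this convergence check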

check_convergence_every_n

integer. Defines how often perplexity is calculated. In some cases the perplexity calculation during LDA fitting can take a noticeable amount of time, so it can make sense not to calculate it at every iteration.
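
For example, the following call (a sketch reusing the dtm object from the Examples section) evaluates perplexity only every 10 iterations and stops once the relative improvement falls below 0.5%:

lda$fit(dtm, n_iter = 100, convergence_tol = 0.005,
        check_convergence_every_n = 10)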

Examples

# NOT RUN {
library(text2vec)
data("movie_review")
N = 500
# tokenize the first N reviews
tokens = movie_review$review[1:N] %>% tolower %>% word_tokenizer
it = itoken(tokens, ids = movie_review$id[1:N])
# build the vocabulary and prune rare and overly common terms
v = create_vocabulary(it) %>%
  prune_vocabulary(term_count_min = 5, doc_proportion_max = 0.2)
# create a document-term matrix in the 'lda_c' format expected by the model
dtm = create_dtm(it, vocab_vectorizer(v), 'lda_c')
lda_model = LatentDirichletAllocation$new(n_topics = 10, vocabulary = v,
                                          doc_topic_prior = 0.1,
                                          topic_word_prior = 0.1)
# fit the model and obtain the document-topic distribution in one step
doc_topic_distr = lda_model$fit_transform(dtm, n_iter = 20,
                                          check_convergence_every_n = 5)
# run the LDAvis visualisation if needed (make sure the LDAvis package is installed)
# lda_model$plot()
# }
