
lda (version 1.4.2)

slda.predict: Predict the response variable of documents using an sLDA model.

Description

These functions take a fitted sLDA model and predict the value of the response variable (or document-topic sums) for each given document.

Usage

slda.predict(documents, topics, model, alpha, eta, num.iterations = 100, average.iterations = 50, trace = 0L)
slda.predict.docsums(documents, topics, alpha, eta, num.iterations = 100, average.iterations = 50, trace = 0L)

Arguments

documents
A list of document matrices comprising a corpus, in the format described in lda.collapsed.gibbs.sampler.
topics
A $K \times V$ matrix in which each entry is an integer giving the number of times the word (column) has been allocated to the topic (row). A normalised version of this matrix is sometimes denoted $\beta_{w,k}$ in the literature; see Details. The column names should correspond to the words in the vocabulary. The topics field from the output of slda.em can be used.
model
A fitted model relating a document's topic distribution to the response variable. The model field from the output of slda.em can be used.
alpha
The scalar value of the Dirichlet hyperparameter for topic proportions. See references for details.
eta
The scalar value of the Dirichlet hyperparameter for topic multinomials.
num.iterations
Number of iterations of inference to perform on the documents.
average.iterations
Number of samples to average over to produce the predictions.
trace
When trace is greater than zero, diagnostic messages will be output. Larger values of trace imply more messages.

Value

For slda.predict, a numeric vector of the same length as documents giving the predictions. For slda.predict.docsums, a $K \times N$ matrix of document assignment counts.

Details

Inference is first performed on the documents using Gibbs sampling while holding the word-topic matrix $\beta_{w,k}$ constant. For a well-fit model, typically only a small number of iterations is required to obtain good fits for new documents. The resulting per-document topic vectors are then passed through model to yield a numeric prediction for each document.
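
The final step, turning topic vectors into predictions, can be illustrated with a minimal base-R sketch. It assumes a linear model with one coefficient per topic and no intercept; the matrix doc.sums and the vector coefs below are made-up values for illustration, not output from the package, and the actual computation inside slda.predict may differ in detail.

```r
# Hypothetical K x N matrix of topic assignment counts
# (K = 3 topics in rows, N = 2 documents in columns),
# shaped like the value described for slda.predict.docsums.
doc.sums <- matrix(c(8, 1, 1,
                     4, 4, 12),
                   nrow = 3, ncol = 2)

# Hypothetical regression coefficients, one per topic (assumed, no intercept).
coefs <- c(1.0, 0.0, -1.0)

# Normalise each document's counts to topic proportions (N x K matrix).
props <- sweep(t(doc.sums), 1, colSums(doc.sums), "/")

# Each prediction is the coefficient-weighted sum of a document's proportions.
predictions <- as.vector(props %*% coefs)
print(predictions)  # approximately 0.7 and -0.4
```

Normalising by the column sums means that documents of different lengths are compared on the same scale: the second document has twice as many tokens as the first, but only its proportions enter the prediction.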

References

Blei, David M. and McAuliffe, John. Supervised topic models. Advances in Neural Information Processing Systems, 2008.

See Also

See lda.collapsed.gibbs.sampler for a description of the format of the input data, as well as more details on the model. See predictive.distribution if you want to make predictions about the contents of the documents instead of the response variables.

Examples

## The sLDA demo shows an example usage of this function.
## Not run: demo(slda)
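
For readers who want a standalone call sequence, the following sketch fits a model with slda.em on part of a corpus and then predicts the response for held-out documents. It assumes the poliblog data sets bundled with the lda package and the slda.em argument names annotations, params, and variance; the number of topics, the hyperparameter values, and the every-fifth-document split are illustrative choices, not recommendations.

```r
library(lda)

# Example data bundled with the lda package (assumed to be available).
data(poliblog.documents)
data(poliblog.vocab)
data(poliblog.ratings)

set.seed(1)
num.topics <- 10  # illustrative choice

# Hold out every fifth document for prediction (arbitrary split).
n <- length(poliblog.documents)
held.out <- seq_len(n) %% 5 == 0

# Random starting values for the regression coefficients, one per topic.
params <- sample(c(-1, 1), num.topics, replace = TRUE)

# Fit sLDA on the training documents; ratings are rescaled to -1 / 1.
fit <- slda.em(documents = poliblog.documents[!held.out],
               K = num.topics,
               vocab = poliblog.vocab,
               num.e.iterations = 10,
               num.m.iterations = 4,
               alpha = 1.0, eta = 0.1,
               annotations = poliblog.ratings[!held.out] / 100,
               params = params,
               variance = 0.25,
               lambda = 1.0,
               logistic = FALSE,
               method = "sLDA")

# Predict the response for the held-out documents, reusing the same
# alpha and eta that were used for fitting.
predictions <- slda.predict(poliblog.documents[held.out],
                            fit$topics, fit$model,
                            alpha = 1.0, eta = 0.1)

# Document-topic sums for the same documents, without applying the model.
doc.sums <- slda.predict.docsums(poliblog.documents[held.out],
                                 fit$topics,
                                 alpha = 1.0, eta = 0.1)
```

Passing the same alpha and eta to slda.predict as to slda.em keeps the inference on new documents consistent with the priors under which the topics were learned.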
