slda.predict: Predict the response variable of documents using an sLDA model.

Description

These functions take a fitted sLDA model and predict the value of the response variable (or document-topic sums) for each given document.

Usage

slda.predict(documents, topics, model, alpha, eta,
             num.iterations = 100, average.iterations = 50, trace = 0L)
slda.predict.docsums(documents, topics, alpha, eta,
                     num.iterations = 100, average.iterations = 50, trace = 0L)

Arguments

documents

A list of document matrices comprising a corpus, in the format described in lda.collapsed.gibbs.sampler.

topics

A $K \times V$ matrix of integer counts, where each entry is the number of times the word (column) has been allocated to the topic (row). A normalised version of this matrix is sometimes denoted $\beta_{w,k}$ in the literature; see Details. The column names should correspond to the words in the vocabulary. The topics field from the output of slda.em can be used.

model

A fitted model relating a document's topic distribution to the response variable. The model field from the output of slda.em can be used.

alpha

The scalar value of the Dirichlet hyperparameter for topic proportions. See references for details.

eta

The scalar value of the Dirichlet hyperparameter for topic multinomials.

num.iterations

Number of iterations of inference to perform on the documents.

average.iterations

Number of samples to average over to produce the predictions.

trace

When trace is greater than zero, diagnostic messages will be output. Larger values of trace imply more messages.

Value

For slda.predict, a numeric vector of the same length as documents giving the predictions. For slda.predict.docsums, a $K \times N$ matrix of topic assignment counts, with one row per topic and one column per document.

Details

Inference is first performed on the documents using Gibbs sampling while holding the word-topic matrix $\beta_{w,k}$ constant. For a well-fit model, typically only a small number of iterations is required to obtain good fits for new documents. The resulting per-document topic vectors are then passed through model to yield a numeric prediction for each document.
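
The last step can be illustrated without the package. The following is a minimal base-R sketch, assuming the fitted model is linear in the per-document topic proportions. The matrix doc.sums and the vector coefs are made-up values for illustration, not output from slda.em, and the package's internal computation may differ in detail.

```r
# Hypothetical K x N matrix of topic assignment counts
# (K = 3 topics in rows, N = 2 documents in columns), i.e. the
# shape that slda.predict.docsums is documented to return.
doc.sums <- matrix(c(4, 1, 0,
                     2, 2, 6),
                   nrow = 3, ncol = 2)

# Hypothetical regression coefficients, one per topic (assumed linear model).
coefs <- c(1.0, -0.5, 0.25)

# Convert counts to topic proportions: an N x K matrix whose rows sum to 1.
# Dividing the transposed matrix by colSums() scales each document's row
# by that document's total number of assigned words.
props <- t(doc.sums) / colSums(doc.sums)

# Linear prediction: one numeric value per document.
predictions <- as.vector(props %*% coefs)
print(predictions)
```

Normalising by each document's total keeps the prediction independent of document length, so a long and a short document with the same topic mix receive the same predicted value under this assumed model.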

References

Blei, David M. and McAuliffe, John. Supervised topic models. Advances in Neural Information Processing Systems, 2008.

Examples

## The sLDA demo shows an example usage of this function.
## Not run: demo(slda)
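
For readers who want something shorter than the full demo, here is a sketch modelled on it. It assumes the lda package and its bundled poliblog data sets are installed; the number of topics, the iteration counts, and the hyperparameter values are illustrative choices rather than recommendations, and the seed is arbitrary.

```r
library(lda)

# Political blog corpus bundled with the lda package.
data(poliblog.documents)
data(poliblog.vocab)
data(poliblog.ratings)

set.seed(8675309)  # arbitrary seed, only for repeatability
num.topics <- 10

# Random initial regression coefficients, one per topic.
params <- sample(c(-1, 1), num.topics, replace = TRUE)

# Fit an sLDA model. The two unnamed arguments are the response values
# and the initial coefficients, passed positionally.
result <- slda.em(documents = poliblog.documents,
                  K = num.topics,
                  vocab = poliblog.vocab,
                  num.e.iterations = 10,
                  num.m.iterations = 4,
                  alpha = 1.0, eta = 0.1,
                  poliblog.ratings / 100,
                  params,
                  variance = 0.25,
                  lambda = 1.0,
                  logistic = FALSE,
                  method = "sLDA")

# Predict the response for each document, reusing the fitted
# topics and model together with the same alpha and eta.
predictions <- slda.predict(poliblog.documents,
                            result$topics,
                            result$model,
                            alpha = 1.0, eta = 0.1)

# Topic assignment counts for the same documents.
doc.sums <- slda.predict.docsums(poliblog.documents,
                                 result$topics,
                                 alpha = 1.0, eta = 0.1)
```

Here the predictions are made on the training documents purely to keep the sketch short; in practice the documents argument would hold held-out documents that use the same vocabulary.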