- documents
The documents to be modeled. Must be a list, with each element
corresponding to a document. Each document is represented as an integer
matrix with two rows and one column for each unique vocabulary word in the
document. The first row contains the 1-indexed vocabulary entry and the
second row contains the number of times that term appears.
This is similar to the format in the lda package except that
(following R convention) the vocabulary is indexed from one. Corpora can be
imported using the reader function and manipulated using prepDocuments.
- vocab
Character vector specifying the words in the corpus in the
order of the vocab indices in documents. Each term in the vocabulary index
must appear at least once in the documents. See prepDocuments for dropping
unused items in the vocabulary.
- K
A vector of positive integers representing the desired number of
topics for separate runs of selectModel.
- prevalence
A formula object with no response variable or a matrix
containing topic prevalence covariates. Use s(), ns() or bs() to specify
smooth terms. See details for more information.
- content
A formula containing a single variable, a factor variable, or
something that can be coerced to a factor, indicating the category of the
content variable for each document.
- data
Dataset which contains prevalence and content covariates.
- max.em.its
The maximum number of EM iterations. If convergence has
not been reached by this point, a message will be printed.
- verbose
A logical flag indicating whether information should be
printed to the screen.
- init.type
The method of initialization. See stm.
- emtol
Convergence tolerance.
- seed
Seed for the random number generator. stm saves the seed
it uses on every run so that any result can be exactly reproduced. When
attempting to reproduce a result with that seed, it should be specified
here.
- runs
Total number of STM runs used in the cast net stage.
Approximately 15 percent of these runs will then be run until
convergence.
- frexw
Weight used to calculate exclusivity.
- net.max.em.its
Maximum EM iterations used when casting the net.
- netverbose
A logical flag indicating whether verbose output should be printed
when calculating net models.
- M
Number of words used to calculate semantic coherence and
exclusivity. Defaults to 10.
- ...
Additional options described in details of stm.
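
The documents and vocab formats described above can be illustrated with a small base R sketch. The toy tokens and the helper to_doc_matrix are assumptions made up for this example and are not part of stm; in practice the reader function and prepDocuments would normally build these objects.

```r
# Toy corpus (illustrative data): each document is a character vector of tokens.
tokens <- list(
  c("topic", "model", "topic"),
  c("model", "text", "text", "text")
)

# vocab: the position of a term in this vector is its 1-indexed vocabulary id.
vocab <- sort(unique(unlist(tokens)))

# Hypothetical helper (not an stm function): turn one token vector into a
# two-row integer matrix. Row 1 holds vocab indices, row 2 holds term counts.
to_doc_matrix <- function(doc_tokens, vocab) {
  counts <- tabulate(match(doc_tokens, vocab), nbins = length(vocab))
  idx <- which(counts > 0L)
  rbind(idx, counts[idx], deparse.level = 0)
}

documents <- lapply(tokens, to_doc_matrix, vocab = vocab)

# Check the constraint on vocab: every term must appear in at least one document.
used <- sort(unique(unlist(lapply(documents, function(d) d[1, ]))))
stopifnot(identical(used, seq_along(vocab)))
```

Here each element of documents has one column per unique term in that document, so the first document (two distinct terms) becomes a 2 x 2 matrix. Objects built this way have the shape expected by the documents and vocab arguments.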