ldaPrototype (version 0.3.1)

LDABatch: LDA Replications on a Batch System

Description

Performs multiple runs of Latent Dirichlet Allocation on a batch system using the batchtools package.

Usage

LDABatch(
  docs,
  vocab,
  n = 100,
  seeds,
  id = "LDABatch",
  load = FALSE,
  chunk.size = 1,
  resources,
  ...
)

Arguments

docs

[list] Documents as received from LDAprep.

vocab

[character] Vocabulary passed to lda.collapsed.gibbs.sampler. For the additional (and required) arguments passed to that function, see the ellipsis (three-dot) argument.

n

[integer(1)] Number of Replications.

seeds

[integer(n)] Random Seeds for each Replication.

id

[character(1)] Name for the registry's folder.

load

[logical(1)] If a folder with name id exists: should the existing registry be loaded?

chunk.size

[integer(1)] Requested chunk size for each single chunk. See chunk.

resources

[named list] Computational resources for the jobs to submit. See submitJobs.

...

Additional arguments passed to lda.collapsed.gibbs.sampler. Each argument is coerced to a vector of length n. Defaults are alpha = eta = 1/K and num.iterations = 200. K has no default and must be supplied.
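
The defaults and the length-n coercion described above can be illustrated in base R. This is only a sketch: expand_lda_args is a hypothetical helper, not part of ldaPrototype, and it assumes that coercion means plain recycling with rep_len(); the package's internal mechanism may differ.

```r
# Hypothetical helper (NOT part of ldaPrototype): mimics the documented
# defaults alpha = eta = 1/K and num.iterations = 200, and recycles every
# argument to one value per replication. Recycling via rep_len() is an
# assumption about what "coerced to a vector of length n" means.
expand_lda_args <- function(n, K, alpha = 1 / K, eta = 1 / K,
                            num.iterations = 200) {
  args <- list(K = K, alpha = alpha, eta = eta,
               num.iterations = num.iterations)
  # rep_len() repeats or truncates each argument to exactly n values
  as.data.frame(lapply(args, rep_len, length.out = n))
}

# Four replications alternating between K = 10 and K = 15
params <- expand_lda_args(n = 4, K = c(10, 15))
print(params)
```

Each row of params then corresponds to the settings of one replication; under this assumption, passing a vector for K yields replications with different numbers of topics and matching default priors.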

Value

[named list] with entries id (the registry's folder name), jobs (the submitted jobs' ids and their parameter settings) and reg (the registry itself).

Details

The function generates multiple LDA runs, optionally on a batch system. The integration is done by the batchtools package. After all jobs of the corresponding registry have terminated, the whole registry can be transferred to your local computer for further analysis.

The function returns an LDABatch object. You can retrieve the results and all other elements of this object with getter functions (see getJob).

See Also

Other batch functions: as.LDABatch(), getJob(), mergeBatchTopics()

Other LDA functions: LDARep(), LDA(), getTopics()

Examples

# NOT RUN {
batch = LDABatch(docs = reuters_docs, vocab = reuters_vocab, n = 4, K = 15)
batch
getRegistry(batch)
getJob(batch)
getLDA(batch, 2)

batch2 = LDABatch(docs = reuters_docs, vocab = reuters_vocab, K = 15, chunk.size = 20)
batch2
head(getJob(batch2))
# }