Performs multiple runs of LDA and computes the Prototype LDA of this set of LDAs.
LDAPrototype(
docs,
vocabLDA,
vocabMerge = vocabLDA,
n = 100,
seeds,
id = "LDARep",
pm.backend,
ncpus,
limit.rel,
limit.abs,
atLeast,
progress = TRUE,
keepTopics = FALSE,
keepSims = FALSE,
keepLDAs = FALSE,
...
)
docs
[list] Documents as received from LDAprep.
vocabLDA
[character] Vocabularies passed to lda.collapsed.gibbs.sampler. For additional (and necessary) arguments passed, see ellipsis (three-dot argument).
vocabMerge
[character] Vocabularies taken into consideration for merging topic matrices.
n
[integer(1)] Number of replications.
seeds
[integer(n)] Random seeds for each replication.
id
[character(1)] Name for the computation.
pm.backend
[character(1)] One of "multicore", "socket" or "mpi". If pm.backend is set, parallelStart is called before computation is started and parallelStop is called after.
ncpus
[integer(1)] Number of (physical) CPUs to use. If pm.backend is passed, default is determined by availableCores.
limit.rel
[0,1] See jaccardTopics. Default is 1/500.
limit.abs
[integer(1)] See jaccardTopics. Default is 10.
atLeast
[integer(1)] See jaccardTopics. Default is 0.
progress
[logical(1)] Should a progress bar be shown for the steps of mergeTopics and jaccardTopics? Turning it off could lead to significantly faster calculation. Default is TRUE.
keepTopics
[logical(1)] Should the merged topic matrix from mergeTopics be kept?
keepSims
[logical(1)] Should the calculated topic similarities matrix from jaccardTopics be kept?
keepLDAs
[logical(1)] Should the considered LDAs be kept?
...
Additional arguments passed to lda.collapsed.gibbs.sampler. Arguments will be coerced to a vector of length n. Default parameters are alpha = eta = 1/K and num.iterations = 200. There is no default for K.
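The following is a minimal sketch of how the three-dot arguments and the parallel options described above might be combined. It assumes the ldaPrototype package is installed and uses the reuters_docs and reuters_vocab data sets from the examples on this page; the concrete values for seeds, alpha, eta and ncpus are arbitrary choices for illustration, not recommendations.

```r
library(ldaPrototype)

# K has no default, so it must be supplied through the three-dot argument.
# alpha, eta and num.iterations override the defaults (1/K, 1/K and 200).
res <- LDAPrototype(
  docs = reuters_docs,
  vocabLDA = reuters_vocab,
  n = 4,
  seeds = 1:4,            # one seed per replication (illustrative values)
  K = 10,
  alpha = 0.5,
  eta = 0.05,
  num.iterations = 30
)
res$jobs                  # parameter specifications for the LDAs

# Same call on two CPUs with the "socket" backend (illustrative choice).
res_par <- LDAPrototype(
  docs = reuters_docs,
  vocabLDA = reuters_vocab,
  n = 4,
  seeds = 1:4,
  K = 10,
  num.iterations = 30,
  pm.backend = "socket",
  ncpus = 2
)
```

Passing seeds explicitly makes the replications reproducible across calls; without it, the seeds are chosen by the function.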
[named list] with entries
id
[character(1)] See above.
protoid
[character(1)] Name (ID) of the determined Prototype LDA.
lda
List of LDA objects of the determined Prototype LDA and, if keepLDAs is TRUE, all considered LDAs.
jobs
[data.table] with parameter specifications for the LDAs.
param
[named list] with parameter specifications for limit.rel [0,1], limit.abs [integer(1)] and atLeast [integer(1)]. See above for explanation.
topics
[named matrix] with the count of vocabularies (row wise) in topics (column wise).
sims
[lower triangular named matrix] with all pairwise Jaccard similarities of the given topics.
wordslimit
[integer] with counts of words determined as relevant based on limit.rel and limit.abs.
wordsconsidered
[integer] with counts of considered words for similarity calculation. Could differ from wordslimit, if atLeast is greater than zero.
sclop
[symmetrical named matrix] with all pairwise S-CLOP scores of the given LDA runs.
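To make the topics, sims and wordslimit entries more concrete, here is a small base R sketch that computes pairwise Jaccard similarities on a toy topic matrix. The relevance rule used here (a word counts as relevant for a topic if its count exceeds both limit.rel times the topic's total count and limit.abs) is an assumption for illustration; this page only refers to jaccardTopics for the exact definition, and the toy matrix and variable names are hypothetical.

```r
# Toy count matrix: words in rows, topics in columns (hypothetical data)
topics <- matrix(
  c(50,  0,  5,
    30,  2,  0,
     1, 40, 45,
     0, 25, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("oil", "price", "bank", "rate"), c("T1", "T2", "T3"))
)

limit.rel <- 1/500
limit.abs <- 10

# Assumed rule: relevant if count > limit.rel * topic total AND count > limit.abs
relevant <- sweep(topics, 2, colSums(topics) * limit.rel, ">") & topics > limit.abs
rel_num <- relevant * 1            # logical to numeric for matrix algebra

sizes <- colSums(rel_num)          # number of relevant words per topic
inter <- crossprod(rel_num)        # pairwise intersection sizes
union <- outer(sizes, sizes, "+") - inter
sims <- inter / union              # pairwise Jaccard similarities

sizes                              # 2 relevant words for each toy topic
sims["T2", "T3"]                   # 1: both topics share "bank" and "rate"
sims["T1", "T2"]                   # 0: no relevant words in common
```

In this sketch sizes plays the role of wordslimit. With atLeast greater than zero, the set of considered words may be larger than this, which is why wordsconsidered can differ.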
LDAPrototype is the overall shortcut for performing multiple LDA runs and choosing their Prototype, whereas getPrototype only determines the Prototype. The multiple LDAs must therefore be generated before getPrototype is used.
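The two-step route can be sketched as follows, assuming the ldaPrototype package is installed. LDARep and getPrototype are the functions listed among the related functions on this page; the argument names used for them here (vocab in particular) follow the package documentation as far as known and should be checked against the respective help pages.

```r
library(ldaPrototype)

# Step 1: generate the replicated LDA runs
reps <- LDARep(docs = reuters_docs, vocab = reuters_vocab,
  n = 4, K = 10, num.iterations = 30)

# Step 2: determine the Prototype from the existing runs
proto <- getPrototype(reps, vocab = reuters_vocab)
getPrototypeID(proto)
```

Splitting the steps is useful when the replications already exist or when the Prototype should be determined several times with different settings for limit.rel, limit.abs or atLeast.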
To save memory, many interim calculations are discarded by default.
If you use parallel computation, no progress bar is shown.
For details see the details sections of the workflow functions at getPrototype.
Other shortcut functions: getPrototype()
Other PrototypeLDA functions: getPrototype(), getSCLOP()
Other replication functions: LDARep(), as.LDARep(), getJob(), mergeRepTopics()
# NOT RUN {
library(ldaPrototype)

# Four replications; only the Prototype LDA is kept
res = LDAPrototype(docs = reuters_docs, vocabLDA = reuters_vocab,
  n = 4, K = 10, num.iterations = 30)
res
getPrototype(res) # = getLDA(res)
getSCLOP(res)

# Same setup, but all considered LDAs are kept
res = LDAPrototype(docs = reuters_docs, vocabLDA = reuters_vocab,
  n = 4, K = 10, num.iterations = 30, keepLDAs = TRUE)
res
getLDA(res, all = TRUE)
getPrototypeID(res)
getParam(res)
# }