Returns the Prototype LDA of a set of LDAs. This set is given as an LDABatch object, an LDARep object, or a list of LDAs.
If the matrix of S-CLOP scores sclop is passed, no calculation is needed or done.
getPrototype(...)
# S3 method for LDARep
getPrototype(
x,
vocab,
limit.rel,
limit.abs,
atLeast,
progress = TRUE,
pm.backend,
ncpus,
keepTopics = FALSE,
keepSims = FALSE,
keepLDAs = FALSE,
sclop,
...
)
# S3 method for LDABatch
getPrototype(
x,
vocab,
limit.rel,
limit.abs,
atLeast,
progress = TRUE,
pm.backend,
ncpus,
keepTopics = FALSE,
keepSims = FALSE,
keepLDAs = FALSE,
sclop,
...
)
# S3 method for default
getPrototype(
lda,
vocab,
id,
job,
limit.rel,
limit.abs,
atLeast,
progress = TRUE,
pm.backend,
ncpus,
keepTopics = FALSE,
keepSims = FALSE,
keepLDAs = FALSE,
sclop,
...
)
...
additional arguments
vocab
[character] Vocabularies taken into consideration for merging topic matrices.
Not considered if sclop is passed. Default is the vocabulary of the first LDA.
limit.rel
[0,1] See jaccardTopics. Default is 1/500.
Not considered for calculation if sclop is passed, but should still be passed so that the correct value is recorded in the resulting object.
limit.abs
[integer(1)] See jaccardTopics. Default is 10.
Not considered for calculation if sclop is passed, but should still be passed so that the correct value is recorded in the resulting object.
atLeast
[integer(1)] See jaccardTopics. Default is 0.
Not considered for calculation if sclop is passed, but should still be passed so that the correct value is recorded in the resulting object.
progress
[logical(1)] Should a progress bar be shown for the steps of mergeTopics and jaccardTopics? Turning it off could lead to significantly faster calculation. Default is TRUE.
Not considered if sclop is passed.
pm.backend
[character(1)] One of "multicore", "socket" or "mpi".
If pm.backend is set, parallelStart is called before computation is started and parallelStop is called after.
Not considered if sclop is passed.
ncpus
[integer(1)] Number of (physical) CPUs to use. If pm.backend is passed, default is determined by availableCores.
Not considered if sclop is passed.
keepTopics
[logical(1)] Should the merged topic matrix from mergeTopics be kept?
Not considered if sclop is passed.
keepSims
[logical(1)] Should the calculated topic similarities matrix from jaccardTopics be kept?
Not considered if sclop is passed.
keepLDAs
[logical(1)] Should the considered LDAs be kept?
sclop
[symmetrical named matrix] (optional) All pairwise S-CLOP scores of the given LDA runs determined by SCLOP.pairwise. Matching of names is not implemented yet, so the order of rows and columns must match the order of the LDAs.
lda
[named list] List of LDA objects, named by the corresponding "job.id".
job
[data.frame or named vector] A data.frame or data.table with named columns (at least) "job.id" (integerish), "K", "alpha", "eta" and "num.iterations", or a named vector with entries (at least) "K", "alpha", "eta" and "num.iterations".
If not passed, it is interpreted from param of each LDA.
Not considered for LDABatch or LDARep objects.
[named list] with entries
id
[character(1)] See above.
protoid
[character(1)] Name (ID) of the determined Prototype LDA.
lda
List of LDA objects of the determined Prototype LDA and, if keepLDAs is TRUE, all considered LDAs.
jobs
[data.table] with parameter specifications for the LDAs.
param
[named list] with parameter specifications for limit.rel [0,1], limit.abs [integer(1)] and atLeast [integer(1)]. See above for explanation.
topics
[named matrix] with the count of vocabularies (row wise) in topics (column wise).
sims
[lower triangular named matrix] with all pairwise Jaccard similarities of the given topics.
wordslimit
[integer] with counts of words determined as relevant based on limit.rel and limit.abs.
wordsconsidered
[integer] with counts of considered words for similarity calculation. Could differ from wordslimit if atLeast is greater than zero.
sclop
[symmetrical named matrix] with all pairwise S-CLOP scores of the given LDA runs.
While LDAPrototype is the overall shortcut for performing multiple LDA runs and choosing the Prototype of them, getPrototype only covers the step of determining the Prototype. The multiple LDAs have to be generated before this function is used. The function is flexible enough to be used at two or more points in the analysis: after generating the LDAs (no matter whether as LDABatch or LDARep object) or after determining the pairwise S-CLOP values.
To save memory a lot of interim calculations are discarded by default.
If you use parallel computation, no progress bar is shown.
For details see the details sections of the workflow functions.
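As an illustration of the second entry point, the following base R sketch shows how a Prototype could be picked from a matrix of pairwise S-CLOP scores. It assumes the selection rule is "the run with the highest mean S-CLOP score to all other runs". The score values, the run names and the variable names lda_names and mean_sclop are invented for this example, and the code neither calls nor reproduces the package's internal implementation. Because name matching is not implemented for sclop, the sketch also checks that the matrix order agrees with the order of the LDA list.

```r
# Hypothetical pairwise S-CLOP scores for four LDA runs (invented values).
ids <- c("1", "2", "3", "4")
sclop <- matrix(NA_real_, nrow = 4, ncol = 4, dimnames = list(ids, ids))

# Fill the lower triangle column by column, then mirror it to the upper
# triangle so that the matrix is symmetric. The diagonal stays NA and is
# ignored below (an assumption of this sketch).
sclop[lower.tri(sclop)] <- c(0.80, 0.70, 0.60, 0.90, 0.65, 0.70)
sclop[upper.tri(sclop)] <- t(sclop)[upper.tri(sclop)]

# Stand-in for names(lda): the order must match the rows/columns of sclop,
# since names are not matched automatically.
lda_names <- ids
stopifnot(identical(rownames(sclop), lda_names),
          identical(colnames(sclop), lda_names),
          all(sclop == t(sclop), na.rm = TRUE))

# Assumed selection rule: highest mean score to all other runs.
mean_sclop <- rowMeans(sclop, na.rm = TRUE)
protoid <- names(which.max(mean_sclop))
print(protoid)
```

With these invented scores, run "2" has the highest mean score and would be selected. With real data, the matrix would come from SCLOP.pairwise and be passed to getPrototype via the sclop argument together with the list of LDAs in the same order.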
Other shortcut functions: LDAPrototype()
Other PrototypeLDA functions: LDAPrototype(), getSCLOP()
Other workflow functions: LDARep(), SCLOP(), dendTopics(), jaccardTopics(), mergeTopics()
# NOT RUN {
res = LDARep(docs = reuters_docs, vocab = reuters_vocab,
  n = 4, K = 10, num.iterations = 30)
topics = mergeTopics(res, vocab = reuters_vocab)
jacc = jaccardTopics(topics, atLeast = 2)
dend = dendTopics(jacc)
sclop = SCLOP.pairwise(jacc)
getPrototype(lda = getLDA(res), sclop = sclop)
proto = getPrototype(res, vocab = reuters_vocab, keepSims = TRUE,
  limit.abs = 20, atLeast = 10)
proto
getPrototype(proto) # = getLDA(proto)
getConsideredWords(proto)
# can exceed 10 if several words tie for the 10th most frequent word
getRelevantWords(proto)
getSCLOP(proto)
# }