Determine a Prototype from a number of runs of Latent Dirichlet Allocation (LDA), measuring their similarities with S-CLOP: a procedure to select the LDA run with the highest mean pairwise similarity to all other runs, where similarity is measured by S-CLOP (Similarity of multiple sets by Clustering with Local Pruning). Each LDA run is specified by its assignments, which lead to estimators for the distribution parameters. Repeated runs lead to different results, which we counter by choosing the most representative LDA run as the prototype. For bug reports and feature requests, please use the issue tracker: https://github.com/JonasRieger/ldaPrototype/issues. Also have a look at the (detailed) example at https://github.com/JonasRieger/ldaPrototype.
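The selection rule described above can be illustrated in base R. This is a minimal sketch, not the package's implementation: the similarity matrix `sim` and the run names are made-up values standing in for pairwise S-CLOP similarities between four LDA runs.

```r
# Hypothetical symmetric matrix of pairwise similarities between 4 LDA runs.
# The diagonal is NA because a run is not compared with itself.
sim = matrix(c(
  NA,   0.80, 0.70, 0.60,
  0.80, NA,   0.90, 0.70,
  0.70, 0.90, NA,   0.50,
  0.60, 0.70, 0.50, NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("run", 1:4), paste0("run", 1:4)))

# Mean similarity of each run to all other runs
mean_sim = rowMeans(sim, na.rm = TRUE)

# The prototype is the run with the highest mean pairwise similarity
prototype = names(which.max(mean_sim))
prototype
```

With these illustrative values, `run2` has the highest mean similarity to the other runs and would be chosen as the prototype.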
reuters
Example Dataset (91 articles from Reuters) for testing.
LDA
LDA objects used in this package.
as.LDARep
LDARep objects.
as.LDABatch
LDABatch objects.
getTopics
Getter for LDA objects.
getJob
Getter for LDARep and LDABatch objects.
getSimilarity
Getter for TopicSimilarity objects.
getSCLOP
Getter for PrototypeLDA objects.
getPrototype
Determine the Prototype LDA.
LDARep
Performing multiple LDAs locally (using parallelization).
LDABatch
Performing multiple LDAs on Batch Systems.
mergeTopics
Merge topic matrices from multiple LDAs.
jaccardTopics
Calculate topic similarities using the Jaccard coefficient (see Similarity Measures for other possible measures).
dendTopics
Create a dendrogram from topic similarities.
SCLOP
Determine various S-CLOP values.
pruneSCLOP
Prune TopicDendrogram objects.
cosineTopics
Cosine Similarity.
jaccardTopics
Jaccard Coefficient.
jsTopics
Jensen-Shannon Divergence.
rboTopics
Rank-Biased Overlap.
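To give a feeling for three of the measures listed above, the following base R sketch compares two topics represented as word-count vectors. The vectors `a` and `b` are invented for illustration, and the formulas are the generic textbook definitions; the package functions offer further options (for example thresholds for deciding which words belong to a topic) that are not reproduced here.

```r
# Two hypothetical topics as word counts over a shared 5-word vocabulary
a = c(apple = 10, bank = 0, cash = 5,  deal = 0, euro = 5)
b = c(apple = 5,  bank = 5, cash = 10, deal = 0, euro = 0)

# Jaccard coefficient on the sets of words with a positive count
in_a = a > 0
in_b = b > 0
jaccard = sum(in_a & in_b) / sum(in_a | in_b)

# Cosine similarity of the count vectors
cosine = sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Jensen-Shannon divergence (base-2 logarithm) of the normalised distributions
p = a / sum(a)
q = b / sum(b)
m = (p + q) / 2
kl = function(x, y) sum(ifelse(x > 0, x * log2(x / y), 0))  # terms with x = 0 contribute 0
jsd = (kl(p, m) + kl(q, m)) / 2

c(jaccard = jaccard, cosine = cosine, jsd = jsd)
```

Note that Jaccard and cosine are similarities (larger means more alike), whereas the Jensen-Shannon divergence is a distance-like quantity (smaller means more alike).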
getPrototype
Shortcut which includes all calculation steps.
LDAPrototype
Shortcut which performs multiple LDAs and determines their Prototype.
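A short usage sketch of the shortcut on the example data might look as follows. It assumes that the package is installed and that the example data are available as `reuters_docs` and `reuters_vocab`; the values `n = 4`, `K = 10` and `seeds = 1:4` are arbitrary small settings chosen only to keep the run fast, not recommendations.

```r
# Sketch only: runs only if the ldaPrototype package is installed.
if (requireNamespace("ldaPrototype", quietly = TRUE)) {
  library(ldaPrototype)
  data(reuters_docs)
  data(reuters_vocab)

  # Perform n = 4 LDA runs with K = 10 topics each and determine the prototype
  res = LDAPrototype(docs = reuters_docs, vocabLDA = reuters_vocab,
                     n = 4, K = 10, seeds = 1:4)

  # Extract the prototype LDA and inspect the dimensions of its topic matrix
  proto = getPrototype(res)
  dim(getTopics(proto))
}
```

For real analyses a much larger number of runs is advisable, since the prototype is only as representative as the set of runs it is selected from.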
Rieger, Jonas (2020). "ldaPrototype: A method in R to get a Prototype of multiple Latent Dirichlet Allocations". Journal of Open Source Software, 5(51), 2181, 10.21105/joss.02181.
Rieger, Jonas, Jörg Rahnenführer and Carsten Jentsch (2020). "Improving Latent Dirichlet Allocation: On Reliability of the Novel Method LDAPrototype". In: Natural Language Processing and Information Systems, NLDB 2020. LNCS 12089, pp. 118-125, 10.1007/978-3-030-51310-8_11.
Rieger, Jonas, Lars Koppers, Carsten Jentsch and Jörg Rahnenführer (2020). "Improving Reliability of Latent Dirichlet Allocation by Assessing Its Stability using Clustering Techniques on Replicated Runs". arXiv 2003.04980, URL https://arxiv.org/abs/2003.04980.