tosca (version 0.3-4)

Tools for Statistical Content Analysis

Description

A framework for statistical analysis in content analysis. Besides a pipeline for preprocessing text corpora and linking to the latent Dirichlet allocation from the 'lda' package, it offers plots for the descriptive analysis of text corpora and topic models, as well as an implementation of Chang's intruder words and intruder topics. Sample data for the vignette is included in the toscaData package, which is available on GitHub.

Install

install.packages('tosca')
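
Once the package is installed, the preprocessing pipeline mentioned in the description can be sketched roughly as follows. This is a minimal sketch, not the maintainers' reference workflow: the toy corpus (ids, titles, dates, texts) is made up for illustration, and the argument names are given as recalled from the package help pages, so they should be checked against ?textmeta, ?cleanTexts, ?makeWordlist and ?LDAprep before use.

```r
# Assumes tosca is already installed (see install.packages('tosca') above).
library(tosca)

# Hypothetical toy corpus; every value here is invented for illustration.
texts <- list(
  A = "Give a man a fish and you feed him for a day.",
  B = "So long, and thanks for all the fish.",
  C = "The fisher mended the nets before the boats left the harbour."
)
meta <- data.frame(
  id = c("A", "B", "C"),
  title = c("Proverb", "Farewell", "Harbour"),
  date = c("1885-01-02", "1984-11-09", "1947-03-04"),
  stringsAsFactors = FALSE
)

# Bundle texts and metadata into a "textmeta" object.
# Assumption: dateFormat uses the same format strings as as.Date().
corpus <- textmeta(meta = meta, text = texts, dateFormat = "%Y-%m-%d")

# Tokenize and clean the texts (lowercasing, stopword removal, ...).
clean <- cleanTexts(object = corpus)

# Count words; the result is assumed to carry the vocabulary in $words.
wordlist <- makeWordlist(clean$text)

# Convert the cleaned texts into the document format used by the 'lda' package.
docs <- LDAprep(text = clean$text, vocab = wordlist$words)
```

The resulting docs object, together with the vocabulary in wordlist$words, is the kind of input LDAgen expects for fitting a topic model. On a realistic corpus, the vocabulary would usually be restricted first, for example to words above a minimum count.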

Monthly Downloads

431

Version

0.3-4

License

GPL (>= 2)

Maintainer

Lars Koppers

Last Published

April 22nd, 2025

Functions in tosca (0.3-4)

mergeTextmeta: Merge Textmeta Objects
precision: Precision and Recall
readWhatsApp: Read WhatsApp Files
plotHeat: Plotting Topics over Time (Relative to Corpus)
plotTopic: Plotting Counts of Topics over Time (Relative to Corpus)
plotTopicWord: Plotting Counts of Topics-Words-Combination over Time (Relative to Words)
readTextmeta: Read Corpora as CSV
plotWordSub: Plotting Counts/Proportion of Words/Docs in LDA-generated Topic-Subcorpora over Time
plotWordpt: Plots Counts of Topics-Words-Combination over Time (Relative to Topics)
sampling: Sample Texts
readWiki: Read Pages from Wikipedia
showMeta: Export Readable Meta-Data of Articles
readWikinews: Read Files from Wikinews
removeXML: Removes XML/HTML Tags and Umlauts
tidy.textmeta: Transform textmeta to an Object with Tidy Text Data
topTexts: Get the IDs of the Most Representative Texts
plotScot: Plots Counts of Documents or Words over Time (Relative to Corpus)
topicsInText: Coloring the Words of a Text Corresponding to Topic Allocation
textmeta: "textmeta"-Objects
topWords: Top Words per Topic
topicCoherence: Calculating Topic Coherence
showTexts: Exports Readable Text Lists
deleteAndRenameDuplicates: Deletes and Renames Articles with the Same ID
as.meta: "meta" Component of "textmeta"-Objects
filterCount: Subcorpus with Count Filter
clusterTopics: Cluster Analysis
LDAgen: Function to Fit LDA Model
cleanTexts: Data Preprocessing
duplist: Creating List of Duplicates
LDAprep: Create LDA-ready Dataset
as.textmeta.corpus: Transform corpus to textmeta
as.corpus.textmeta: Transform textmeta to corpus
filterWord: Subcorpus with Word Filter
filterID: Subcorpus with ID Filter
filterDate: Subcorpus with Date Filter
plotArea: Plotting Topics over Time as Stacked Areas below Plotted Lines
intruderWords: Function to Validate the Fit of the LDA Model
intruderTopics: Function to Validate the Fit of the LDA Model
plotFreq: Plotting Counts of Specified Wordgroups over Time (Relative to Corpus)
makeWordlist: Counts Words in Text Corpora
mergeLDA: Preparation of Different LDAs for Clustering