readTextmeta

readTextmeta.df

Reads CSV files and separates the text and meta data. The result is a
<code>textmeta</code> object.
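To make the idea of "separating text and meta data" concrete, here is a conceptual sketch in base R only. It is not the tosca implementation, and the list layout (<code>meta</code>, <code>text</code>), the toy CSV, and its column names are assumptions chosen for illustration.

```r
# Conceptual sketch only: base R, not the tosca implementation.
# Write a small, hypothetical CSV file to a temporary location.
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "id,date,title,text",
  "a1,2020-01-15,First title,Some text about fish.",
  "a2,2020-02-01,Second title,Another short text."
), csv_file)

raw <- read.csv(csv_file, stringsAsFactors = FALSE, fileEncoding = "UTF-8")

# Texts: a list named by document ID.
text <- as.list(raw$text)
names(text) <- raw$id

# Meta data: every column except the text column, with parsed dates.
meta <- raw[, setdiff(colnames(raw), "text"), drop = FALSE]
meta$date <- as.Date(meta$date, format = "%Y-%m-%d")

# Keep both parts together, linked through the IDs.
corpus <- list(meta = meta, text = text)
str(corpus)
```

The point of the split is that the texts can be preprocessed on their own while the IDs in <code>meta</code> still link each text back to its date and title.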

manip

A framework for statistical analysis in content analysis. In addition to a pipeline for preprocessing text corpora and linking to the latent Dirichlet allocation from the 'lda' package, plots are offered for the descriptive analysis of text corpora and topic models. An implementation of Chang's intruder words and intruder topics is also provided. Sample data for the vignette is included in the toscaData package, which is available on GitHub: <https://github.com/Docma-TU/toscaData>.

Lars Koppers

tosca

Tools for Statistical Content Analysis

Jonas Rieger

Karin Boczek

Gerret von Nordheim

readTextmeta function

<dl><dt>path</dt>
<dd><code>character/data.frame</code> string with the path to the directory containing the data files,
or, for <code>readTextmeta.df</code>, the value passed as <code>df</code></dd>
<dt>file</dt>
<dd><code>character</code> string with names of the CSV files</dd>
<dt>cols</dt>
<dd><code>character</code> vector with columns which should be kept</dd>
<dt>dateFormat</dt>
<dd><code>character</code> string with the date format in the files
for <code><a href='https://rdrr.io/r/base/as.Date.html'>as.Date</a></code></dd>
<dt>idCol</dt>
<dd><code>character</code> string with column name of the IDs</dd>
<dt>dateCol</dt>
<dd><code>character</code> string with column name of the Dates</dd>
<dt>titleCol</dt>
<dd><code>character</code> string with column name of the Titles</dd>
<dt>textCol</dt>
<dd><code>character</code> string with column name of the Texts</dd>
<dt>encoding</dt>
<dd><code>character</code> string with encoding specification of the files</dd>
<dt>xmlAction</dt>
<dd><code>logical</code> whether all columns of the CSV should be
handled with <code>removeXML</code></dd>
<dt>duplicateAction</dt>
<dd><code>logical</code>
whether <code>deleteAndRenameDuplicates</code> should be applied to the
created <code>textmeta</code> object</dd>
<dt>df</dt>
<dd><code>data.frame</code> table which should be transformed to a textmeta object</dd></dl>
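The arguments above can be combined as in the following minimal sketch for <code>readTextmeta.df</code>. The toy <code>data.frame</code> and its column names are hypothetical, every argument is passed explicitly because default values are not stated in this section, and the call is guarded so that the snippet also runs when tosca is not installed.

```r
# Hypothetical toy data; column names are chosen to match the arguments below.
df <- data.frame(
  id    = c("a1", "a2"),
  date  = c("2020-01-15", "2020-02-01"),
  title = c("First title", "Second title"),
  text  = c("Some text about fish.", "Another short text."),
  stringsAsFactors = FALSE
)

# Only call the package if it is installed.
if (requireNamespace("tosca", quietly = TRUE)) {
  corpus <- tosca::readTextmeta.df(
    df = df,
    cols = colnames(df),        # keep all columns
    dateFormat = "%Y-%m-%d",    # format handed to as.Date
    idCol = "id",
    dateCol = "date",
    titleCol = "title",
    textCol = "text",
    xmlAction = FALSE,          # toy data contains no XML markup
    duplicateAction = FALSE     # IDs are already unique here
  )
  print(class(corpus))
}
```

For CSV files on disk, <code>readTextmeta</code> would presumably be called the same way, with <code>path</code>, <code>file</code>, and <code>encoding</code> in place of <code>df</code>.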

Arguments

Read Corpora as CSV — readTextmeta

