koRpus (version 0.04-40)

read.corp.custom: Import custom corpus data

Description

Read data from a custom corpus into a valid object of class kRp.corp.freq-class.

Usage

read.corp.custom(corpus, format = "file",
    fileEncoding = "UTF-8", quiet = TRUE, caseSens = TRUE,
    ...)

Arguments

corpus
Either the path to a directory with txt files to read and analyze, or a vector object already holding the text corpus. Can also be an already tokenized and tagged text object which inherits class kRp.tagged (then the column "toke
format
Either "file" or "obj", depending on whether you want to scan files or analyze the given object.
fileEncoding
A character string naming the encoding of the corpus files.
quiet
Logical. If FALSE, short status messages will be shown.
caseSens
Logical. If FALSE, all tokens will be matched in their lower case form.
...
Additional options to be passed through to the tokenize function.
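
To make the effect of caseSens concrete, the following is a minimal base R sketch of counting tokens with and without case folding. It is an illustration only: the sample texts, the naive regular-expression tokenizer, and the variable names are assumptions made for this example, and it does not reproduce the tokenizer or the data structures koRpus uses internally.

```r
# Illustrative only: a base R approximation of token counting,
# not the tokenizer or object layout used by koRpus.
texts <- c("The cat saw the dog.", "The dog ran.")

# Naive tokenization: split on any run of non-letter characters
tokens <- unlist(strsplit(texts, "[^[:alpha:]]+"))
tokens <- tokens[nzchar(tokens)]

# Case-sensitive counts: "The" (2) and "the" (1) are separate types
freq.sens <- table(tokens)

# Case-insensitive counts: fold to lower case first, so "the" is counted 3 times
freq.insens <- table(tolower(tokens))

print(freq.sens)
print(freq.insens)
```

With case folding, sentence-initial capitals no longer split a word into two types, which usually matters for function words such as "the".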

Value

An object of class kRp.corp.freq-class.

Details

This function lets you perform a basic frequency analysis of a text corpus. Rather than importing ready-made analysis results such as LCC files, it imports the corpus material itself. The resulting object is of class kRp.corp.freq-class, so it can be used for frequency analysis by other functions of this package.

See Also

kRp.corp.freq-class

Examples

ru.corp <- read.corp.custom("~/mydata/corpora/russian_corpus/")
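
The call above scans a directory. As a complementary sketch, the corpus can also be passed as a vector already held in memory by setting format = "obj", as described under Arguments. The sample texts and the object names my.texts and my.corp are hypothetical, the call follows the signature shown under Usage for this package version, and depending on your koRpus setup the tokenizer may need further options passed through "...".

```r
# Hypothetical in-memory corpus; the texts are placeholders
my.texts <- c(
  "The first document of the corpus.",
  "A second, slightly longer document of the corpus.",
  "The third one."
)

# Only attempt the import if koRpus is actually installed
if (requireNamespace("koRpus", quietly = TRUE)) {
  my.corp <- koRpus::read.corp.custom(
    my.texts,
    format = "obj",    # analyze the given object instead of scanning files
    quiet = FALSE,     # show short status messages
    caseSens = FALSE   # match all tokens in their lower case form
  )
}
```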
