koRpus (version 0.10-2)

read.corp.custom: Import custom corpus data

Description

Read data from a custom corpus into a valid object of class kRp.corp.freq-class.

Usage

read.corp.custom(corpus, ...)

# S4 method for kRp.taggedText
read.corp.custom(corpus, quiet = TRUE, caseSens = TRUE, log.base = 10, ...)

# S4 method for character
read.corp.custom(corpus, format = "file", quiet = TRUE, caseSens = TRUE, log.base = 10, tagger = "kRp.env", force.lang = NULL, ...)

# S4 method for list
read.corp.custom(corpus, quiet = TRUE, caseSens = TRUE, log.base = 10, ...)

Arguments

corpus

Either the path to a directory with txt files to read and analyze, or a vector object already holding the text corpus. Can also be an already tokenized and tagged text object which inherits class kRp.tagged (in that case the column "token" of the "TT.res" slot is used).

...

Additional options to be passed through to the tokenize function.

quiet

Logical. If FALSE, short status messages will be shown.

caseSens

Logical. If FALSE, all tokens will be matched in their lower case form.

log.base

A numeric value defining the base of the logarithm used for inverse document frequency (idf). See log for details.

format

Either "file" or "obj", depending on whether you want to scan files or analyze the given object.

tagger

A character string pointing to the tokenizer/tagger command you want to use for basic text analysis. Can be omitted if corpus is already of class kRp.tagged-class. Defaults to tagger="kRp.env", which takes the settings from get.kRp.env. Set to "tokenize" to use tokenize.

force.lang

A character string defining the language to be assumed for the text(s), by force.

Value

An object of class kRp.corp.freq-class.

Details

These methods let you perform a basic frequency analysis of a text corpus. That is, they do not just import existing analysis results such as LCC files, but import the corpus material itself. The resulting object is of class kRp.corp.freq-class, so it can be used for frequency analysis by other functions and methods of this package.
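To illustrate what the caseSens and log.base arguments influence, here is a minimal base R sketch of a per-type document count and inverse document frequency on a toy corpus. It assumes the common definition idf = log(N / n, base), where N is the number of documents and n the number of documents containing the type; the exact computation inside koRpus may differ. The helper toy_idf and the sample data are hypothetical and not part of the package.

```r
# Toy corpus: three already tokenized "documents" (hypothetical data)
docs <- list(
  c("The", "cat", "sat"),
  c("the", "dog", "sat"),
  c("A", "cat", "ran")
)

# Hypothetical helper, NOT part of koRpus: counts in how many documents
# each type occurs and derives idf = log(N / n, base = log.base)
toy_idf <- function(docs, caseSens = TRUE, log.base = 10) {
  if (!caseSens) {
    # match all tokens in their lower case form
    docs <- lapply(docs, tolower)
  }
  types <- sort(unique(unlist(docs)))
  # number of documents each type occurs in
  in_docs <- vapply(types, function(type) {
    sum(vapply(docs, function(doc) type %in% doc, logical(1)))
  }, integer(1))
  data.frame(
    type = types,
    inDocs = in_docs,
    idf = log(length(docs) / in_docs, base = log.base),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

idf_cs <- toy_idf(docs, caseSens = TRUE)   # "The" and "the" are distinct types
idf_ci <- toy_idf(docs, caseSens = FALSE)  # both are counted as "the"
print(idf_ci)
```

With caseSens = FALSE, "The" and "the" collapse into a single type that occurs in two of the three documents, so its idf drops compared with the case-sensitive run. Changing log.base only rescales the idf values; it does not change their ranking.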

See Also

kRp.corp.freq-class

Examples

# NOT RUN {
ru.corp <- read.corp.custom("~/mydata/corpora/russian_corpus/")
# }