tm (version 0.3-1)

Corpus: Corpus

Description

Constructs a text document collection (corpus).

Usage

## S3 method for class 'Source':
Corpus(object,
       readerControl = list(reader = object@DefaultReader,
                            language = "en_US", load = TRUE),
       dbControl = list(useDb = FALSE, dbName = "", dbType = "DB1"),
       ...)

Arguments

object
A Source object.
readerControl
A list with the named components reader representing a reading function capable of handling the file format found in object, language giving the text's language (preferably in ISO 639-1
dbControl
A list with the named components useDb indicating that database support should be activated, dbName giving the filename holding the sourced out objects (i.e., the database), and dbType holding a valid dat
...
Optional arguments for the reader.
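
Because dbControl is an ordinary named list, it can be assembled and checked before it is passed to Corpus. The following is a minimal base R sketch, not part of tm: make_db_control is a hypothetical helper introduced only for illustration, and the defaults are copied from the Usage section. This page does not say whether Corpus fills in missing components itself, so the sketch always builds a complete list.

```r
# Defaults copied from the Usage section above.
db_defaults <- list(useDb = FALSE, dbName = "", dbType = "DB1")

# Hypothetical helper (not part of tm): overlay the supplied settings on the
# defaults so that every named component is always present.
make_db_control <- function(...) {
  modifyList(db_defaults, list(...))
}

# Switch database support on and name the database file; dbType keeps
# its default value.
db_control <- make_db_control(useDb = TRUE, dbName = "oviddb")
str(db_control)
```

The resulting list could then be supplied as dbControl = db_control in a call such as the first one in the Examples section.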

Value

An S4 object of class Corpus, which extends the class list and contains a collection of text documents.

Examples

# Plain text files shipped with tm; database support is switched on.
txt <- system.file("texts", "txt", package = "tm")
(Corpus(DirSource(txt),
        readerControl = list(reader = readPlain, language = "en_US", load = TRUE),
        dbControl = list(useDb = TRUE, dbName = "oviddb", dbType = "DB1")))

# Reuters-21578 XML files, read with load = FALSE and the default dbControl.
reut21578 <- system.file("texts", "reut21578", package = "tm")
Corpus(DirSource(reut21578),
       readerControl = list(reader = readReut21578XML, language = "en_US", load = FALSE))