tm (version 0.3-1)

readTabular: Read In a Text Document

Description

Returns a function that reads in a text document from a tabular data structure (such as a data frame or a list matrix), using knowledge of its internal structure and any available metadata, as specified by so-called mappings.

Usage

readTabular(mappings, ...)

Arguments

mappings
a named list of characters. The constructed reader will map each character entry to a slot or meta datum corresponding to the named list entry. Valid names include .Data to access the document's content,
...
arguments for the generator function.

Value

A function with the signature elem, language, load, id:

  • elem: A list with the two named elements content and uri. The first element must hold the document to be read in, the second element must hold a call to extract this document. The call is evaluated upon a request for load on demand.
  • load: A logical value indicating whether the document corpus should be immediately loaded into memory.
  • language: A character vector giving the text's language.
  • id: A character vector representing a unique identification string for the returned text document.

The function returns a PlainTextDocument representing content.
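
The idea of storing a call in uri and evaluating it only when loading is requested can be illustrated in base R. This is a minimal sketch, not tm's internal code: the helper load_content and the variable src are hypothetical names introduced only for illustration.

```r
# Illustrative only: an elem-like list whose `uri` holds an unevaluated call.
# `src` and `load_content` are hypothetical and are not part of tm.
src <- c("line one", "line two")
elem <- list(content = NULL, uri = quote(paste(src, collapse = "\n")))

# The call is stored as a language object; nothing has been evaluated yet.
stored_as_call <- is.call(elem$uri)

# Evaluate the stored call only when loading is requested.
load_content <- function(elem, load) {
  if (load) eval(elem$uri) else NULL
}

text <- load_content(elem, load = TRUE)
deferred <- load_content(elem, load = FALSE)
```

With load = FALSE the sketch returns NULL and leaves the call untouched, which mirrors the load-on-demand behaviour described above.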

Details

Formally, this function is a function generator: it returns a function (which reads in a text document) with a well-defined signature, and the returned function can access the arguments passed to the generator (e.g., the mappings) via lexical scoping.
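
The generator pattern can be sketched in base R without tm. The function make_reader below is a hypothetical stand-in, not readTabular's implementation: it assumes elem$content is a one-row data frame and returns a plain list rather than a PlainTextDocument.

```r
# Hypothetical generator (not tm's implementation): the returned reader
# keeps access to `mappings` through lexical scoping.
make_reader <- function(mappings) {
  function(elem, language, load, id) {
    row <- elem$content  # assumption: a one-row data frame
    # `.Data` names the content column; all other names are metadata.
    content_col <- mappings[[".Data"]]
    meta_names <- setdiff(names(mappings), ".Data")
    meta <- lapply(mappings[meta_names], function(col) row[[col]])
    list(content = row[[content_col]], meta = meta,
         language = language, id = id)
  }
}

df <- data.frame(contents = "content 1", title = "title 1",
                 stringsAsFactors = FALSE)
m <- list(.Data = "contents", Heading = "title")

reader <- make_reader(m)  # `m` is captured here, not passed again later
doc <- reader(list(content = df[1, ], uri = NULL),
              language = "en", load = TRUE, id = "id1")
```

The reader is called with the fixed signature only; the mappings never appear in the call because the closure already carries them.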

See Also

Vignette 'Extensions: How to Handle Custom File Formats'.

Use getReaders to list available reader functions.

Examples

library(tm)

# Tabular source: one document per row, with content and metadata columns.
df <- data.frame(contents = c("content 1", "content 2", "content 3"),
                 title    = c("title 1"  , "title 2"  , "title 3"  ),
                 authors  = c("author 1" , "author 2" , "author 3" ),
                 topics   = c("topic 1"  , "topic 2"  , "topic 3"  ),
                 stringsAsFactors = FALSE)
# Map the document content (.Data) and the metadata names to columns of df.
m <- list(.Data = "contents", Heading = "title",
          Author = "authors", Topic = "topics")
myReader <- readTabular(mappings = m)
# Fetch the first element from the source and read it into a document.
ds <- DataframeSource(df)
elem <- getElem(stepNext(ds))
(result <- myReader(elem, load = TRUE, language = "en", id = "id1"))
meta(result)
