
tm (version 0.3-3)

readPDF: Read In a PDF Document

Description

Returns a function that reads in a Portable Document Format (PDF) document, extracting both its text and its metadata.

Usage

readPDF(...)

Arguments

...
Arguments for the generator function.

Value

  • A function with the signature elem, language, load, id, whose arguments are:
    • elem: A list with the two named elements content and uri. The first element must hold the document to be read in; the second must hold a call that extracts this document. The call is evaluated when the document is requested (load on demand).
    • language: A character vector giving the text's language.
    • load: A logical value indicating whether the document corpus should be loaded into memory immediately.
    • id: A character vector giving a unique identification string for the returned text document.
  • The function returns a PlainTextDocument representing the text and metadata in content.
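The deferred evaluation described for uri can be illustrated in base R. This is a minimal sketch, assuming a temporary text file stands in for a PDF; tm's internal handling of the stored call may differ. The call is built first and only evaluated when eval() is invoked:

```r
# Create a small file to stand in for a document (assumption: plain text, not PDF)
f <- tempfile(fileext = ".txt")
writeLines("deferred content", f)

# Store an unevaluated call, as in elem = list(uri = substitute(file(f)))
uri <- substitute(file(f))
is.call(uri)      # TRUE: no connection has been created yet

# Evaluating the call on demand creates the connection
con <- eval(uri)
txt <- readLines(con)
close(con)
txt               # [1] "deferred content"
```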

Details

Formally, this function is a function generator: it returns a function (which reads in a text document) with a well-defined signature, and that function can access the arguments passed to the generator via lexical scoping. This is especially useful for reader functions for complex data structures that need many configuration options.
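To illustrate the function-generator pattern, here is a hypothetical make_text_reader() (not part of tm) that captures its configuration options in a closure and returns a reader with the same elem, language, load, id signature. It returns a plain list instead of a PlainTextDocument so that it runs without tm:

```r
# Hypothetical generator (NOT part of tm): options passed here are captured
# in the enclosing environment of the returned reader function.
make_text_reader <- function(to_lower = FALSE, strip = TRUE) {
  function(elem, language, load, id) {
    text <- elem$content
    # strip and to_lower are found via lexical scoping, not as arguments
    if (strip) text <- trimws(text)
    if (to_lower) text <- tolower(text)
    list(content = text, language = language, id = id)
  }
}

reader <- make_text_reader(to_lower = TRUE)
doc <- reader(elem = list(content = "  Hello PDF  ", uri = NULL),
              language = "en_US", load = TRUE, id = "id1")
doc$content   # [1] "hello pdf"
```

Here reader() never receives to_lower or strip as arguments; it finds them in the environment where it was created, which is the lexical-scoping mechanism described above.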

Note that this PDF reader requires the command-line tools pdftotext and pdfinfo to be installed and accessible on your system.
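Before calling the reader, you can check whether both tools can be found. A small sketch using base R's Sys.which(), which returns an empty string for any command it cannot locate on the search path:

```r
# Look up both external tools; unresolved commands map to ""
tools <- Sys.which(c("pdftotext", "pdfinfo"))
not_found <- names(tools)[tools == ""]

if (length(not_found) > 0) {
  message("Not found on the search path: ", paste(not_found, collapse = ", "))
} else {
  message("pdftotext and pdfinfo are available")
}
```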

See Also

Use getReaders to list available reader functions.

Examples

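library(tm)
# Sample PDF shipped with tm
f <- system.file("texts", "pdf", "pdfarchiving.pdf", package = "tm")
readPDF()  # prints the generated reader function
# uri holds an unevaluated call that opens the file on demand
pdf <- readPDF()(elem = list(uri = substitute(file(f))), language = "en_US", load = TRUE, id = "id1")
meta(pdf)  # inspect the document's metadata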
