Returns a function that reads in a Portable Document Format (PDF)
document, extracting both its text and its meta data.
Usage
readPDF(...)
Arguments
...
Arguments for the generator function.
Value
A function with the signature elem, language, load, id:
elem
A list with the two named elements content
and uri. The first element must hold the document to
be read in, the second element must hold a call to extract this
document. The call is evaluated upon a request for load on demand.
language
A character vector giving the text's language.
load
A logical value indicating whether the document
corpus should be immediately loaded into memory.
id
A character vector representing a unique identification
string for the returned text document.
The function returns a PlainTextDocument representing the text
and meta data in content.
Details
Formally, this function is a function generator, i.e., it returns a
function (which reads in a text document) with a well-defined
signature, while the returned function can still access the arguments
passed to the generator via lexical scoping. This is especially useful
for reader functions for complex data structures that need many
configuration options.
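The generator pattern can be illustrated with a minimal sketch. The names makeReader and encoding are hypothetical and are not part of the package, and the returned list only stands in for a real text document; the point is merely to show how a reader with the fixed signature elem, language, load, id can still see the options given to its generator.

```r
# Hypothetical generator, NOT the package's implementation:
# it only mimics the function-generator pattern.
makeReader <- function(...) {
  # Capture the configuration options once, when the generator is called.
  options <- list(...)

  # The returned reader has a fixed, well-defined signature.
  function(elem, language, load, id) {
    # 'options' is not an argument here; it is found in the enclosing
    # environment of the generator via lexical scoping.
    list(id = id,
         language = language,
         loaded = load,
         content = if (load) elem$content else NULL,
         options = options)
  }
}

# 'encoding' is an illustrative option name, assumed for this sketch only.
reader <- makeReader(encoding = "UTF-8")
doc <- reader(elem = list(content = "some text", uri = NULL),
              language = "en", load = TRUE, id = "doc1")
str(doc)
```

Because every reader produced this way exposes the same four arguments, calling code does not need to know which configuration options a particular reader was created with.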
Note that this PDF reader needs both the tools pdftotext and
pdfinfo installed and accessible on your system.
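One way to check this requirement before calling the reader is to look the two tools up on the search path. This is a sketch using base R's Sys.which, which returns an empty string for commands it cannot find; the variable names are illustrative only.

```r
# Look up both command-line tools on the PATH.
tools <- Sys.which(c("pdftotext", "pdfinfo"))

# Sys.which() yields "" for every command that could not be located.
not_found <- names(tools)[tools == ""]

if (length(not_found) > 0) {
  message("Not found on the PATH: ", paste(not_found, collapse = ", "))
} else {
  message("Both tools were found.")
}
```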
See Also
Use getReaders to list available reader functions.