tm (version 0.5-9.1)

readXML: Read In an XML Document

Description

Return a function which reads in an XML document. The structure of the XML document can be described with a so-called specification.

Usage

readXML(spec, doc, ...)

Arguments

spec
A named list of lists each containing two character vectors. The constructed reader will map each list entry to an attribute or meta datum corresponding to the named list entry. Valid names include C
doc
An (empty) document of some subclass of TextDocument
...
Arguments for the generator function.
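
As a rough illustration of the shape spec is expected to have, the following base R sketch builds a small specification and inspects it. The entry names, type labels ("node", "attribute", "unevaluated") and XPath expressions are copied from the Reuters example on this page. The variable names are only illustrative, and nothing here calls tm itself.

```r
# A specification is a named list; each entry is a two-element list:
# a type label followed by the value associated with that type.
spec <- list(
  Author = list("node", "/REUTERS/TEXT/AUTHOR"),
  ID     = list("attribute", "/REUTERS/@NEWID"),
  Origin = list("unevaluated", "Reuters-21578 XML")
)

# Extract the type label (first element) of every entry.
types <- vapply(spec, function(entry) entry[[1]], character(1))
print(types)

# Check that every entry has exactly two elements.
stopifnot(all(lengths(spec) == 2))
```

The names of the outer list (here Author, ID, Origin) are what the constructed reader maps onto the document's attributes or meta data.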

Value

  • A function with the signature elem, language, id:
  • elem: A list with the named element content which must hold the document to be read in.
  • language: A character vector giving the text's language.
  • id: A character vector representing a unique identification string for the returned text document.
  • The function returns doc augmented by the parsed information out of the XML file as described by spec.

Details

Formally, this function is a function generator: it returns a function (which reads in a text document) with a well-defined signature, and the returned function can still access the arguments passed to the generator (e.g., the specification) via lexical scoping.
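
The function-generator pattern can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not tm's implementation: make_reader, the plain-list document and its content, language, id and meta fields are all hypothetical stand-ins. It is meant only to show how a returned function with the signature elem, language, id can still see spec and doc through lexical scoping.

```r
# Hypothetical generator, NOT tm's implementation: it only mimics the
# pattern of returning a reader that remembers `spec` and `doc`.
make_reader <- function(spec, doc) {
  # The returned function has the fixed signature (elem, language, id).
  function(elem, language, id) {
    result <- doc                      # start from the template document
    result$content  <- elem$content    # raw text handed to the reader
    result$language <- language
    result$id       <- id
    # `spec` is not an argument of this function; it is looked up in the
    # enclosing environment created by the call to make_reader().
    for (name in names(spec)) {
      result$meta[[name]] <- spec[[name]][[2]]
    }
    result
  }
}

reader <- make_reader(
  spec = list(Origin = list("unevaluated", "Reuters-21578 XML")),
  doc  = list(meta = list())
)
out <- reader(elem = list(content = "<REUTERS/>"), language = "en", id = "1")
str(out)
```

Because the specification is captured once when the reader is created, every later call only has to supply the three per-document arguments.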

See Also

Vignette 'Extensions: How to Handle Custom File Formats'.

getReaders to list available reader functions.

Examples

readReut21578XML <- readXML(
  spec = list(Author = list("node", "/REUTERS/TEXT/AUTHOR"),
              DateTimeStamp = list("function", function(node)
                strptime(sapply(XML::getNodeSet(node, "/REUTERS/DATE"),
                                XML::xmlValue),
                         # DATE values look like "26-FEB-1987 15:01:01.79"
                         format = "%d-%b-%Y %H:%M:%S",
                         tz = "GMT")),
              Description = list("unevaluated", ""),
              Heading = list("node", "/REUTERS/TEXT/TITLE"),
              ID = list("attribute", "/REUTERS/@NEWID"),
              Origin = list("unevaluated", "Reuters-21578 XML"),
              Topics = list("node", "/REUTERS/TOPICS/D")),
  doc = Reuters21578Document())
