xmlEventParse: XML Event/Callback element-wise Parser

Description

Reads and processes the contents of an XML file or string by invoking user-level functions associated with different components of the XML tree. These include beginning and end of XML elements, comments, CDATA (escaped character data), entities, processing instructions, etc. This allows the caller to create the appropriate data structure from the XML document contents rather than the default tree (see xmlTreeParse). Functions for specific tags/elements can be used in addition to the standard callback names.

Usage

xmlEventParse(file, handlers = xmlHandler(), ignoreBlanks, addContext = TRUE, useTagName = FALSE, asText = FALSE, trim = TRUE, useExpat = FALSE, isURL = FALSE)

Arguments

file

string identifying the file. The name is interpreted using the internal expansion mechanism, so it can contain ~ and environment variables. As with xmlTreeParse, if useExpat is false, th

handlers

a closure object that contains functions which will be invoked as the XML components in the document are encountered by the parser. The standard functions are startElement(), endElement(), comment(), externa

ignoreBlanks

logical value indicating whether text elements made up entirely of white space should be included in the resulting `tree'.

addContext

logical value indicating whether the callback functions in `handlers' should be invoked with contextual information about the parser and the position in the tree, such as node depth, path indices for the node relative to the root, etc. If this is TRUE, ea

useTagName

logical value indicating whether the callback mechanism should look for a function matching the tag name in the startElement and endElement events, before calling the default handler functions. This allows the caller to handle different element types

asText

logical value indicating that the first argument, `file', should be treated as the XML text to parse, not the name of a file. This allows the contents of documents to be retrieved from different sources (e.g. HTTP servers, XML-RPC, e

trim

whether to strip white space from the beginning and end of text strings.

useExpat

a logical value indicating whether to use the expat SAX parser or to default to libxml. If this is TRUE, the library must have been compiled with support for expat. See supportsExpat.

isURL

indicates whether the file argument refers to a URL (accessible via ftp or http) or a regular file on the system. If asText is TRUE, this should not be specified.
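
As an illustration of how asText and useTagName interact with the handlers, here is a minimal sketch. It assumes a current release of the XML package, in which a handler named after a tag is called for that element when useTagName is TRUE and the generic startElement handler is called for everything else. The XML string and the variable names carNames and seen are made up for this example.

```r
library(XML)

# Hypothetical in-memory document; asText = TRUE tells the parser that the
# first argument is the XML content itself, not a file name.
xmlText <- paste0(
  "<cars>",
  "<car name='Mazda RX4'><mpg>21.0</mpg></car>",
  "<car name='Datsun 710'><mpg>22.8</mpg></car>",
  "</cars>"
)

carNames <- character()  # filled by the tag-specific handler
seen <- character()      # filled by the generic handler

handlers <- list(
  # Tag-specific handler: looked up by element name when useTagName = TRUE.
  car = function(name, attrs, ...) {
    carNames <<- c(carNames, attrs[["name"]])
  },
  # Generic handler: used for elements with no tag-specific handler.
  startElement = function(name, attrs, ...) {
    seen <<- c(seen, name)
  }
)

invisible(xmlEventParse(xmlText, handlers, asText = TRUE,
                        useTagName = TRUE, addContext = FALSE))

print(carNames)
print(seen)
```

With useTagName = FALSE, the car function would not be looked up and every start tag would go to startElement.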

Value

The return value is the `handlers' argument. It is assumed that this is a closure, that the callback functions have manipulated variables local to it, and that the caller knows how to extract them.
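
The closure convention described here can be sketched as follows. This is one possible arrangement, not the package's prescribed one: tagCounter and getCounts are hypothetical names introduced for the example, and the sketch assumes a current release of the XML package, where xmlEventParse returns the handlers it was given.

```r
library(XML)

# Hypothetical generator: the handlers share the variable `counts`, which
# lives in the environment created by each call to tagCounter().
tagCounter <- function() {
  counts <- integer()

  startElement <- function(name, attrs, ...) {
    n <- if (name %in% names(counts)) counts[[name]] else 0L
    counts[[name]] <<- n + 1L
  }

  # Accessor (not a parser callback) that lets the caller extract the state.
  getCounts <- function() counts

  list(startElement = startElement, getCounts = getCounts)
}

doc <- "<doc><a/><b/><a>text</a></doc>"

# The returned value is the handlers list, so the accessor is available on it.
h <- xmlEventParse(doc, handlers = tagCounter(), asText = TRUE,
                   addContext = FALSE)
print(h$getCounts())
```

Keeping the state inside the closure avoids global variables and lets several parses run with independent counters.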

Notes

The libxml parser can read URLs via http or ftp. It does not require the support of wget as used in other parts of R, but uses its own facilities to connect to remote servers.

Details

This is implemented via the Expat XML parser by Jim Clark (http://www.jclark.com).

References

http://www.w3.org/XML, http://www.jclark.com/xml

Examples


# Locate the example file shipped with the XML package
# (the path inside the package may differ between package versions).
fileName <- system.file("data/mtcars.xml", package = "XML")

# Print the name of each XML tag as its start tag is encountered.
# Uses the libxml SAX parser.
xmlEventParse(fileName,
              list(startElement = function(name, attrs) { cat(name, "") }),
              useTagName = FALSE, addContext = FALSE)

# Parse XML text rather than a file or URL: read the URL's contents,
# collapse them into a single string, then call xmlEventParse.
xmlURL <- "http://www.omegahat.org/Scripts/Data/mtcars.xml"
xmlText <- paste(readLines(xmlURL), collapse = "\n")
xmlEventParse(xmlText, asText = TRUE)
