Usage
getHTMLLinks(doc, externalOnly = TRUE, xpQuery = "//a/@href", baseURL = docName(doc), relative = FALSE)
getHTMLExternalFiles(doc, xpQuery = c("//img/@src", "//link/@href", "//script/@href", "//embed/@src"), baseURL = docName(doc), relative = FALSE, asNodes = FALSE, recursive = FALSE)
Arguments
doc
the HTML document as a URL, local file name, parsed
document or an XML/HTML node
externalOnly
a logical value that indicates whether we should
only return links to external documents and not references to
internal anchors/nodes within this document, i.e. those of the
form #foo.
xpQuery
a character vector of XPath expressions which match the nodes or attributes of interest
baseURL
the URL of the container document. This is used
to resolve relative references/links.
relative
a logical value indicating whether to leave the
references as relative to the base URL or to expand them to their full paths.
asNodes
a logical value that indicates whether we want the actual
HTML/XML nodes in the document that reference external documents
or just the names of the external documents.
recursive
a logical value that controls whether we recursively
process the external documents we find in the top-level document,
examining them for their own external files.
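
To illustrate the arguments described above, here is a minimal sketch. It assumes the XML package is installed; the HTML snippet and the variable names (html, links, allLinks, images, files) are made up for the example and are not part of the package.

```r
library(XML)

# A small, made-up HTML document (hypothetical content for illustration).
html <- paste(
  '<html><head><link href="style.css" rel="stylesheet"/></head><body>',
  '<a href="#top">Back to top</a>',
  '<a href="page2.html">Next page</a>',
  '<a href="http://www.example.org/">Elsewhere</a>',
  '<img src="logo.png"/>',
  '</body></html>',
  sep = "\n"
)

# Parse the text once and reuse the parsed document.
doc <- htmlParse(html, asText = TRUE)

# Default (externalOnly = TRUE): drop internal anchors such as "#top".
links <- getHTMLLinks(doc)

# Keep the internal anchors as well.
allLinks <- getHTMLLinks(doc, externalOnly = FALSE)

# Supply a different XPath query, here the src attribute of images.
images <- getHTMLLinks(doc, xpQuery = "//img/@src")

# Files the document refers to (images, style sheets, ...).
files <- getHTMLExternalFiles(doc)
```

Parsing the document once with htmlParse() and passing the parsed object avoids re-reading the source for each call; a URL or local file name could be passed as doc instead.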