tm.plugin.webmining (version 1.3)

extractHTMLStrip: Simply strip HTML Tags from Document

Description

extractHTMLStrip parses HTML given as a URL, a character string, or a filename, reads the DOM tree, removes all HTML tags in the tree, and returns the source text without markup.

Usage

extractHTMLStrip(url, asText = TRUE, encoding, ...)

Arguments

url
character string, URL, or filename
asText
logical; specifies whether the url argument is a character string containing the HTML itself rather than a URL or filename. Defaults to TRUE
encoding
specifies the character encoding to be used; the default depends on the platform
...
Additional arguments passed to htmlTreeParse

See Also

xmlNode, htmlTreeParse, encloseHTML
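
Examples

The following is a minimal sketch of the parse-then-strip approach described above, written directly against the XML package rather than tm.plugin.webmining. The helper name strip_html_sketch, the "UTF-8" default for encoding, and the sample HTML string are assumptions made for illustration; this is not the package's actual implementation, which may differ in detail.

```r
# Assumes the XML package is installed: install.packages("XML")
library(XML)

# Hypothetical helper (not part of tm.plugin.webmining): parse the input
# into a DOM tree and return its text content with all tags removed.
strip_html_sketch <- function(x, asText = TRUE, encoding = "UTF-8", ...) {
  # asText = TRUE treats x as the HTML source itself;
  # asText = FALSE treats x as a URL or filename.
  doc <- htmlTreeParse(x, asText = asText, useInternalNodes = TRUE,
                       encoding = encoding, ...)
  # xmlValue() on the root node concatenates the text of all descendants.
  trimws(xmlValue(xmlRoot(doc)))
}

html <- "<html><body><p>Hello <b>world</b></p></body></html>"
stripped <- strip_html_sketch(html)
print(stripped)
```

With tm.plugin.webmining loaded, the analogous call would presumably be extractHTMLStrip(html), with asText = FALSE when passing a URL or filename instead of raw HTML.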