RDSTK (version 1.1)

html2text: Identifies the text of an html string

Description

This function processes an HTML string to find its main text. It returns a list containing the extracted text.

Usage

html2text(html, session=getCurlHandle())

Arguments

html
A string containing valid HTML code.
session
The CURLHandle object that holds the request options and processes the command. For curlMultiPerform, this is an object of class MultiCURLHandle.

Value

A list containing the main text extracted from the HTML.

References

http://www.datasciencetoolkit.org/developerdocs#html2text

See Also

curlPerform, getCurlHandle, dynCurlReader

Examples

## Not run: 
html <- '<html><head><title>MyTitle</title></head><body><script
 type="text/javascript">something();</script><div>Some actual
 text</div></body></html>'
html2text(html)
## End(Not run)
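
The session argument lets several calls share one curl handle. The following is a minimal sketch of that pattern, not part of RDSTK: extract_all and the pages vector are hypothetical names introduced for illustration, and the commented usage assumes that RDSTK and RCurl are installed and that the Data Science Toolkit server is reachable.

```r
# Hypothetical helper (not part of RDSTK): apply an extractor to several
# HTML strings while sharing a single curl handle between the requests.
extract_all <- function(pages,
                        extractor = RDSTK::html2text,
                        session = RCurl::getCurlHandle()) {
  # Default arguments are evaluated lazily, so the packages are only
  # needed when the defaults are actually used.
  lapply(pages, extractor, session = session)
}

## Not run:
# pages <- c('<html><body><div>First page</div></body></html>',
#            '<html><body><div>Second page</div></body></html>')
# results <- extract_all(pages)
# str(results)  # inspect the returned lists; no element names are assumed
## End(Not run)
```

Passing the extractor as an argument keeps the helper easy to try offline with a stand-in function before pointing it at the real service.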
