
textutils (version 0.4-1)

HTMLencode: Decode and Encode HTML Entities

Description

Decode and encode HTML entities.

Usage

HTMLdecode(x, named = TRUE, hex = TRUE, decimal = TRUE)
HTMLencode(x, use.iconv = FALSE, encode.only = NULL)
HTMLrm(x, ...)

Value

character

Arguments

x

for HTMLdecode and HTMLencode: a character vector of length one; for HTMLrm: a character vector

use.iconv

logical. Should conversion via iconv be tried from native encoding to UTF-8?

named

logical: replace named character references?

hex

logical: replace hexadecimal character references?

decimal

logical: replace decimal character references?

encode.only

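character: the characters to be encoded. If NULL (the default), every character for which a named entity exists is replaced.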

...

other arguments

Author

Enrico Schumann

Details

HTMLdecode replaces named, hexadecimal and decimal character references as defined by HTML5 (see References) with characters. The resulting character vector is marked as UTF-8 (see Encoding).
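The three logical arguments can be switched off individually. The following is a minimal sketch, assuming textutils is installed; the input string is made up for illustration and writes the same ampersand as a named, a decimal and a hexadecimal reference.

```r
library(textutils)

## One ampersand, written in the three reference styles (illustrative input)
x <- "AT&amp;T, AT&#38;T, AT&#x26;T"

## Default: all three kinds of references are replaced
all_kinds <- HTMLdecode(x)

## named = FALSE: named references such as "&amp;" are left as they are
no_named <- HTMLdecode(x, named = FALSE)

## Only named references are replaced; numeric ones are left as they are
named_only <- HTMLdecode(x, hex = FALSE, decimal = FALSE)

print(all_kinds)
print(no_named)
print(named_only)
```

Switching off a class of references can be useful when, say, numeric references should survive for a later processing step.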

HTMLencode replaces UTF-8-encoded substrings with HTML5 named entities (a.k.a. “named character references”). A semicolon ‘;’ will not be replaced by the entity ‘&semi;’. Other than that, however, HTMLencode is quite thorough in its job: it will replace all characters for which named entities exist, even ‘,’ or ‘?’. You can restrict the characters to be replaced by specifying encode.only.
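When text is only meant to be embedded safely in markup, restricting the replacement to the markup-significant characters is often enough. The sketch below assumes textutils is installed; escape_markup is a hypothetical helper, not part of the package, and vapply is used because the Arguments section describes x as a vector of length one.

```r
library(textutils)

## Hypothetical helper (not part of textutils): encode only the
## characters that are significant in HTML markup.
escape_markup <- function(s)
    HTMLencode(s, encode.only = c("&", "<", ">"))

titles <- c("Max & Moritz", "4 < 9", "Hello, world?")

## Apply the helper to one string at a time
escaped <- vapply(titles, escape_markup, character(1), USE.NAMES = FALSE)
print(escaped)
```

With encode.only set this way, commas and question marks should pass through unchanged, whereas the default would replace them as well.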

HTMLrm removes HTML tags. All content between style and head tags is removed, as are comments. Each element of x is treated as a single HTML document, so a document that was read in as several lines should first be collapsed into one string (for instance with paste and its collapse argument).
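A document held as one element per line can be collapsed before the tags are removed. This is a minimal sketch, assuming textutils is installed; the markup is made up for illustration and stands in for what readLines might return.

```r
library(textutils)

## Lines as they might come from readLines(); illustrative markup only
lines <- c("<html>",
           "<head><title>Demo</title></head>",
           "<body>",
           "<p>Hello <b>world</b></p>",
           "</body>",
           "</html>")

## Collapse into a single string so that HTMLrm sees one document
doc <- paste(lines, collapse = "\n")

txt <- HTMLrm(doc)
print(txt)
```

Passing lines directly would instead treat each line as a separate document, which matters when a head or style element or a comment spans several lines.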

References

https://www.w3.org/TR/html5/syntax.html#named-character-references

https://html.spec.whatwg.org/multipage/syntax.html#character-references

See Also

TeXencode

Examples

HTMLdecode(c("Max &amp; Moritz", "4 &lt; 9"))
## [1] "Max & Moritz" "4 < 9"

HTMLencode(c("Max & Moritz", "4 < 9"))
## [1] "Max &amp; Moritz" "4 &lt; 9"

HTMLencode("Max, Moritz & more")
## [1] "Max&comma; Moritz &amp; more"
HTMLencode("Max, Moritz & more", encode.only = c("&", "<", ">"))
## [1] "Max, Moritz &amp; more"


HTMLrm("before <a href='http://enricoschumann.net'>LINK</a>  after")
## [1] "before LINK  after"
