Unicode defines a name and a number for each of the glyphs it encompasses: the numbers are called code points, and they run from 0 to 0x10FFFF.
utf8ToInt converts a length-one character string encoded in UTF-8 to an integer vector of Unicode code points. It checks the validity of the input and returns NA if it is invalid.
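A brief sketch of this behaviour in R; the invalid input is a deliberately malformed byte, since 0xFF can never occur in valid UTF-8:

```r
## Code points of a plain ASCII string
utf8ToInt("abc")        # 97 98 99

## A non-ASCII character: U+00E9, LATIN SMALL LETTER E WITH ACUTE
utf8ToInt("\u00e9")     # 233

## An invalid UTF-8 byte sequence yields NA
utf8ToInt("\xff")       # NA
```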
intToUtf8 converts a numeric vector of Unicode code points either to a single character string or to a character vector of single characters. (For a single character string, 0 is silently omitted; otherwise 0 is mapped to "". Non-integral numeric values are truncated to integers.) The Encoding of the result is declared as "UTF-8".
NA inputs are mapped to NA outputs.
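The conversions in the other direction can be sketched as follows, including the different handling of 0 in the two output modes and the propagation of NA:

```r
## Code points back to a single string
intToUtf8(c(72, 105))                     # "Hi"

## One single-character string per code point
intToUtf8(c(72, 105), multiple = TRUE)    # "H" "i"

## 0 is dropped in a single string, but maps to "" with multiple = TRUE
intToUtf8(c(72, 0, 105))                  # "Hi"
intToUtf8(c(72, 0, 105), multiple = TRUE) # "H" ""  "i"

## Non-integral values are truncated; NA propagates
intToUtf8(72.9)                           # "H"
intToUtf8(NA)                             # NA
```

Note that the declared "UTF-8" encoding is visible via Encoding() on non-ASCII results, e.g. Encoding(intToUtf8(233)).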