utf8ToInt converts a length-one character string encoded in UTF-8 to an
integer vector of Unicode code points. It checks validity of the input.
(Currently it accepts UTF-8 encodings of code points greater than
0x10FFFF: these are no longer regarded as valid by the UTF-8 RFC and will
in future be mapped to NA. Following ‘Corrigendum 9’ the UTF-8 encodings
of the ‘noncharacters’ 0xFFFE and 0xFFFF are regarded as valid as from
R 3.4.3.)
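
As a small illustration of the conversion described above, the following sketch decodes a three-character string. The variable names are only illustrative, and the string is written with \u escapes so that it is UTF-8 whatever the session locale.

```r
# "A", "é" (U+00E9) and "€" (U+20AC), written with \u escapes.
s <- "A\u00e9\u20ac"

# One integer code point per character.
code_points <- utf8ToInt(s)
print(code_points)                      # 65 233 8364

# The result counts characters, not bytes: "é" takes 2 bytes and
# "€" takes 3 bytes in UTF-8.
n_chars <- length(code_points)          # 3
n_bytes <- nchar(s, type = "bytes")     # 6
```
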
intToUtf8 converts a numeric vector of Unicode code points either
(default) to a single character string or a character vector of single
characters. Non-integral numeric values are truncated to integers: values
above the maximum are mapped to NA. For a single character string 0 is
silently omitted: otherwise 0 is mapped to "". The Encoding of a non-NA
return value is declared as "UTF-8".
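
The behaviours described above can be sketched as follows. The example assumes the multiple argument of intToUtf8 to choose between the two output forms. The variable names are illustrative, and the treatment of 0x110000 is what the text above leads one to expect, not something verified for every R version.

```r
cp <- c(72, 105, 33)                        # "H", "i", "!"

# Default: one string. With multiple = TRUE: one string per code point.
single <- intToUtf8(cp)                     # "Hi!"
pieces <- intToUtf8(cp, multiple = TRUE)    # "H" "i" "!"

# Non-integral values are truncated, so 72.9 is treated as 72.
truncated <- intToUtf8(c(72.9, 105.2))      # "Hi"

# A 0 is dropped from a single string but becomes "" when multiple = TRUE.
zero_single <- intToUtf8(c(72, 0, 105))                    # "Hi"
zero_pieces <- intToUtf8(c(72, 0, 105), multiple = TRUE)   # "H" "" "i"

# A non-ASCII result carries the declared encoding "UTF-8".
enc <- Encoding(intToUtf8(233))             # "UTF-8"

# A value above the maximum code point; NA is expected per the text above.
too_big <- intToUtf8(0x110000)
```
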
Invalid and NA inputs are mapped to NA output.
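
A brief sketch of the NA handling, contrasted with a valid round trip. The byte 0xFF is used as an assumed example of invalid input, since it never occurs in well-formed UTF-8; its result is shown without an asserted value because the exact behaviour may depend on the R version.

```r
# NA in, NA out, in both directions.
na_from_string <- utf8ToInt(NA_character_)
na_from_int    <- intToUtf8(NA_integer_)

# A lone 0xFF byte is not valid UTF-8; NA is expected per the text above.
bad <- utf8ToInt("\xff")

# By contrast, valid input survives a round trip unchanged.
s <- "na\u00efve"
round_trip <- intToUtf8(utf8ToInt(s))
identical(round_trip, s)
```
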