UTF-8 Text Processing
Patrick O. Perry
Functions for manipulating and printing UTF-8 encoded text:
as_utf8 attempts to convert character data to UTF-8, throwing an error if the data is invalid;
utf8_valid tests whether character data is valid according to its declared encoding;
utf8_normalize converts text to Unicode composed normal form (NFC), optionally applying case-folding and compatibility maps;
utf8_encode encodes a character string, escaping all control characters, so that it can be safely printed to the screen;
utf8_format formats a character vector by truncating to a specified character width limit or by left, right, or center justifying;
utf8_print prints UTF-8 character data to the screen;
utf8_width measures the display width of UTF-8 character strings (many emoji and East Asian characters are twice as wide as other characters);
output_ansi and output_utf8 test for the output connection's capabilities.
For a complete list of functions, use library(help = "utf8").
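
The functions listed above might be combined roughly as follows. This is a minimal sketch, assuming the utf8 package is installed. The sample strings and variable names are illustrative only, and encode = FALSE is used as one way to keep the width measurement independent of the terminal.

```r
library(utf8)

# Illustrative inputs: one valid string, and one Latin-1 byte sequence
# that is wrongly declared as UTF-8.
good <- "fa\u00e7ile"
bad <- "fa\xe7ile"
Encoding(bad) <- "UTF-8"

# utf8_valid() checks each element against its declared encoding.
valid <- utf8_valid(c(good, bad))

# as_utf8() signals an error on invalid input; capture the message here.
msg <- tryCatch(as_utf8(bad), error = function(e) conditionMessage(e))

# NFC normalization composes "e" followed by U+0301 into the single U+00E9.
composed <- utf8_normalize("cafe\u0301")

# Display widths, measured without escaping first (encode = FALSE) so the
# result does not depend on the terminal. The CJK character and the emoji
# each occupy two columns.
widths <- utf8_width(c("a", "\u4e2d", "\U0001f600"), encode = FALSE)

# Escape control characters such as a tab before printing.
escaped <- utf8_encode("a\tb")

# Right-justify and print; the rendering depends on the output connection.
utf8_print(utf8_format(c(good, composed), justify = "right"))
c(utf8 = output_utf8(), ansi = output_ansi())
```

Checking validity before calling as_utf8 avoids the error path, which can be convenient when cleaning data of uncertain origin.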