Unlike base::as.symbol() and base::as.name(), as_string()
automatically transforms Unicode tags such as "<U+5E78>" to the
proper UTF-8 character. This is important on Windows because:
R on Windows (before R 4.2) has no UTF-8 support and uses the native encoding instead.
The native encodings do not cover all Unicode characters. For
example, Western encodings do not support CJK characters.
When a lossy UTF-8 -> native transformation occurs, uncovered
characters are transformed to an ASCII Unicode tag like "<U+5E78>".
Symbols are always encoded in the native encoding. This means that
transforming the column names of a data frame to symbols might be
a lossy operation.
This operation is very common in the tidyverse because of data
masking APIs like dplyr where data frames are transformed to
environments. While the names of a data frame are stored as a
character vector, the bindings of environments are stored as
symbols.
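The points above can be sketched in base R. This is only an illustration under stated assumptions: the call to iconv() with to = "latin1" stands in for a Western native encoding, and sub = "Unicode" asks iconv() to substitute a tag for characters it cannot convert. The exact result may vary with the platform's iconv implementation. The variable names (x, tagged, df, env) are hypothetical.

```r
# A CJK character, written as an escape so this script stays ASCII.
x <- "\u5e78"

# Simulate a lossy UTF-8 -> Western native conversion (assumption:
# latin1 plays the role of the native encoding here).
# sub = "Unicode" requests a <U+xxxx> tag for unconvertible characters.
tagged <- iconv(x, from = "UTF-8", to = "latin1", sub = "Unicode")
print(tagged)

# The names of a data frame are stored as a character vector ...
df <- data.frame(1)
names(df) <- x
print(names(df))

# ... but once the columns become environment bindings, the names
# are stored as symbols, which use the native encoding.
env <- list2env(as.list(df))
print(ls(env))
```

On a platform whose native encoding is UTF-8, the binding name should survive unchanged; the lossy case only arises when the native encoding cannot represent the character.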
Because it reencodes the ASCII Unicode tags to their UTF-8
representation, the string -> symbol -> string roundtrip is
more stable with as_string().
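A minimal sketch of the roundtrip, assuming the rlang package is installed. Using base::as.character() as the base R counterpart for the symbol -> string step is an assumption made for comparison; the text above does not name it. The variable names are hypothetical.

```r
library(rlang)

x <- "\u5e78"

# string -> symbol: the name is re-encoded to the native encoding,
# which may be lossy on platforms without UTF-8 support.
s <- sym(x)

# symbol -> string with base R: returns the name as stored in the symbol.
base_back <- as.character(s)

# symbol -> string with rlang: also converts any <U+xxxx> tags back
# to the corresponding UTF-8 characters.
rlang_back <- as_string(s)

# Compare each result with the original string.
print(identical(base_back, x))
print(identical(rlang_back, x))
```

On a UTF-8 platform no information is lost in the first step, so both comparisons are expected to agree. The difference should only show up where the native encoding cannot represent the character.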