stri_enc_mark(str)
str
.
Unlike in Encoding, possible encodings are: ASCII, latin1, bytes, native, and UTF-8. Additionally, missing values are handled properly. This is exactly the same information that all the functions in stringi use to re-encode their inputs.

According to Encoding,
R has a simple encoding marking mechanism: strings can be declared to be in latin1, UTF-8 or bytes. Moreover, via the C API we may check whether a string is in ASCII (R assumes that this holds if and only if all bytes in a string are not greater than 127, so there is an implicit assumption that your platform uses an encoding which is an ASCII superset) or in the system's default (a.k.a. unknown in Encoding) encoding. Intuitively, the default encoding should be equivalent to the one you use when inputting data via keyboard.
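
The difference between the two marking schemes can be seen with a small sketch. It assumes that stringi is installed; the vector x and the way its elements are built (raw bytes marked by hand) are illustrative choices, not part of the original documentation. A string in the native, non-ASCII encoding is left out because it cannot be produced portably.

```r
library(stringi)

# A single byte 0xE9, explicitly declared to be latin1.
l1 <- rawToChar(as.raw(0xe9))
Encoding(l1) <- "latin1"

# A single byte 0xFF, explicitly declared to be raw bytes.
b <- rawToChar(as.raw(0xff))
Encoding(b) <- "bytes"

# Pure ASCII, a \u escape (stored as UTF-8), latin1, bytes, and a missing value.
x <- c("abc", "\u0105", l1, b, NA)

# Base R: ASCII strings and missing values are both reported as "unknown".
Encoding(x)

# stringi: ASCII is reported separately and the missing value stays missing.
stri_enc_mark(x)
```

Comparing the two results element by element shows where stri_enc_mark is more specific than Encoding.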
In stringi we assume that the default encoding is equivalent to the one returned by stri_enc_get. It is automatically detected by ICU to match, by default, the encoding part of the LC_CTYPE category as given by Sys.getlocale.

stri_enc_info, stri_enc_list, stri_enc_set, stringi-encoding
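
To see which encoding stringi treats as the default on a given machine, the two values mentioned above can be printed side by side. This is only a sketch, assuming stringi is installed; the variable names are illustrative and the output depends on the platform and locale.

```r
library(stringi)

# Encoding that stringi currently assumes for native (unmarked, non-ASCII) strings.
default_enc <- stri_enc_get()
default_enc

# LC_CTYPE category of the current R locale; its encoding part, if any,
# is what ICU tries to match by default.
ctype <- Sys.getlocale("LC_CTYPE")
ctype
```

If the two disagree, strings reported as native by stri_enc_mark may be re-encoded differently from what the locale suggests.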