
stri_enc_isutf8(str)
E.g., the raw bytes (c4, 85) properly represent "Polish a with ogonek" in UTF-8 as well as the pair ("A umlaut", "Ellipsis") in WINDOWS-1250.
Also note that UTF-8, as well as most 8-bit encodings, has ASCII as a subset (note that stri_enc_isascii => stri_enc_isutf8).

However, the longer the sequence, the more likely it is that a positive result really indicates UTF-8, because not all sequences of bytes are valid UTF-8.
This function is independent of the way R marks encodings in character strings (see Encoding and stringi-encoding).
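The points above can be illustrated with a minimal sketch. It assumes the stringi package is installed and that `stri_enc_isutf8` and `stri_enc_isascii` accept a list of raw vectors. The variable names (`ogonek`, `truncated`, `ascii`, `samples`) are illustrative only and do not come from the package.

```r
library(stringi)

# Bytes c4 85 form a well-formed two-byte UTF-8 sequence (U+0105).
ogonek <- as.raw(c(0xc4, 0x85))
# A lone lead byte c4 with no continuation byte is not well-formed UTF-8.
truncated <- as.raw(0xc4)
# Plain ASCII bytes for "abc".
ascii <- charToRaw("abc")

# Hypothetical sample set, checked element by element.
samples <- list(ogonek = ogonek, truncated = truncated, ascii = ascii)
is_utf8  <- stri_enc_isutf8(samples)
is_ascii <- stri_enc_isascii(samples)
print(data.frame(is_utf8, is_ascii, row.names = names(samples)))

# The same two bytes decode under either assumed source encoding,
# which is why a positive answer is not proof of UTF-8.
as_utf8   <- stri_encode(ogonek, from = "UTF-8", to = "UTF-8")
as_cp1250 <- stri_encode(ogonek, from = "windows-1250", to = "UTF-8")
print(c(as_utf8, as_cp1250))
```

The ASCII sample passes both checks, the two-byte sample passes only the UTF-8 check, and the truncated sample passes neither. This is the sense in which a negative answer is definitive while a positive one is only suggestive.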
See also: stri_enc_detect2; stri_enc_detect; stri_enc_isascii; stri_enc_isutf16be, stri_enc_isutf16le, stri_enc_isutf32be, stri_enc_isutf32le; stringi-encoding
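library(stringi)
# ASCII letters are also valid UTF-8
if (stri_install_check(silent=TRUE))
   stri_enc_isutf8(letters[1:3])
# non-ASCII Polish letters given as Unicode escapes
if (stri_install_check(silent=TRUE))
   stri_enc_isutf8("\u0105\u0104")
# further non-ASCII code points
if (stri_install_check(silent=TRUE))
   stri_enc_isutf8("\u1234\u0222")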