mut_utf8_locale: Set the locale's codeset for testing

Description

Setting a locale's codeset (specifically, the LC_CTYPE category) produces side effects in R's handling of strings. The most important of these affects how the R parser marks strings. R has specific internal support for latin1 (a single-byte encoding) and UTF-8 (a multibyte, variable-width encoding) strings. If the locale codeset is latin1 or UTF-8, the parser will mark non-ASCII strings with the corresponding encoding (pure ASCII strings are never marked). It is important for strings to have consistent encoding markers, as these determine a number of internal encoding conversions when R or packages handle strings (see set_str_encoding() for some examples).
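The following is a minimal sketch of what encoding markers look like, using only base R. It builds its strings from escapes and raw bytes so that the result does not depend on the current locale; the variable names are illustrative only.

```r
# A string written with a \u escape is always marked as UTF-8 by the parser.
utf8_str <- "caf\u00e9"

# Build the same word from latin1 bytes (0xe9 is "e" with an acute accent)
# and mark it explicitly, since raw bytes carry no encoding information.
latin1_str <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
Encoding(latin1_str) <- "latin1"

# Pure ASCII strings are never marked.
ascii_str <- "cafe"

Encoding(c(utf8_str, latin1_str, ascii_str))

# The underlying bytes differ between the two encodings...
charToRaw(utf8_str)
charToRaw(latin1_str)

# ...but because both strings are marked, R can translate between them.
identical(enc2utf8(latin1_str), utf8_str)
```

Without the marker on the latin1 string, R would have to assume the bytes are in the native encoding of the current locale, which is why consistent markers matter.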

Usage

mut_utf8_locale()
mut_latin1_locale()
mut_mbcs_locale()

Arguments

Value

The previous locale (invisibly).

Life cycle

These functions are experimental. They might be removed in the future because they don't bring anything new over the base API.

Details

If you are changing the locale encoding for testing purposes, you need to be aware that R caches strings and symbols to save memory. Changing the locale during an R session can therefore lead to surprising and difficult-to-reproduce results. When in doubt, restart your R session.

Note that these helpers are provided only for interactively testing the effects of changing the locale codeset. They let you quickly change the default text encoding to latin1, UTF-8, or a non-UTF-8 MBCS. They are not widely tested and do not provide a way of setting the language and region of the locale. They have permanent side effects and should probably not be used in package examples, unit tests, or in the course of a data analysis. Finally, note that mut_utf8_locale() will not work on Windows, as only latin1 and MBCS locales are supported on this OS.
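For comparison, here is a sketch of how the LC_CTYPE category can be changed temporarily with the base API (Sys.getlocale() and Sys.setlocale()). The helper name with_ctype_locale() is hypothetical and not part of this package. Locale names are platform-specific, so the example uses "C", which should be available everywhere. Restoring the locale does not undo the effects of R's string and symbol cache.

```r
# Hypothetical helper: evaluate `code` with a temporary LC_CTYPE locale.
with_ctype_locale <- function(locale, code) {
  old <- Sys.getlocale("LC_CTYPE")
  # Restore the previous locale when the function exits, even on error.
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)

  # Sys.setlocale() returns "" (with a warning) if the request fails.
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(new, "")) {
    stop("Locale '", locale, "' is not available on this system")
  }

  # `code` is a promise, so it is only evaluated here, under the new locale.
  force(code)
}

# Inspect what R reports about the codeset while in the "C" locale.
with_ctype_locale("C", l10n_info())
```

The value returned by Sys.getlocale("LC_CTYPE") plays the same role as the previous locale that the functions documented here return invisibly.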