icuSetCollate(...)
icuGetCollate(type = c("actual", "valid"))
icuGetCollate returns a character string describing the ICU locale
in use (which may be reported as "ICU not in use"). The actual locale
may be simpler than the requested locale: for example "da" rather than
"da_DK". English locales are likely to report "root".
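A minimal sketch of querying the collation locale. It should run on any build of R; the variable names are illustrative, and the strings returned depend on the platform and session.

```r
# Report whether this build of R can use ICU for collation.
has_icu <- capabilities("ICU")

# Each call returns a single character string. When ICU is not active
# the string may be "ICU not in use".
actual <- icuGetCollate("actual")  # locale actually used by the collator
valid  <- icuGetCollate("valid")   # most specific locale that would be valid

print(has_icu)
print(c(actual = actual, valid = valid))
```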
On builds of R that use ICU for collation, icuSetCollate can be used
to tune the way collation is done. On other builds, calling this
function does nothing, with a warning.
Possible arguments are:
locale: a character string such as "da_DK" giving the language and
country whose collation rules are to be used. If present, this should
be the first argument.
case_first: "upper", "lower" or "default", asking for upper- or
lower-case characters to be sorted first. The default is usually
lower-case first, but not in all languages (not under the default
settings for Danish, for example).
alternate_handling: "non_ignorable" (primary strength) and "shifted"
(quaternary strength).
strength: "primary", "secondary", "tertiary" (default), "quaternary"
and "identical".
french_collation: "on", "off" and "default".
normalization: "on" and "off" (default). This affects the collation
of composite characters.
case_level: "on" and "off" (default).
hiragana_quaternary: "on" (sort Hiragana first at quaternary level)
and "off".
Only the first three are likely to be of interest except to those with a detailed understanding of collation and specialized requirements.
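The arguments of most general interest can be sketched as follows. This is an illustration, not a definitive recipe: the word list and the choice of the "en_US" and "da_DK" locales are assumptions, and the resulting orderings depend on the ICU version, so no output is shown.

```r
x <- c("Aarhus", "aarhus", "safe", "test", "Zoo")

if (capabilities("ICU")) {
  # locale is given first; case_first then asks for upper-case first.
  icuSetCollate(locale = "en_US", case_first = "upper")
  upper_first <- sort(x)

  # Switch to Danish rules, leaving case ordering at the locale default.
  icuSetCollate(locale = "da_DK", case_first = "default")
  danish <- sort(x)

  # Return to the collation locale obtained from the OS.
  icuSetCollate(locale = "default")

  print(upper_first)
  print(danish)
} else {
  message("ICU is not available in this build of R.")
}
```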
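Some special values are accepted for locale:
"none": ICU is not used for collation; the OS's collation services
are used instead.
"ASCII": ICU is not used for collation; the C function strcmp is used
instead, which should sort byte-by-byte in (unsigned) numerical order.
(As from R 3.1.3.)
"default": the locale is obtained from the OS, as is done at the start
of the session.
"", "root": the 'root' collation.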
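A short sketch of the special locale values, assuming an ICU-enabled build of R 3.1.3 or later (needed for "ASCII"). The sample vector is illustrative.

```r
x <- c("b", "A", "a", "B")

if (capabilities("ICU")) {
  # "ASCII": strings are compared with strcmp rather than by ICU.
  icuSetCollate(locale = "ASCII")
  byte_order <- sort(x)

  # "root": use ICU's root collation rather than a language-specific one.
  icuSetCollate(locale = "root")
  root_order <- sort(x)

  # "default": go back to the locale obtained from the OS.
  icuSetCollate(locale = "default")

  print(byte_order)
  print(root_order)
} else {
  message("ICU is not available in this build of R.")
}
```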
For the specifications of real ICU locales, see
http://userguide.icu-project.org/locale. Note that ICU does not
report that a locale is not supported, but falls back to its idea of
best fit (which could be rather different and is reported by
icuGetCollate("actual"), often "root"). Most English locales fall back
to "root" as although e.g. "en_GB" is a valid locale (at least on some
platforms), it contains no special rules for collation. Note that "C"
is not a supported ICU locale.
Some examples are case_level = "on", strength = "primary" to ignore
accent differences and alternate_handling = "shifted" to ignore
space and punctuation characters.
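These two settings can be tried out as below. This is a sketch under assumptions: the test strings and the use of the "root" locale are illustrative choices, and results are printed rather than stated because they depend on the ICU build.

```r
plain    <- "role"
accented <- "r\u00f4le"  # "role" with a circumflex on the o, as an escape

words <- c("co-op", "coop", "co op")

if (capabilities("ICU")) {
  # Ignore accent differences: neither string should then sort strictly
  # before the other if the collator treats them as equivalent.
  icuSetCollate(locale = "root", case_level = "on", strength = "primary")
  accent_matters <- (plain < accented) || (plain > accented)

  # Ignore space and punctuation characters when ordering.
  icuSetCollate(locale = "root", alternate_handling = "shifted")
  shifted_order <- sort(words)

  # Restore the collation locale obtained from the OS.
  icuSetCollate(locale = "default")

  print(accent_matters)
  print(shifted_order)
} else {
  message("ICU is not available in this build of R.")
}
```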
Initially ICU will not be used for collation if the OS is set to use
the C locale for collation. Once this function is called with a value
for locale, ICU will be used until it is called again with
locale = "none".
All customizations are reset to the default for the locale if locale
is specified. The collation engine is also reset if the OS collation
locale category is changed by Sys.setlocale.
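The reset behaviour described above can be sketched as follows, assuming an ICU-enabled build. The "en_US" locale and the sample vector are illustrative.

```r
x <- c("a", "A", "b", "B")

if (capabilities("ICU")) {
  # Customize: upper-case first within the chosen locale.
  icuSetCollate(locale = "en_US", case_first = "upper")
  customized <- sort(x)

  # Specifying locale again resets customizations to the locale defaults.
  icuSetCollate(locale = "en_US")
  after_reset <- sort(x)

  # Stop using ICU, then see what icuGetCollate reports.
  icuSetCollate(locale = "none")
  status <- icuGetCollate("actual")

  # Finally return to the locale obtained from the OS.
  icuSetCollate(locale = "default")

  print(customized)
  print(after_reset)
  print(status)
} else {
  message("ICU is not available in this build of R.")
}
```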
See also sort; capabilities for whether ICU is available;
extSoftVersion for its version.
The ICU user guide chapter on collation (http://userguide.icu-project.org/collation).