A convenience function to tune the ICU Collator's behavior, e.g., in stri_compare, stri_order, stri_unique, stri_duplicated, as well as stri_detect_coll and other stringi-search-coll functions.
stri_opts_collator(
locale = NULL,
strength = 3L,
alternate_shifted = FALSE,
french = FALSE,
uppercase_first = NA,
case_level = FALSE,
normalization = FALSE,
normalisation = normalization,
numeric = FALSE,
...
)

stri_coll(
locale = NULL,
strength = 3L,
alternate_shifted = FALSE,
french = FALSE,
uppercase_first = NA,
case_level = FALSE,
normalization = FALSE,
normalisation = normalization,
numeric = FALSE,
...
)
locale
single string, NULL or '' for default locale
strength
single integer in {1,2,3,4}, which defines collation strength; 1 for the most permissive collation rules, 4 for the strictest ones
alternate_shifted
single logical value; FALSE treats all the code points with non-ignorable primary weights in the same way, TRUE causes code points with primary weights that are equal or below the variable top value to be ignored on primary level and moved to the quaternary level
french
single logical value; used in Canadian French; TRUE results in secondary weights being considered backwards
uppercase_first
single logical value; NA orders upper and lower case letters in accordance with their tertiary weights, TRUE forces upper case letters to sort before lower case letters, FALSE does the opposite
case_level
single logical value; controls whether an extra case level (positioned before the third level) is generated or not
normalization
single logical value; if TRUE, then an incremental check is performed to see whether the input data is in the FCD form. If the data is not in the FCD form, incremental NFD normalization is performed
normalisation
alias of normalization
numeric
single logical value; when turned on, this attribute generates a collation key for the numeric value of substrings of digits; this is a way to get '100' to sort AFTER '2'
...
[DEPRECATED] any other arguments passed to this function generate a warning; this argument will be removed in the future
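The effect of strength and uppercase_first may be easier to see on a small example. The following is a minimal sketch, assuming the stringi package is installed; the locale is fixed to "en" only so that the results do not depend on the session's default locale, and the sample strings are made up for illustration.

```r
library(stringi)

# Collator settings that differ only in strength (locale fixed for illustration).
primary   <- stri_opts_collator(locale = "en", strength = 1L)
secondary <- stri_opts_collator(locale = "en", strength = 2L)
tertiary  <- stri_opts_collator(locale = "en", strength = 3L)

# "\u00e9" is LATIN SMALL LETTER E WITH ACUTE.
accented <- "r\u00e9sum\u00e9"

# Strength 1: only base letters matter, so accents and case are ignored.
cmp_primary <- stri_cmp("RESUME", accented, opts_collator = primary)

# Strength 2: accents are significant, case is still ignored.
cmp_secondary_case   <- stri_cmp("resume", "RESUME", opts_collator = secondary)
cmp_secondary_accent <- stri_cmp("resume", accented, opts_collator = secondary)

# Strength 3 (the default): case differences are significant as well.
cmp_tertiary <- stri_cmp("resume", "RESUME", opts_collator = tertiary)

# uppercase_first = TRUE places each upper case letter before its lower case form.
upper_first <- stri_sort(
  c("b", "A", "a", "B"),
  opts_collator = stri_opts_collator(locale = "en", uppercase_first = TRUE)
)

print(c(cmp_primary, cmp_secondary_case, cmp_secondary_accent, cmp_tertiary))
print(upper_first)
```

stri_cmp returns 0 for strings the collator considers equal and a nonzero value otherwise, so the first two comparisons are expected to yield 0 and the last two a nonzero result.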
Returns a named list object; missing settings are left with default values.
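Because the result is an ordinary named list, it can be built once and reused across several calls. The sketch below assumes stringi is installed; the variable names and sample strings are illustrative only.

```r
library(stringi)

# Build the options once; the result is a plain named list.
opts <- stri_opts_collator(locale = "en", strength = 2L, numeric = TRUE)
str(opts)

# Reuse the same list in different collation-aware functions.
labels <- c("Item10", "item2", "ITEM1")
sorted  <- stri_sort(labels, opts_collator = opts)
ranking <- stri_order(labels, opts_collator = opts)

print(sorted)
print(ranking)
```

With strength = 2L the case differences are ignored, and numeric = TRUE compares the digit runs by value, so the items are ordered by their numbers rather than character by character.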
ICU's collator performs a locale-aware, natural-language-like string comparison. This is a more reliable way of establishing relationships between strings than the one provided by base R, and one that is more sophisticated and more appropriate for natural-language text than ordinary bytewise comparison.
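To illustrate what locale-aware means here, the sketch below sorts the same two words under Polish and Slovak rules and compares the result with a C-locale sort from base R. It assumes stringi is installed and that the ICU data in use includes the Slovak tailoring, in which the digraph "ch" is ordered after "h".

```r
library(stringi)

words <- c("hladny", "chladny")

# Polish rules: plain letter-by-letter order, "c" before "h".
polish <- stri_sort(words, opts_collator = stri_opts_collator(locale = "pl_PL"))

# Slovak rules: "ch" is treated as a separate letter that follows "h".
slovak <- stri_sort(words, opts_collator = stri_opts_collator(locale = "sk_SK"))

# Base R radix sort always uses C-locale (bytewise) ordering for strings.
bytewise <- sort(words, method = "radix")

print(polish)
print(slovak)
print(bytewise)
```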
Collation -- ICU User Guide, http://userguide.icu-project.org/collation
ICU Collation Service Architecture -- ICU User Guide, http://userguide.icu-project.org/collation/architecture
icu::Collator Class Reference -- ICU4C API Documentation, https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/classicu_1_1Collator.html
Other locale_sensitive: %s<%(), about_locale, about_search_boundaries, about_search_coll, stri_compare(), stri_count_boundaries(), stri_duplicated(), stri_enc_detect2(), stri_extract_all_boundaries(), stri_locate_all_boundaries(), stri_order(), stri_sort_key(), stri_sort(), stri_split_boundaries(), stri_trans_tolower(), stri_unique(), stri_wrap()
Other search_coll: about_search_coll, about_search
# NOT RUN {
stri_cmp('number100', 'number2')
stri_cmp('number100', 'number2', opts_collator=stri_opts_collator(numeric=TRUE))
stri_cmp('number100', 'number2', numeric=TRUE) # equivalent
stri_cmp('above mentioned', 'above-mentioned')
stri_cmp('above mentioned', 'above-mentioned', alternate_shifted=TRUE)
# }