stringx (version 0.2.9)

xtfrm2: Sort Strings

Description

The sort method for objects of class character (sort.character) uses the locale-sensitive Unicode collation algorithm to arrange the strings in a vector with respect to a chosen lexicographic order.

xtfrm2 and [DEPRECATED] xtfrm generate an integer vector that sorts in the same way as its input and hence can be used in conjunction with order or rank.

Usage

xtfrm2(x, ...)

# S3 method for default
xtfrm2(x, ...)

# S3 method for character
xtfrm2(
  x,
  ...,
  locale = NULL,
  strength = 3L,
  alternate_shifted = FALSE,
  french = FALSE,
  uppercase_first = NA,
  case_level = FALSE,
  normalisation = FALSE,
  numeric = FALSE
)

xtfrm(x)

# S3 method for default
xtfrm(x)

# S3 method for character
xtfrm(x)

# S3 method for character
sort(
  x,
  ...,
  decreasing = FALSE,
  na.last = NA,
  locale = NULL,
  strength = 3L,
  alternate_shifted = FALSE,
  french = FALSE,
  uppercase_first = NA,
  case_level = FALSE,
  normalisation = FALSE,
  numeric = FALSE
)

Value

sort.character returns a character vector, with only the names attribute preserved. Note that the output vector may be shorter than the input one, because missing values are removed when na.last is NA (the default).

xtfrm2.character and xtfrm.character return an integer vector; most attributes are preserved.

Arguments

x

character vector whose elements are to be sorted

...

further arguments passed to other methods

locale

NULL or "" for the default locale (see stri_locale_get), or a single string with a locale identifier (see stri_locale_list)

strength

see stri_opts_collator

alternate_shifted

see stri_opts_collator

french

see stri_opts_collator

uppercase_first

see stri_opts_collator

case_level

see stri_opts_collator

normalisation

see stri_opts_collator

numeric

see stri_opts_collator

decreasing

single logical value; if FALSE, the ordering is nondecreasing (weakly increasing)

na.last

single logical value; if TRUE, missing values are placed at the end; if FALSE, they are put at the beginning; if NA, they are removed from the output altogether.
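As a rough illustration of how decreasing and na.last interact, the following sketch sorts a short vector containing one missing value. It assumes that stringx is installed and attached; the input vector and the variable names are made up for this example.

```r
library(stringx)  # assumption: stringx is installed; provides sort.character

x <- c("b", NA, "a")  # hypothetical input with one missing value

# na.last = NA (the default): the missing value is removed
dropped <- sort(x)

# na.last = TRUE: the missing value is placed at the end
at_end <- sort(x, na.last = TRUE)

# na.last = FALSE: the missing value is placed at the beginning
at_start <- sort(x, na.last = FALSE)

# decreasing = TRUE reverses the order; the missing value is still removed
reversed <- sort(x, decreasing = TRUE)

print(list(dropped, at_end, at_start, reversed))
```

Only the first and the last call return a vector shorter than the input, which is the case the Value section warns about.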

Differences from Base R

Replacements for the default S3 methods sort and xtfrm for character vectors, implemented with stri_sort and stri_rank.

  • Collation in different locales is difficult and non-portable across platforms [fixed here -- using services provided by ICU]

  • Overloading xtfrm.character has no effect in R, because S3 method dispatch is done internally with hard-coded support for character arguments. Thus, we needed to replace the generic xtfrm with the one that calls UseMethod [fixed here]

  • xtfrm does not support customisation of the linear ordering relation it is based upon [fixed by introducing ... argument to the new generic, xtfrm2]

  • Neither order, rank, nor sort.list is a generic, therefore they would have to be rewritten from scratch to allow the inclusion of our patches; interestingly, order even calls xtfrm, but only for classed objects [not fixed here -- see Examples for a workaround]

  • xtfrm for objects of type character does not preserve the names attribute (but does so for numeric) [fixed here]

  • sort seems to preserve only the names attribute, which makes sense if na.last is NA, because the resulting vector might be shorter [not fixed here as it would break compatibility with other sorting methods]

  • Note that sort by default removes missing values altogether, whereas order has na.last=TRUE [not fixed here as it would break compatibility with other sorting methods]
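Some of the base R behaviours listed above can be observed directly. The sketch below uses only base functions, called with an explicit base:: prefix so that the results do not depend on whether stringx is attached; the named input vector is a made-up example.

```r
x <- c(p = "b", q = NA, r = "a")  # hypothetical named input

# base sort drops the missing value by default (na.last = NA) and keeps names
s <- base::sort.default(x)

# base order keeps every index and puts the missing value last (na.last = TRUE)
o <- base::order(x)

# base xtfrm drops names for character input ...
n_chr <- names(base::xtfrm(x))

# ... but keeps them for numeric input
n_num <- names(base::xtfrm(c(p = 2, q = NA, r = 1)))

print(s)
print(o)
print(n_chr)
print(n_num)
```

Here s has two elements while o has three, which is why combining sort and order on data with missing values requires setting na.last explicitly.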

Details

What 'xtfrm' stands for the current author does not know, but would appreciate someone's enlightening him.

See Also

The official online manual of stringx at https://stringx.gagolewski.com/

Related function(s): strcoll

Examples

x <- c("a1", "a100", "a101", "a1000", "a10", "a10", "a11", "a99", "a10", "a1")
base::sort.default(x)   # lexicographic sort
sort(x, numeric=TRUE)   # calls stringx:::sort.character
xtfrm2(x, numeric=TRUE)  # calls stringx:::xtfrm2.character

rank(xtfrm2(x, numeric=TRUE), ties.method="average")  # ranks with averaged ties
order(xtfrm2(x, numeric=TRUE))    # ordering permutation
x[order(xtfrm2(x, numeric=TRUE))] # equivalent to sort()

# order a data frame w.r.t. decreasing ids and increasing vals
d <- data.frame(vals=round(runif(length(x)), 1), ids=x)
d[order(-xtfrm2(d[["ids"]], numeric=TRUE), d[["vals"]]), ]

