The binary comparison operators are generic functions: methods can be
written for them individually or via the
Ops
) group generic function. (See
Ops
for how dispatch is computed.) Comparison of strings in character vectors is lexicographic within the
strings using the collating sequence of the locale in use: see
locales
. The collating sequence of locales such as
en_US is normally different from C (which should use
ASCII) and can be surprising. Beware of making any assumptions
about the collation order: e.g. in Estonian Z
comes between
S
and T
, and collation is not necessarily
character-by-character -- in Danish aa
sorts as a single
letter, after z
. In Welsh ng
may or may not be a single
sorting unit: if it is it follows g
. Some platforms may
not respect the locale and always sort in numerical order of the bytes
in an 8-bit locale, or in Unicode code-point order for a UTF-8 locale (and
may not sort in the same order for the same language in different
character sets). Collation of non-letters (spaces, punctuation signs,
hyphens, fractions and so on) is even more problematic.
Character strings can be compared with different marked encodings
(see Encoding
): they are translated to UTF-8 before
comparison.
At least one of x
and y
must be an atomic vector, but if
the other is a list R attempts to coerce it to the type of the atomic
vector: this will succeed if the list is made up of elements of length
one that can be coerced to the correct type.
If the two arguments are atomic vectors of different types, one is
coerced to the type of the other, the (decreasing) order of precedence
being character, complex, numeric, integer, logical and raw.
Missing values (NA
) and NaN
values are
regarded as non-comparable even to themselves, so comparisons
involving them will always result in NA
. Missing values can
also result when character strings are compared and one is not valid
in the current collation locale.
Language objects such as symbols and calls are deparsed to
character strings before comparison.