u_char_names: Unicode Character Names

Description

Find the names or labels of Unicode characters, or find Unicode characters by name.

Usage

u_char_name(x)
u_char_from_name(x, type = c("exact", "grep"), ...)
u_char_label(x)

Value

For u_char_name and u_char_label, a character vector with the names or labels, respectively, of the corresponding Unicode characters.

For u_char_from_name, a u_char object giving the Unicode characters whose names match the given names: exactly by default, or as patterns when type = "grep".

Arguments

x: for u_char_name and u_char_label, an R object which can be coerced to a u_char vector of Unicode characters via as.u_char; for u_char_from_name, a character vector.
type: one of "exact" or "grep", or an abbreviation thereof.
...: arguments to be passed to grepl when type = "grep" is used for pattern matching.

Details

The Unicode Standard provides a convention for labeling code points that do not have character names (control, reserved, noncharacter, private-use and surrogate code points). These labels can be obtained by u_char_label.
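
The labeling convention can be sketched as follows. This is a minimal illustration, assuming the functions are provided by the Unicode package (the library call is an assumption, since this page does not name the package). The three code points are chosen here for illustration: U+0000 is a control, U+E000 a private-use and U+D800 a surrogate code point.

```r
library(Unicode)  # assumption: the package that provides these functions

## Code points that have labels rather than character names
## (illustrative choices: control, private-use, surrogate).
x <- as.u_char(c(0x0000L, 0xE000L, 0xD800L))

## One label per code point, returned as a character vector.
labels <- u_char_label(x)
labels
```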

By default, exact matching is used for finding Unicode characters by name. When type = "grep", grepl is used for matching x against the Unicode character names; for now, Hangul syllable and CJK Unified Ideograph names are ignored in this case.
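
As a hedged sketch of how the ... arguments reach grepl, the call below passes ignore.case = TRUE, a standard grepl argument, so that a lower-case pattern can match the upper-case character names. The pattern and variable names are illustrative, and the library call again assumes the Unicode package.

```r
library(Unicode)  # assumption: the package that provides these functions

## Pattern matching: ignore.case = TRUE is forwarded to grepl via '...'.
hits <- u_char_from_name("digit one", type = "grep", ignore.case = TRUE)

## Look up the names of the matched characters and show the first few.
hit_names <- u_char_name(hits)
head(hit_names)
```

With the default type = "exact", the same lower-case string would instead be compared against the names as given, so pattern arguments such as ignore.case only make sense together with type = "grep".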

Examples

x <- as.u_char(utf8ToInt("Austria"))
u_char_name(x)

## Derived Hangul syllable character names are also supported for
## finding characters by exact matching:
x <- u_char_name("0xAC00")
x
u_char_from_name(x)

## Find all Unicode characters with name matching 'DIGIT ONE'.
x <- u_char_from_name("\\bDIGIT ONE\\b", "g")
## And show their names.
u_char_name(x)