stringi (version 1.8.4)

stri_length: Count the Number of Code Points

Description

This function returns the number of code points in each string.

Usage

stri_length(str)

Value

Returns an integer vector of the same length as str.

Arguments

str

a character vector or an object coercible to one

Author

Marek Gagolewski and other contributors

Details

Note that the number of code points is not the same as the `width` of the string when printed on the console.
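To illustrate the difference between code point count and display width, here is a minimal sketch, assuming stringi is installed. The variable names `wide` and `decomposed` are illustrative only. It compares stri_length with stri_width for wide East Asian characters and for a base letter followed by a combining mark.

```r
library(stringi)

# Two CJK ideographs: one code point each, but each occupies two console columns
wide <- "\u4e2d\u6587"
stri_length(wide)  # 2 code points
stri_width(wide)   # 4 columns

# 'a' followed by a combining ogonek: two code points, one column
decomposed <- stri_trans_nfd("\u0105")
stri_length(decomposed)  # 2 code points
stri_width(decomposed)   # 1 column
```

Use stri_width rather than stri_length when aligning or padding text for console output.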

If a string is in UTF-8 and has not been normalized (e.g., by stri_trans_nfc), the returned counts may be misleading, because a single user-perceived character can consist of several code points. See stri_count_boundaries for a way to count Unicode characters.

If an incorrect UTF-8 byte sequence is detected, a warning is generated and the corresponding output element is set to NA. See stri_enc_toutf8 for a way to deal with such cases.
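The invalid-input case can be sketched as follows. This is an illustration, not part of the original manual. The string `bad` is a made-up example that is explicitly marked as UTF-8 although it contains the byte 0xFF, which never occurs in valid UTF-8.

```r
library(stringi)

# A string marked as UTF-8 that contains an invalid byte (0xFF)
bad <- "ab\xffcd"
Encoding(bad) <- "UTF-8"

# stri_length warns and yields NA for this element; the warning is
# suppressed here only to keep the example output quiet
len_bad <- suppressWarnings(stri_length(bad))
is.na(len_bad)  # TRUE

# validate = TRUE replaces invalid sequences with the replacement
# character U+FFFD, so the repaired string has a defined length
fixed <- stri_enc_toutf8(bad, validate = TRUE)
len_fixed <- stri_length(fixed)
is.na(len_fixed)  # FALSE
```

Repairing the input this way changes its content, so whether it is acceptable depends on the application.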

Missing values are handled properly: an NA input yields NA in the output. Strings marked with the `bytes` encoding raise an error, as they do elsewhere in stringi.
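A short sketch of both behaviors, assuming stringi is installed. The names `lens`, `raw_str`, and `outcome` are illustrative only, and the error is caught with tryCatch so that the snippet runs to completion.

```r
library(stringi)

# NA propagates; the empty string has length 0
lens <- stri_length(c("abc", NA, ""))
lens  # 3 NA 0

# A non-ASCII string marked with the "bytes" encoding is rejected
raw_str <- "abc\xff"
Encoding(raw_str) <- "bytes"
outcome <- tryCatch(
  stri_length(raw_str),
  error = function(e) "error"  # record that an error was raised
)
outcome  # "error"
```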

See Also

The official online manual of stringi at https://stringi.gagolewski.com/

Gagolewski M., stringi: Fast and portable character string processing in R, Journal of Statistical Software 103(2), 2022, 1-59, doi:10.18637/jss.v103.i02

Other length: %s$%(), stri_isempty(), stri_numbytes(), stri_pad_both(), stri_sprintf(), stri_width()

Examples

stri_length(LETTERS)
stri_length(c('abc', '123', '\u0105\u0104'))
stri_length('\u0105') # length is one, but...
stri_numbytes('\u0105') # 2 bytes are used
stri_numbytes(stri_trans_nfkd('\u0105')) # 3 bytes here but...
stri_length(stri_trans_nfkd('\u0105')) # ...two code points (!)
stri_count_boundaries(stri_trans_nfkd('\u0105'), type='character') # ...and one Unicode character
