nchar takes a character vector as an argument and returns a vector whose elements contain the sizes of the corresponding elements of x. Internally, it is a generic, for which methods can be defined.
nzchar is a fast way to find out if elements of a character vector are non-empty strings.
nchar(x, type = "chars", allowNA = FALSE, keepNA = NA)
nzchar(x, keepNA = FALSE)
x
character vector, or a vector to be coerced to a character vector. Giving a factor is an error.
type
character string: partial matching to one of c("bytes", "chars", "width"). See ‘Details’.
allowNA
logical: should NA be returned for invalid multibyte strings or "bytes"-encoded strings (rather than throwing an error)?
keepNA
logical: should NA be returned wherever x is NA? If false, nchar() returns 2, as that is the number of printing characters used when strings are written to output, and nzchar() is TRUE. The default for nchar(), NA, means to use keepNA = TRUE unless type is "width". It was (implicitly) hard-coded to FALSE in earlier versions of R.
For nchar, an integer vector giving the sizes of each element. For missing values (i.e., NA, that is, NA_character_), nchar() returns NA_integer_ if keepNA is true, and 2, the number of printing characters, if false.
type = "width" gives (an approximation to) the number of columns used in printing each element in a terminal font, taking into account double-width, zero-width and ‘composing’ characters.
If allowNA = TRUE and an element is detected as invalid in a multi-byte character set such as UTF-8, its number of characters and the width will be NA. Otherwise the number of characters will be non-negative, so !is.na(nchar(x, "chars", TRUE)) is a test of validity.
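A minimal sketch of this validity test follows. The two strings are built from raw bytes so the example does not depend on how the source file is encoded; the variable names are illustrative, and the behaviour shown assumes strings explicitly marked as UTF-8.

```r
# Two strings built from raw bytes; names are illustrative only.
ok  <- rawToChar(as.raw(c(0x66, 0x61, 0xc3, 0xa7)))  # "fa" followed by U+00E7 in UTF-8
bad <- rawToChar(as.raw(c(0x66, 0x61, 0xe7)))        # 0xE7 opens a multibyte sequence that never completes
Encoding(ok)  <- "UTF-8"
Encoding(bad) <- "UTF-8"

x <- c(ok, bad)
# With allowNA = TRUE an invalid element yields NA instead of an error,
# so the non-NA positions are the valid ones.
valid <- !is.na(nchar(x, "chars", allowNA = TRUE))
valid
```

Only the first element should be reported as valid here.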
A character string marked with "bytes" encoding (see Encoding) has a number of bytes, but neither a known number of characters nor a width, so the latter two types are NA if allowNA = TRUE, otherwise an error.
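The following sketch illustrates the three outcomes for a string marked as "bytes". A non-ASCII string is used on the assumption that plain ASCII strings are not given an encoding mark; the variable names are illustrative.

```r
# A non-ASCII string built from raw bytes, then marked as "bytes".
b <- rawToChar(as.raw(c(0x66, 0x61, 0xe7, 0x69, 0x6c, 0x65)))
Encoding(b) <- "bytes"

n_bytes <- nchar(b, type = "bytes")                  # the byte count is always defined
n_chars <- nchar(b, type = "chars", allowNA = TRUE)  # NA rather than an error
n_width <- nchar(b, type = "width", allowNA = TRUE)  # NA rather than an error

# Without allowNA = TRUE the same request signals an error, caught here.
res <- tryCatch(nchar(b, type = "chars"), error = function(e) "error")
```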
Names, dims and dimnames are copied from the input.
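As a small illustration of attribute copying (the objects m and v are made up for this sketch):

```r
# A character matrix with dimnames: the result keeps the same shape and labels.
m <- matrix(c("a", "bb", "ccc", "dddd"), nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))
nm <- nchar(m)
nm

# A named character vector: the names carry over to the integer result.
v <- c(first = "alpha", second = "be")
nv <- nchar(v)
nv
```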
For nzchar, a logical vector of the same length as x, true if and only if the element has non-zero length; if the element is NA, nzchar() is true when keepNA is false, as by default, and NA otherwise.
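A short sketch of the nzchar() behaviour just described, using an illustrative vector s:

```r
s <- c("", "a", NA)

nz_default <- nzchar(s)                 # NA counts as non-empty by default
nz_keep    <- nzchar(s, keepNA = TRUE)  # NA is propagated instead

nz_default  # FALSE TRUE TRUE
nz_keep     # FALSE TRUE NA
```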
The ‘size’ of a character string can be measured in one of three ways (corresponding to the type argument):
bytes
The number of bytes needed to store the string (plus in C a final terminator which is not counted).
chars
The number of human-readable characters.
width
The number of columns cat will use to print the string in a monospaced font. The same as chars if this cannot be calculated.
These will often be the same, and almost always will be in single-byte locales (but note how type determines the default for keepNA). There will be differences between the first two with multibyte character sequences, e.g. in UTF-8 locales.
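To make the difference between the three measures concrete, here is a small sketch comparing an ASCII string with a single CJK character written as a \u escape (which R stores as UTF-8). The width value is left unasserted because it is described above as an approximation; the variable names are illustrative.

```r
ascii <- "abc"
cjk   <- "\u4e2d"   # U+4E2D, three bytes in UTF-8

# For plain ASCII all three measures coincide.
sapply(c("bytes", "chars", "width"), function(tp) nchar(ascii, type = tp))

# For a multibyte character they differ.
n_bytes <- nchar(cjk, type = "bytes")
n_chars <- nchar(cjk, type = "chars")
n_width <- nchar(cjk, type = "width")  # typically 2, since this is a double-width character
c(bytes = n_bytes, chars = n_chars, width = n_width)
```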
The internal equivalent of the default method of as.character is performed on x (so there is no method dispatch). If you want to operate on non-vector objects, passing them through deparse first will be required.
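The coercion rules can be sketched as follows. The objects are illustrative; the factor case relies on the statement in the argument description that giving a factor is an error.

```r
# A number is coerced as if by as.character(), giving "123456".
n_num <- nchar(123456)

# A factor is rejected, so convert it explicitly first.
f <- factor(c("apple", "fig"))
res <- tryCatch(nchar(f), error = function(e) "error")
n_f <- nchar(as.character(f))

# A language object is not a vector; deparse() it to text before counting.
n_call <- nchar(deparse(quote(a + b)))

list(number = n_num, factor_attempt = res, factor_levels = n_f, call = n_call)
```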
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Unicode Standard Annex #11: East Asian Width. http://www.unicode.org/reports/tr11/
strwidth giving width of strings for plotting; paste, substr, strsplit
# NOT RUN {
x <- c("asfef", "qwerty", "yuiop[", "b", "stuff.blah.yech")
nchar(x)
# 5 6 6 1 15
nchar(deparse(mean))
# 18 17 <-- unless mean differs from base::mean
x[3] <- NA; x
nchar(x, keepNA= TRUE) # 5 6 NA 1 15
nchar(x, keepNA=FALSE) # 5 6 2 1 15
stopifnot(identical(nchar(x     ), nchar(x, keepNA= TRUE)),
          identical(nchar(x, "w"), nchar(x, keepNA=FALSE)),
          identical(is.na(x), is.na(nchar(x))))
##' nchar() for all three types :
nchars <- function(x, ...)
    vapply(c("chars", "bytes", "width"),
           function(tp) nchar(x, tp, ...), integer(length(x)))
nchars("\u200b") # in R versions (>= 2015-09-xx):
## chars bytes width
## 1 3 0
data.frame(x, nchars(x)) ## all three types : same unless for NA
## force the same by forcing 'keepNA':
(ncT <- nchars(x, keepNA = TRUE)) ## .... NA NA NA ....
(ncF <- nchars(x, keepNA = FALSE))## .... 2 2 2 ....
stopifnot(apply(ncT, 1, function(.) length(unique(.))) == 1,
          apply(ncF, 1, function(.) length(unique(.))) == 1)
# }