Apply a function to each cell of a ragged array, that is to each (non-empty) group of values given by a unique combination of the levels of certain factors.
tapply(X, INDEX, FUN = NULL, …, default = NA, simplify = TRUE)
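A minimal sketch of the basic call, assuming made-up data: the vectors scores and team are hypothetical. The groups have different sizes, so this is the "ragged array" case. tapply calls mean once per level of team.

```r
# Hypothetical data: five scores split into two groups of unequal size
scores <- c(82, 91, 77, 68, 95)
team <- factor(c("red", "blue", "red", "red", "blue"))

# One call to mean() per cell; the result is a 1-d array labelled by level
team_means <- tapply(scores, team, mean)
print(team_means)
```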
FUN: a function (or name of a function) to be applied, or NULL. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted. If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces.
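The sketch below illustrates both points with hypothetical vectors x and g. The operator `+` is backquoted so that it can be passed as a value, here through ... to Reduce, which folds each cell into its sum. Calling tapply without FUN returns the cell index of every element.

```r
x <- c(10, 20, 30, 40, 50)
g <- factor(c("a", "b", "a", "b", "c"))

# `+` must be backquoted to be passed as an object; tapply forwards it
# to Reduce(), which adds up the values of each cell
cell_sums <- tapply(x, g, Reduce, f = `+`)
print(cell_sums)

# FUN = NULL: for each element of x, the index of its cell in the array
# that tapply() would otherwise build
cell_index <- tapply(x, g)
print(cell_index)
```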
...: optional arguments to FUN. They are passed unchanged to every call of FUN and are not split by INDEX; see the Note section.
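As a small sketch with hypothetical data v and f, collapse = "-" below reaches every paste() call as is, and only v is divided into cells. A per-element vector passed through ... would likewise be handed over whole to each call, not split alongside X.

```r
v <- 1:5
f <- factor(c("a", "b", "a", "b", "a"))

# collapse = "-" is passed unchanged to each paste() call; only v is split
joined <- tapply(v, f, paste, collapse = "-")
print(joined)
```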
default: (only in the case of simplification to an array) the value with which the array is initialized, as array(default, dim = ..). Before R 3.4.0, this was hard-coded to array()'s default NA. If it is NA (the default), the missing value of the answer type, e.g. NA_real_, is chosen (as.raw(0) for "raw"). In a numerical case, it may be set, e.g., to FUN(integer(0)); in the case of FUN = sum, that is 0 or 0L.
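A sketch of the difference, using a hypothetical factor with a declared but unused level "z". With the default NA the empty cell gets NA_integer_ (sum of integers is integer). Passing default = sum(integer(0)), which is 0L, fills it with zero and keeps the integer type.

```r
v <- 1:6
# Level "z" is declared but never observed, so its cell is empty
f <- factor(c("x", "x", "y", "y", "y", "x"), levels = c("x", "y", "z"))

with_na   <- tapply(v, f, sum)                             # empty cell is NA
with_zero <- tapply(v, f, sum, default = sum(integer(0)))  # empty cell is 0L
print(with_na)
print(with_zero)
```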
When FUN is present, tapply calls FUN for each cell that has any data in it. If FUN returns a single atomic value for each such cell (e.g., functions mean or var) and simplify is TRUE, tapply returns a multi-way array containing the values, with NA (or the value of default, if one is given) in the empty cells. The array has as many dimensions as INDEX has components; the number of levels in a dimension is the number of levels (nlevels()) in the corresponding component of INDEX. Note that if the return value has a class (e.g., an object of class "Date"), the class is discarded.
simplify = TRUE always returns an array, possibly 1-dimensional.
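The class-dropping rule can be sketched with hypothetical dates d and groups grp. Each call to min() returns a Date, but the simplified result holds the underlying day counts as plain numbers. The origin argument is given explicitly so the conversion back also works on R versions older than 4.3.

```r
d <- as.Date(c("2024-01-05", "2024-03-10", "2024-02-01", "2024-04-20"))
grp <- c("a", "a", "b", "b")

# min() returns a Date per cell, but the "Date" class is dropped
earliest <- tapply(d, grp, min)
print(inherits(earliest, "Date"))

# Restore the class explicitly when Dates are needed
earliest_dates <- as.Date(as.vector(earliest), origin = "1970-01-01")
print(earliest_dates)
```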
If FUN does not return a single atomic value, tapply returns an array of mode list whose components are the values of the individual calls to FUN, i.e., the result is a list with a dim attribute.
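For instance, range() returns two numbers per cell, so with the hypothetical data below the result is a list carrying a dim attribute rather than an atomic array.

```r
x <- c(4, 8, 1, 9, 5)
g <- factor(c("p", "q", "p", "q", "p"))

# range() gives two values per cell, so tapply() returns a list array
r <- tapply(x, g, range)
print(is.list(r))
print(dim(r))
print(r[[1]])  # range of cell "p"
```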
When there is an array answer, its dimnames are named by the names of INDEX and are based on the levels of the grouping factors (possibly after coercion).
For a list result, the elements corresponding to empty cells are NULL.
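A sketch with two hypothetical, named grouping vectors. The character vectors are coerced to factors, the dimnames of the answer take the names shape and size, and the square/L combination never occurs. It is NA in the array answer and NULL in the list answer (sort() returns two values for one cell, which forces a list).

```r
vals <- c(3, 7, 2, 6)
idx <- list(shape = c("circle", "square", "circle", "circle"),
            size  = c("S", "S", "L", "L"))

# Array answer: a 2 x 2 matrix whose dimnames are named after idx
m <- tapply(vals, idx, sum)
print(names(dimnames(m)))
print(m)  # the empty square/L cell is NA

# List answer: the same empty cell is NULL
l <- tapply(vals, idx, sort)
print(is.null(l[["square", "L"]]))
```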
If FUN is not NULL, it is passed to match.fun, and hence it can be a function or a symbol or character string naming a function.
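Because of match.fun, the three calls below should be equivalent. This is a sketch on hypothetical data; quote(mean) is used here only to produce a symbol.

```r
x <- c(2, 4, 6, 8)
g <- c("u", "v", "u", "v")

# A function object, a character string and a symbol all resolve to mean()
by_object <- tapply(x, g, mean)
by_string <- tapply(x, g, "mean")
by_symbol <- tapply(x, g, quote(mean))
print(by_object)
```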
See also the convenience functions by and aggregate (using tapply); apply; and lapply with its versions sapply and mapply.
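The link to lapply and sapply can be sketched for a single grouping factor. Splitting first and then applying the function should give the same values and labels. The main visible difference is that tapply returns a 1-d array rather than a named vector. The data are hypothetical.

```r
x <- c(5, 1, 4, 2, 8, 7)
g <- factor(c("a", "b", "a", "b", "c", "c"))

# For one factor, tapply() agrees with split() followed by sapply()
via_tapply <- tapply(x, g, mean)
via_sapply <- sapply(split(x, g), mean)
print(via_tapply)
print(via_sapply)
```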
require(stats)
groups <- as.factor(rbinom(32, n = 5, prob = 0.4))
tapply(groups, groups, length) #- is almost the same as
table(groups)
## contingency table from data.frame : array with named dimnames
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
tapply(warpbreaks$breaks, warpbreaks[, 3, drop = FALSE], sum)
n <- 17; fac <- factor(rep_len(1:3, n), levels = 1:5)
table(fac)
tapply(1:n, fac, sum)
tapply(1:n, fac, sum, default = 0) # maybe more desirable
tapply(1:n, fac, sum, simplify = FALSE)
tapply(1:n, fac, range)
tapply(1:n, fac, quantile)
tapply(1:n, fac, length) ## NA's
tapply(1:n, fac, length, default = 0) # == table(fac)
## example of ... argument: find quarterly means
tapply(presidents, cycle(presidents), mean, na.rm = TRUE)
ind <- list(c(1, 2, 2), c("A", "A", "B"))
table(ind)
tapply(1:3, ind) #-> the split vector
tapply(1:3, ind, sum)
## Some assertions (not held by all patch proposals):
nq <- names(quantile(1:5))
stopifnot(
identical(tapply(1:3, ind), c(1L, 2L, 4L)),
identical(tapply(1:3, ind, sum),
matrix(c(1L, 2L, NA, 3L), 2, dimnames = list(c("1", "2"), c("A", "B")))),
identical(tapply(1:n, fac, quantile)[-1],
array(list(`2` = structure(c(2, 5.75, 9.5, 13.25, 17), .Names = nq),
`3` = structure(c(3, 6, 9, 12, 15), .Names = nq),
`4` = NULL, `5` = NULL), dim=4, dimnames=list(as.character(2:5)))))