aaply: Split array, apply function, and return results in an array.

Description

For each slice of an array, apply function, keeping results as an array.

Usage

aaply(
  .data,
  .margins,
  .fun = NULL,
  ...,
  .expand = TRUE,
  .progress = "none",
  .inform = FALSE,
  .drop = TRUE,
  .parallel = FALSE,
  .paropts = NULL
)

Arguments

.data

matrix, array or data frame to be processed

.margins

a vector giving the subscripts to split up data by. 1 splits up by rows, 2 by columns and c(1,2) by rows and columns, and so on for higher dimensions

.fun

function to apply to each piece

...

other arguments passed on to .fun

.expand

if .data is a data frame, should output be 1d (expand = FALSE), with an element for each row; or nd (expand = TRUE), with a dimension for each variable.

.progress

name of the progress bar to use, see create_progress_bar

.inform

produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging

.drop

should extra dimensions of length 1 in the output be dropped, simplifying the output. Defaults to TRUE

.parallel

if TRUE, apply function in parallel, using parallel backend provided by foreach

.paropts

a list of additional options passed into the foreach function when parallel computation is enabled. This is important if (for example) your code relies on external data or packages: use the .export and .packages arguments to supply them so that all cluster nodes have the correct environment set up for computing.

Value

if results are atomic with same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions)

Warning

Passing a data frame as first argument may lead to unexpected results, see https://github.com/hadley/plyr/issues/212.

Input

This function splits matrices, arrays and data frames by dimensions

Output

If there are no results, then this function will return a vector of length 0 (vector()).

Details

This function is very similar to apply, except that it will always return an array, and when the function returns >1 d data structures, those dimensions are added on to the highest dimensions, rather than the lowest dimensions. This makes aaply idempotent, so that aaply(input, X, identity) is equivalent to aperm(input, X).

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/.

Examples

Run this code

# NOT RUN {
dim(ozone)
aaply(ozone, 1, mean)
aaply(ozone, 1, mean, .drop = FALSE)
aaply(ozone, 3, mean)
aaply(ozone, c(1,2), mean)

dim(aaply(ozone, c(1,2), mean))
dim(aaply(ozone, c(1,2), mean, .drop = FALSE))

aaply(ozone, 1, each(min, max))
aaply(ozone, 3, each(min, max))

standardise <- function(x) (x - min(x)) / (max(x) - min(x))
aaply(ozone, 3, standardise)
aaply(ozone, 1:2, standardise)

aaply(ozone, 1:2, diff)
# }

Run the code above in your browser using DataLab