
plyr (version 1.5.2)

daply: Split data frame, apply function, and return results in an array.

Description

For each subset of a data frame, apply a function, then combine the results into an array.

Usage

daply(.data, .variables, .fun, ..., .progress="none",
    .drop=TRUE, .parallel=FALSE)

Arguments

.data
data frame to be processed
.variables
variables to split data frame by, as quoted variables, a formula or character vector
.fun
function to apply to each piece
...
other arguments passed on to .fun
.progress
name of the progress bar to use, see create_progress_bar
.drop
should extra dimensions of length 1 be dropped, simplifying the output. Defaults to TRUE
.parallel
if TRUE, apply function in parallel, using parallel backend provided by foreach
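The three accepted forms of .variables can be compared side by side. This is a minimal sketch on a small made-up data frame (the name toy and its columns are illustrative, not part of plyr); it assumes plyr is installed.

```r
library(plyr)

# Hypothetical toy data, used only for illustration
toy <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))

# Three equivalent ways to name the splitting variable
by_quoted    <- daply(toy, .(g), nrow)  # quoted variable
by_formula   <- daply(toy, ~ g, nrow)   # one-sided formula
by_character <- daply(toy, "g", nrow)   # character vector

# Each call counts the rows in every group of g
print(by_quoted)
```

The quoted form .(g) is the one used in the examples on this page; the other two are convenient when the variable names are built programmatically.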

Value

If the results are atomic with the same type and dimensionality, a vector, matrix or array; otherwise, a list-array (a list with dimensions).
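How the shape of the result depends on what .fun returns can be seen in a small sketch. The data frame scores and the summary names mean and n are made up for illustration; only the atomic case is shown.

```r
library(plyr)

# Hypothetical toy data, used only for illustration
scores <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))

# .fun returns a single number per group: one value per group of g
per_group <- daply(scores, .(g), function(piece) mean(piece$x))

# .fun returns a named vector of length 2 per group: a matrix with
# one row per group and one column per returned element
summary_matrix <- daply(scores, .(g), function(piece) {
  c(mean = mean(piece$x), n = nrow(piece))
})

print(per_group)
print(summary_matrix)
```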

Details

All plyr functions use the same split-apply-combine strategy: they split the input into simpler pieces, apply .fun to each piece, and then combine the pieces into a single data structure. This function splits data frames by variable and combines the result into an array. If there are no results, then this function will return a vector of length 0 (vector()).
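The split-apply-combine steps described above can be written out by hand with base R and compared with a single daply call. This is only a sketch of the idea on made-up data (measurements and its columns are hypothetical), not a description of how plyr is implemented internally.

```r
library(plyr)

# Hypothetical toy data, used only for illustration
measurements <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10))

# Manual version using base R
pieces  <- split(measurements, measurements$g)             # split
results <- lapply(pieces, function(piece) mean(piece$x))   # apply
manual  <- unlist(results)                                 # combine

# The same computation expressed as one daply call
combined <- daply(measurements, .(g), function(piece) mean(piece$x))

print(manual)
print(combined)
```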

daply with a function that operates column-wise is similar to aggregate.
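To illustrate the comparison with aggregate, the sketch below computes per-group column means both ways. The data frame wide and its columns are made up for illustration; colMeans is used here as one possible column-wise function.

```r
library(plyr)

# Hypothetical toy data, used only for illustration
wide <- data.frame(g = c("a", "a", "b"),
                   x = c(1, 3, 10),
                   y = c(2, 4, 6))

# daply: one row per group, one column per summarised variable
by_daply <- daply(wide, .(g), function(piece) colMeans(piece[, c("x", "y")]))

# aggregate: the same means, returned as a data frame
by_aggregate <- aggregate(wide[, c("x", "y")], by = list(g = wide$g), FUN = mean)

print(by_daply)
print(by_aggregate)
```

The main difference is the container: daply returns a matrix indexed by group, while aggregate returns a data frame with the grouping variable as a column.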

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/.

Examples

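library(plyr)  # provides daply(), colwise() and the baseball data set

# Number of rows (player records) per year
daply(baseball, .(year), nrow)

# Several different ways of summarising by variables that should not be
# included in the summary. colwise(mean) averages each column separately;
# mean() applied to a whole data frame returns NA in current R.

daply(baseball[, c(2, 6:9)], .(year), colwise(mean))
daply(baseball[, 6:9], .(baseball$year), colwise(mean))
daply(baseball, .(year), function(df) colwise(mean)(df[, 6:9]))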
