plyr (version 1.5.2)

ddply: Split data frame, apply function, and return results in a data frame.

Description

For each subset of a data frame, apply a function, then combine the results into a data frame.

Usage

ddply(.data, .variables, .fun, ..., .progress="none",
    .drop=TRUE, .parallel=FALSE)

Arguments

.data
data frame to be processed
.variables
variables to split data frame by, as quoted variables, a formula or character vector
.fun
function to apply to each piece
.drop
should combinations of variables that do not appear in the data be preserved (FALSE) or dropped (TRUE, default)
...
other arguments passed on to .fun
.progress
name of the progress bar to use, see create_progress_bar
.parallel
if TRUE, apply function in parallel, using parallel backend provided by foreach
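
As a rough illustration of the .variables and .drop arguments described above, the sketch below uses a small made-up data frame. The names obs, g, h, value and count_rows are hypothetical and not part of plyr; the code assumes plyr is installed.

```r
library(plyr)

# Hypothetical toy data: two factors where the combination (b, y) never occurs
obs <- data.frame(
  g = factor(c("a", "a", "b")),
  h = factor(c("x", "y", "x")),
  value = c(1, 2, 3)
)

# Hypothetical helper: report how many rows each piece has
count_rows <- function(piece) data.frame(n = nrow(piece))

# Three ways to name the splitting variable
by_quoted  <- ddply(obs, .(g), count_rows)   # quoted variables
by_formula <- ddply(obs, ~ g, count_rows)    # formula
by_string  <- ddply(obs, "g", count_rows)    # character vector

# Default (.drop = TRUE): only combinations present in the data are returned
dropped <- ddply(obs, .(g, h), count_rows)

# .drop = FALSE: factor combinations that never occur are preserved too
kept <- ddply(obs, .(g, h), count_rows, .drop = FALSE)
```

Comparing nrow(dropped) with nrow(kept) shows the effect of .drop on this particular data.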

Value

  • a data frame

Details

All plyr functions use the same split-apply-combine strategy: they split the input into simpler pieces, apply .fun to each piece, and then combine the pieces into a single data structure. This function splits data frames by variables and combines the result into a data frame. If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).

The most unambiguous behaviour is achieved when .fun returns a data frame; in that case the pieces are combined with rbind.fill. If .fun returns an atomic vector of fixed length, the vectors are bound together row-wise and converted to a data frame. Any other return value results in an error.
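
The two supported return types can be sketched as follows. This is a minimal example on made-up data: scores and both anonymous functions are illustrative and do not come from the plyr documentation.

```r
library(plyr)

# Hypothetical toy data
scores <- data.frame(
  team  = c("a", "a", "b", "b", "b"),
  score = c(1, 3, 2, 4, 6)
)

# .fun returns a one-row data frame for each piece
by_df <- ddply(scores, .(team), function(piece) {
  data.frame(n = nrow(piece), mean_score = mean(piece$score))
})

# .fun returns a named atomic vector of fixed length for each piece
by_vec <- ddply(scores, .(team), function(piece) {
  c(n = nrow(piece), mean_score = mean(piece$score))
})
```

Returning a data frame is usually the safer choice, since column names and types are stated explicitly rather than inferred from the vector.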

References

Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/.

Examples

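library(plyr)  # also provides the baseball data set

# Count the rows in each year
ddply(baseball, .(year), "nrow")
# Apply several functions to each league's rows
ddply(baseball, .(lg), c("nrow", "ncol"))

# Mean runs batted in per year, plotted as a line
rbi <- ddply(baseball, .(year), summarise,
  mean_rbi = mean(rbi, na.rm = TRUE))
with(rbi, plot(year, mean_rbi, type = "l"))

# Add each player's career year, counting from their first season
base2 <- ddply(baseball, .(id), transform,
  career_year = year - min(year) + 1
)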