The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.
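To make the pattern concrete, here is a minimal sketch of the three steps written out by hand in base R, without plyr. The built-in mtcars data set and the result column name mean_mpg are choices made for this illustration only.

```r
# Split: break mtcars into one data frame per cylinder count
pieces <- split(mtcars, mtcars$cyl)

# Apply: compute a one-row summary for each piece
summaries <- lapply(pieces, function(piece) {
  data.frame(cyl = piece$cyl[1], mean_mpg = mean(piece$mpg))
})

# Combine: stack the per-piece summaries back into a single data frame
result <- do.call(rbind, summaries)
print(result)
```

The plyr functions aim to express these three steps in a single call.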
By design, no plyr function will preserve row names - in general it is too hard to know what should be done with them for many of the operations supported by plyr. If you want to preserve row names, use name_rows to convert them into an explicit column in your data frame, perform the plyr operations, and then use name_rows again to convert the column back into row names.
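The round trip described above might look like the following sketch. The data frame and its values are made up for illustration; the sketch relies on name_rows storing the row names in a column called .rownames.

```r
library(plyr)

# Illustrative data with meaningful row names
df <- data.frame(
  score = c(3, 1, 2),
  weight = c(10, 20, 30),
  row.names = c("a", "b", "c")
)

# Step 1: move the row names into an explicit column (.rownames)
named <- name_rows(df)

# Step 2: run a plyr operation; arrange() would otherwise drop row names
sorted <- arrange(named, score)

# Step 3: call name_rows again to turn the column back into row names
restored <- name_rows(sorted)
print(restored)
```

After step 3 the rows are ordered by score and each row carries its original name again.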
Plyr also provides a set of helper functions for common data analysis problems:
arrange: re-order the rows of a data frame by specifying the columns to order by.
mutate: add new columns or modify existing columns, like transform, but new columns can refer to other columns that you just created.
summarise: like mutate, but creates a new data frame, not preserving any columns in the old data frame.
join: an adaptation of merge which is more similar to SQL, and has a much faster implementation if you only want to find the first match.
match_df: a version of join that, instead of returning the two tables combined together, only returns the rows in the first table that match the second.
colwise: make any function work column-wise on a data frame.
rename: easily rename columns in a data frame.
round_any: round a number to any degree of precision.
count: quickly count unique combinations and return them as a data frame.
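The helpers listed above can be tried out on a small data frame. The following is a rough sketch: the data frames df and lookup and all of their column names are invented for illustration.

```r
library(plyr)

df <- data.frame(
  group = c("a", "b", "a", "b"),
  x = c(4, 3, 2, 1),
  stringsAsFactors = FALSE
)

# arrange: order rows by group, then by x within each group
ordered <- arrange(df, group, x)

# mutate: z refers to y, which is created in the same call
extended <- mutate(df, y = x * 2, z = y + 1)

# summarise: the result holds only the newly computed columns
totals <- summarise(df, total = sum(x), biggest = max(x))

# rename: named character vector of the form old name = new name
renamed <- rename(df, c("x" = "value"))

# round_any: round to the nearest multiple of the given accuracy
nearest_ten <- round_any(137, 10)

# count: one row per unique value of group, with a freq column
counts <- count(df, "group")

# colwise: turn mean() into a function that works on every column
col_means <- colwise(mean)(df["x"])

# join keeps all rows of df (left join by default);
# match_df keeps only the rows of df that have a match in lookup
lookup <- data.frame(group = "a", label = "first", stringsAsFactors = FALSE)
joined <- join(df, lookup, by = "group")
matched <- match_df(df, lookup, on = "group")
```

Note the difference in the last two results: joined has four rows and gains a label column, while matched has only the two rows for group "a" and no extra columns.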
The plyr functions are named according to what sort of data structure they split up and what sort of data structure they return:
a: array
l: list
d: data.frame
m: multiple inputs
r: repeat multiple times
_: nothing
So ddply takes a data frame as input and returns a data frame as output, and l_ply takes a list as input and returns nothing as output.
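As a short illustration of the naming scheme, the sketch below calls a few of these functions. The mtcars data set, the grouping column, and the anonymous functions are assumptions chosen for the example.

```r
library(plyr)

# ddply: data frame in, data frame out (one summary row per value of cyl)
by_cyl <- ddply(mtcars, "cyl", summarise,
                mean_mpg = mean(mpg),
                n = length(mpg))

# llply: list in, list out
squares <- llply(list(1, 2, 3), function(v) v^2)

# laply: list in, array (here a vector) out
sizes <- laply(list(a = 1:3, b = 1:5), length)

# l_ply: list in, nothing out; useful when only the side effect matters
nothing <- l_ply(list("first", "second"),
                 function(s) message("processing ", s))
```

Here by_cyl is a data frame, squares is a list, sizes is a vector, and nothing is NULL because l_ply discards the results.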