chain provides a different way to write computations that
pass a value through a chain of transformations.
mkchain[...](...)
chain[...](., ...)
mkchain(...)
chain(., ...)
. %|>% func
For chain, the first parameter in parentheses is the
data to run through the chain. %|>% is a shortcut for a chain of one step.
mkchain returns the constructed function. chain applies the chain
to the dataset given in the first argument and returns the result.
Suppose you have a path P defined by an
M-by-2 array of coordinates and you want to find the total length of
the line segments connecting each point in sequence. My stream of
thought for this goes something like "okay, take the difference
between rows, square, sum along columns, square root, and sum." You
could write:
length <- sum(sqrt(rowSums(apply(P, 2, diff)^2)))
However, this must be read "inside-out" to follow the computation. I find it easier to follow if written this way:
length <- chain(P, apply(2,diff), .^2, rowSums, sqrt, sum)
which can be read from left to right, noting that the output of each expression becomes the input of the next.
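As a quick check of the computation described above, the following sketch runs the nested base-R version on a small made-up path and then spells out each step as a separate assignment. It uses only base R and does not call chain; the matrix P and the intermediate variable names are hypothetical and chosen only for illustration.

```r
# Hypothetical 3-point path: (0,0) -> (3,4) -> (3,10)
P <- rbind(c(0, 0), c(3, 4), c(3, 10))

# Nested form, read inside-out
total_nested <- sum(sqrt(rowSums(apply(P, 2, diff)^2)))

# The same steps, one per line, in the order they are executed
step_diff    <- apply(P, 2, diff)      # difference between successive rows
step_squared <- step_diff^2            # square each coordinate difference
step_rowsum  <- rowSums(step_squared)  # squared length of each segment
step_length  <- sqrt(step_rowsum)      # length of each segment
total_steps  <- sum(step_length)       # total path length

print(total_nested)
print(total_steps)
```

The two segments of this path have lengths 5 and 6, so both forms give 11. The step-by-step version follows the same left-to-right order as the chain call.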
Note that some arguments above are the names of functions, and
others are expressions. chain applies whichever
interpretation appears most appropriate: bare words are taken to
be functions, expressions containing the placeholder name (by
default .) are evaluated as expressions, and expressions that do
not contain the placeholder have a placeholder injected at the
first argument. Thus apply(2,diff) is interpreted as
apply(.,2,diff), with the . coming from the output
of the previous step. This tends to work well because of the
typical convention in R of the dataset being the first argument to
any function. The above is equivalent to:
length <- chain(P, apply(.,2,diff), .^2, rowSums(.), sqrt(.), sum(.))
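One way to picture the fully spelled-out form is as a left fold over a list of one-argument functions. The sketch below is an illustration in base R only and is not the package's implementation; run_steps is a hypothetical helper, and it does none of the placeholder injection described above, so every step has to be an explicit function.

```r
# Hypothetical helper (not part of the package): feed `data` through each
# function in turn, passing each result on to the next step.
run_steps <- function(data, ...) {
  steps <- list(...)
  Reduce(function(value, step) step(value), steps, data)
}

P <- rbind(c(0, 0), c(3, 4), c(3, 10))  # made-up example path

path_length <- run_steps(
  P,
  function(.) apply(., 2, diff),  # explicit placeholder as the argument name
  function(.) .^2,
  rowSums,                        # bare function names work as steps too
  sqrt,
  sum
)
print(path_length)
```

Writing the steps as plain functions makes the data flow explicit, at the cost of the brevity that the placeholder rules are meant to provide.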
If you want to keep an intermediate value along the chain for later use, you can name the arguments, as in
alphabetize <- mkchain(values=., names, order, values[.]).
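Reading the alphabetize chain step by step suggests what it computes: keep the input as values, take its names, compute their sort order, then index values by that order. A plain base-R function with the same intent might look like the sketch below. This is an assumption about the intended behavior, not output from mkchain; alphabetize_base and the sample vector are hypothetical.

```r
# Hypothetical base-R counterpart of the alphabetize chain
alphabetize_base <- function(values) {
  nms <- names(values)   # step 1: names
  idx <- order(nms)      # step 2: order
  values[idx]            # step 3: values[.]
}

x <- c(b = 2, a = 1, c = 3)   # made-up named vector
sorted <- alphabetize_base(x)
print(sorted)
```

The named intermediate plays the role of the local variable values here: it stays available to later steps even after the running value has moved on to the names and their ordering.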
You can also use a different placeholder than "." by
supplying it in brackets, as in chain[x](x^2, mean, sqrt).
This is useful for nested invocations of chain or if
another package has a use for ".". When used with
mkchain, you can specify other arguments and
defaults, as in mkchain[., pow=2](.^pow, mean, .^(1/pow)).
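The mkchain[., pow=2](...) example reads like a function of the data with an extra argument pow that defaults to 2. Assuming that reading, a hand-written base-R counterpart would be the following; power_mean is a hypothetical name, and the code does not use mkchain.

```r
# Hypothetical base-R counterpart: raise to a power, average, take the root
power_mean <- function(., pow = 2) {
  raised   <- .^pow          # step 1: .^pow
  averaged <- mean(raised)   # step 2: mean
  averaged^(1 / pow)         # step 3: .^(1/pow)
}

rms        <- power_mean(c(1, 7))           # default pow = 2: root mean square
plain_mean <- power_mean(c(1, 7), pow = 1)  # pow = 1: ordinary mean
print(rms)
print(plain_mean)
```

For the input c(1, 7) the default gives 5 (the square root of the mean of 1 and 49), and pow = 1 gives 4.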
More than the occasional use of temporary names and alternate
placeholder names might indicate chain is not helping clarity :)
Note that subassignments, for example chain(letters, names(.) <- toupper(.)),
return the rvalue, which is not usually what you
want (here it will return the upcased characters, not the object
with upcased names). Instead use put, as in
chain(letters, put(., names, toupper(.))), or even better in this
case, chain(letters, inject(names, toupper)).
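The rvalue behavior is ordinary R semantics and can be seen without chain. In the base-R sketch below (variable names are illustrative), the value of a subassignment expression is its right-hand side, while calling the replacement function `names<-` directly returns the modified object, which is the result one usually wants from such a step.

```r
x <- c("a", "b", "c")

# The value of the assignment expression is the right-hand side
rvalue <- (names(x) <- toupper(x))
print(rvalue)

# Calling the replacement function directly returns the updated object
y <- c("a", "b", "c")
named <- `names<-`(y, toupper(y))
print(named)
```

Here rvalue holds the upcased characters with no names attached, whereas named is the original vector carrying the upcased names.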
# In help("match_df", package="plyr") there is this example:
library(plyr)
data(baseball)
longterm <- subset(count(baseball, "id"), freq > 25)
bb_longterm <- match_df(baseball, longterm, on="id")
bb_longterm[1:5,]
# Rewriting the above using chain:
chain(b=baseball, count("id"), subset(freq>25),
match_df(b, ., on="id"), head(5))