dotdot
Installation:
devtools::install_github("moodymudskipper/dotdot")
This package proposes an improved assignment operator, :=, in which the shorthand .. stands for the left-hand side of the assignment.
library(dotdot)
x <- y <- iris
x$Sepal.Length[5] <- x$Sepal.Length[5] + 3
y$Sepal.Length[5] := .. + 3
identical(x,y)
#> [1] TRUE
z <- factor(letters[1:3])
levels(z) := c(.., "level4")
z
#> [1] a b c
#> Levels: a b c level4
You can think of .. as the : of the := symbol laid horizontally.
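To make the idea concrete, here is a minimal base R sketch of how an operator of this kind could work. It is a toy written for illustration, not the package's actual code: it captures both sides unevaluated, replaces every .. on the right by the left-hand side, and evaluates the resulting ordinary assignment in the caller's environment. Run it in a session where dotdot is not attached.

```r
# Toy version of `:=`, for illustration only (an assumption, not dotdot's code).
`:=` <- function(lhs, rhs) {
  lhs_expr <- substitute(lhs)  # unevaluated left-hand side, e.g. y$Sepal.Length[5]
  rhs_expr <- substitute(rhs)  # unevaluated right-hand side, e.g. .. + 3
  # Replace every `..` symbol in the right-hand side by the left-hand side.
  dots <- setNames(list(lhs_expr), "..")
  rhs_expr <- do.call(substitute, list(rhs_expr, dots))
  # Build the call `lhs <- rhs` and evaluate it where `:=` was called.
  eval.parent(call("<-", lhs_expr, rhs_expr))
}

x <- y <- head(iris)
x$Sepal.Length[5] <- x$Sepal.Length[5] + 3
y$Sepal.Length[5] := .. + 3
identical(x, y)
#> [1] TRUE
```

Because the rewritten statement is a plain <- call, replacement forms such as levels(z) := c(.., "level4") go through R's usual replacement-function machinery in this sketch.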
Integration with data.table, tidyverse and other packages using :=
The operator := is used by the prominent packages data.table and rlang (mostly through tidyverse functions), but they only use it to parse expressions, thanks to its convenient operator precedence. It's not actually called.
Thus dotdot is completely tidyverse and data.table compatible, and some ad hoc adjustments were made so that it even works when the latter are attached after dotdot.
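The "parsed but never called" point can be illustrated in base R alone. The sketch below is not how data.table or rlang are implemented; split_walrus is a hypothetical helper that only shows that an argument containing := can be captured and taken apart without any function named := being evaluated.

```r
# The parser turns `a := b` into a call to `:=`, with the same low
# precedence as `<-`, so the whole right-hand side stays together.
expr <- quote(new_col := a + b * 2)
is.call(expr)
#> [1] TRUE
as.character(expr[[1]])
#> [1] ":="
deparse(expr[[3]])
#> [1] "a + b * 2"

# Hypothetical helper: captures its argument unevaluated and splits it.
# The `:=` call is inspected, never evaluated.
split_walrus <- function(arg) {
  e <- substitute(arg)
  stopifnot(is.call(e), identical(e[[1]], as.name(":=")))
  list(name = as.character(e[[2]]), value = e[[3]])
}

parts <- split_walrus(new_col := 3)
parts$name
#> [1] "new_col"
parts$value
#> [1] 3
```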
library(data.table)
#>
#> Attaching package: 'data.table'
#> The following object is masked _by_ 'package:dotdot':
#>
#> :=
levels(z) := c(.., "level5")
z
#> [1] a b c
#> Levels: a b c level4 level5
data <- as.data.table(head(iris,2))
data[,new_col := 3] # `:= ` works as if dotdot wasn't attached
data
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species new_col
#> 1:          5.1         3.5          1.4         0.2  setosa       3
#> 2:          4.9         3.0          1.4         0.2  setosa       3
Here is an example of := being used by both dotdot and rlang (through dplyr) in the same statement:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#>
#> between, first, last
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
my_data_frame <- iris[3:5]
var = quo(Log.Petal.Width)
my_data_frame := .. %>% mutate(!!var := log(Petal.Width)) %>% head(2)
my_data_frame
#>   Petal.Length Petal.Width Species Log.Petal.Width
#> 1          1.4         0.2  setosa       -1.609438
#> 2          1.4         0.2  setosa       -1.609438
In case you've attached another package containing :=, you can use dotdot_first() to make sure that our := is not masked (this seems to be rare, though, as I couldn't find an example).
Comparison with magrittr's %<>%
The package magrittr contains the operator %<>%, which serves a similar role to :=. Let's first see how they are similar, and then how they differ:
These calls have the same effect:
iris$Sepal.Length %<>% log
iris$Sepal.Length %<>% log(.)
iris$Sepal.Length := log(..)
So do these, but here the magrittr versions are less compact and readable:
iris$Sepal.Length[5] %<>% multiply_by(2) %>% add(3)
iris$Sepal.Length[5] %<>% {2*. + 3}
iris$Sepal.Length[5] := 2*.. + 3
Now for the differences, aside from compactness and readability:
- Attaching magrittr often means masking functions like extract or set_names. dotdot only exports its operator and the dotdot_first function.
- magrittr operators deal with environments in a way that is much less straightforward, so this won't work:
library(magrittr)
test <- function(some_parameter) {
  some_parameter %<>% {as.character(substitute(.))}
  some_parameter
}
x <- try(test(foo))
#> Error in eval(lhs, parent, parent) : object 'foo' not found
inherits(x,"try-error")
#> [1] TRUE
While this will work fine:
test <- function(some_parameter) {
  some_parameter := as.character(substitute(..))
  some_parameter
}
test(foo)
#> [1] "foo"
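One way to see why this works: if .. is simply replaced by the left-hand side before the assignment is evaluated in the function's own frame, the body is equivalent to the plain base R code below. This is a sketch of that expansion under that assumption, not a trace of the package internals.

```r
# Base R equivalent of `some_parameter := as.character(substitute(..))`,
# assuming `..` is textually replaced by `some_parameter`.
test <- function(some_parameter) {
  # substitute() sees the promise of the function argument, so it
  # returns the unevaluated symbol `foo` rather than trying to find it.
  some_parameter <- as.character(substitute(some_parameter))
  some_parameter
}
test(foo)
#> [1] "foo"
```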
- := is also faster than %<>%, though these operations are fast anyway and are rarely, if ever, likely to be a bottleneck:
b <- x <- y <- z <- 1
microbenchmark::microbenchmark(
  base = {b <- b + 1},
  dotdot = {x := .. + 1},
  magrittr = {y %<>% add(1)},
  magrittr2 = {z %<>% {. + 1}},
  times = 1e4
)
#> Unit: nanoseconds
#>       expr   min    lq       mean median    uq     max neval cld
#>       base   200   302   493.4608    401   501   36501 10000   a
#>     dotdot 10001 11802 14891.2715  12901 14001 2888200 10000   b
#>   magrittr 61001 63502 79046.3541  65301 69700 4641102 10000   d
#>  magrittr2 45601 47801 58116.3759  49100 51701 3128902 10000   c
Edge cases and good practice
:= is NOT meant to be a complete replacement for the <- operator. The latter is explicit in the absence of .., so it is more readable; it is faster (though we're speaking microseconds); and it won't clutter your traceback() when debugging.
:= can be used several times in a statement like z <- (x := .. + 1) + (y := .. + 1), but it never makes sense to nest several := in one assignment such as x := (y := .. + 2), as every .. will be replaced by the name of the variable on the lhs of the first evaluated := in any case. It can even produce counterintuitive output, see below.
This is all good and explicit:
x <- 4
y <- 7
z <- (x := .. + 1) + (y := .. + 1)
x
y
z
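For reference, here is the same statement written out in base R, assuming each := simply replaces its own .. by its own left-hand side. The values shown follow from that expansion.

```r
x <- 4
y <- 7
# Expansion of `z <- (x := .. + 1) + (y := .. + 1)`:
# each `..` refers to the left-hand side of its own `:=`.
z <- (x <- x + 1) + (y <- y + 1)
x
#> [1] 5
y
#> [1] 8
z
#> [1] 13
```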
But nesting several := is useless and potentially confusing. Here the dots will be replaced by x, though one might have expected them to be replaced by y.
x <- 4
y <- 7
x := (y := .. + 2) # same as `x <- (y := x + 2)`
x
y
Good practice makes things unambiguous:
x <- 4
y <- 7
x <- (y := .. + 2)
x
y
x <- 4
y <- 7
x := (y <- .. + 2)
x
y
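The three cases can be compared by writing out their base R expansions, following the replacement rule described in this section (every .. takes the left-hand side of the first evaluated :=). This is a sketch of the expected expansions rather than output captured from the package.

```r
# Nested `x := (y := .. + 2)`: the outer `:=` runs first, so `..` becomes x.
x <- 4; y <- 7
x <- (y <- x + 2)
nested <- c(x = x, y = y)  # x is 6, y is 6

# `x <- (y := .. + 2)`: only one `:=`, so `..` is unambiguously y.
x <- 4; y <- 7
x <- (y <- y + 2)
dots_are_y <- c(x = x, y = y)  # x is 9, y is 9

# `x := (y <- .. + 2)`: only one `:=`, so `..` is unambiguously x.
x <- 4; y <- 7
x <- (y <- x + 2)
dots_are_x <- c(x = x, y = y)  # x is 6, y is 6
```

The nested form and the last form expand to the same code, which is why the second := adds nothing except the risk of misreading it.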