A join specification created with join_by(), or a character vector of variables to join by.
If NULL, the default, *_join() will perform a natural join, using all variables in common across x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying the by argument explicitly.
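A minimal sketch of the default natural join, assuming dplyr 1.1.0 or later. The tables x and y and their columns are hypothetical. The two calls return the same rows; only the first one prints the message naming the shared column.

```r
library(dplyr)

# Hypothetical tables: `id` is the only column they have in common
x <- tibble(id = c(1, 2, 3), x_val = c("a", "b", "c"))
y <- tibble(id = c(2, 3, 4), y_val = c("B", "C", "D"))

# by = NULL (the default): natural join on all shared columns, here `id`.
# dplyr reports the columns it chose in a message.
natural <- inner_join(x, y)

# Supplying `by` explicitly gives the same result without the message
explicit <- inner_join(x, y, by = join_by(id))
```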
To join on different variables between x and y, use a join_by() specification. For example, join_by(a == b) will match x$a to y$b.
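A short sketch of join_by(a == b), assuming dplyr 1.1.0 or later; the tables and their columns are hypothetical. For an equality join like this one, the result keeps only the key column from x (here a).

```r
library(dplyr)

# Hypothetical tables whose key columns have different names
x <- tibble(a = c(1, 2, 3), x_val = c("p", "q", "r"))
y <- tibble(b = c(1, 3), y_val = c("P", "R"))

# join_by(a == b): match x$a to y$b; left_join() keeps every row of x
res <- left_join(x, y, by = join_by(a == b))
```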
To join by multiple variables, use a join_by() specification with multiple expressions. For example, join_by(a == b, c == d) will match x$a to y$b and x$c to y$d. If the column names are the same between x and y, you can shorten this by listing only the variable names, like join_by(a, c).
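A sketch of a multi-column join, assuming dplyr 1.1.0 or later and hypothetical tables. A row matches only when every expression holds. Renaming y's keys to match x lets the shorter join_by(a, c) form produce the same result.

```r
library(dplyr)

x <- tibble(a = c(1, 1, 2), c = c("u", "v", "u"), x_val = 1:3)
y <- tibble(b = c(1, 2), d = c("v", "u"), y_val = c(10, 20))

# Both conditions must hold: x$a == y$b and x$c == y$d
res <- inner_join(x, y, by = join_by(a == b, c == d))

# When key names are identical on both sides, list them bare
y2 <- rename(y, a = b, c = d)
res2 <- inner_join(x, y2, by = join_by(a, c))
```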
join_by() can also be used to perform inequality, rolling, and overlap joins. See the documentation at ?join_by for details on these types of joins.
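As a rough illustration of the difference between an inequality join and a rolling join, here is a sketch assuming dplyr 1.1.0 or later; the events and prices tables are hypothetical. The inequality join returns every price update at or before each event, so rows can be duplicated. Wrapping the condition in closest() keeps only the nearest qualifying match. Overlap helpers such as between() and overlaps() are covered in ?join_by.

```r
library(dplyr)

# Hypothetical data: look up the price in effect at each event time
events <- tibble(event_time = c(2, 5, 9))
prices <- tibble(price_time = c(1, 4, 8), price = c(100, 101, 102))

# Inequality join: all price updates at or before each event
all_prior <- left_join(events, prices,
                       by = join_by(event_time >= price_time))

# Rolling join: closest() keeps only the most recent qualifying update
latest <- left_join(events, prices,
                    by = join_by(closest(event_time >= price_time)))
```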
For simple equality joins, you can alternatively specify a character vector of variable names to join by. For example, by = c("a", "b") joins x$a to y$a and x$b to y$b. If variable names differ between x and y, use a named character vector like by = c("x_a" = "y_a", "x_b" = "y_b").
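A minimal sketch of the character-vector forms, assuming dplyr 1.1.0 or later and hypothetical tables. In the named vector, each name is a column of x and each value is the matching column of y.

```r
library(dplyr)

x <- tibble(a = c(1, 2), b = c("u", "v"), x_val = c(TRUE, FALSE))
y <- tibble(a = c(1, 2), b = c("u", "w"), y_val = c(10, 20))

# Same names on both sides: plain character vector
res_chr <- inner_join(x, y, by = c("a", "b"))

# Different names: names are x's columns, values are y's columns
y_renamed <- rename(y, y_a = a, y_b = b)
res_named <- inner_join(x, y_renamed, by = c("a" = "y_a", "b" = "y_b"))
```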
To perform a cross-join, generating all combinations of x and y, see cross_join().
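For comparison, a cross-join needs no by specification at all. A small sketch, assuming dplyr 1.1.0 or later (where cross_join() was added) and hypothetical tables:

```r
library(dplyr)

x <- tibble(size = c("S", "M"))
y <- tibble(colour = c("red", "green", "blue"))

# Every combination of rows: 2 x 3 = 6 rows, no key columns involved
combos <- cross_join(x, y)
```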