
Fast merge of two data.table
s. The data.table
method behaves
similarly to data.frame
except that row order is specified, and by
default the columns to merge on are chosen:
at first based on the shared key columns, and if there are none,
then based on key columns of the first argument x
, and if there
are none,
then based on the common columns between the two data.table
s.
Use the by
, by.x
and by.y
arguments explicitly to override this default.
# S3 method for data.table
merge(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FALSE,
all.x = all, all.y = all, sort = TRUE, suffixes = c(".x", ".y"), no.dups = TRUE,
allow.cartesian=getOption("datatable.allow.cartesian"), # default FALSE
incomparables = NULL, ...)
A new data.table
based on the merged data table
s, and sorted by the
columns set (or inferred for) the by
argument if argument sort
is
set to TRUE
.
data table
s. y
is coerced to a data.table
if
it isn't one already.
A vector of shared column names in x
and y
to merge on.
This defaults to the shared key columns between the two tables.
If y
has no key columns, this defaults to the key of x
.
Vectors of column names in x
and y
to merge on.
logical; all = TRUE
is shorthand to save setting both
all.x = TRUE
and all.y = TRUE
.
logical; if TRUE
, rows from x
which have no matching row
in y
are included. These rows will have 'NA's in the columns that are usually
filled with values from y
. The default is FALSE
so that only rows with
data from both x
and y
are included in the output.
logical; analogous to all.x
above.
logical. If TRUE
(default), the rows of the merged
data.table
are sorted by setting the key to the by / by.x
columns. If
FALSE
, unlike base R's merge
for which row order is unspecified, the
row order in x
is retained (including retaining the position of missing entries when
all.x=TRUE
), followed by y
rows that don't match x
(when all.y=TRUE
)
retaining the order those appear in y
.
A character(2)
specifying the suffixes to be used for
making non-by
column names unique. The suffix behaviour works in a similar
fashion as the merge.data.frame
method does.
logical indicating that suffixes
are also appended to
non-by.y
column names in y
when they have the same column name
as any by.x
.
See allow.cartesian
in [.data.table
.
values which cannot be matched and therefore are excluded from by columns.
Not used at this time.
merge
is a generic function in base R. It dispatches to either the
merge.data.frame
method or merge.data.table
method depending on
the class of its first argument. Note that, unlike SQL
join, NA
is
matched against NA
(and NaN
against NaN
) while merging.
For a more data.table
-centric way of merging two data.table
s, see
[.data.table
; e.g., x[y, ...]
. See FAQ 1.11 for a detailed
comparison of merge
and x[y, ...]
.
data.table
, setkey
, [.data.table
,
merge.data.frame
(dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A"))
(dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A"))
merge(dt1, dt2)
merge(dt1, dt2, all = TRUE)
(dt1 <- data.table(A = letters[rep(1:3, 2)], X = 1:6, key = "A"))
(dt2 <- data.table(A = letters[rep(2:4, 2)], Y = 6:1, key = "A"))
merge(dt1, dt2, allow.cartesian=TRUE)
(dt1 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(1:3, 2)], X = 1:6, key = c("A", "B")))
(dt2 <- data.table(A = c(rep(1L, 5), 2L), B = letters[rep(2:4, 2)], Y = 6:1, key = c("A", "B")))
merge(dt1, dt2)
merge(dt1, dt2, by="B", allow.cartesian=TRUE)
# test it more:
d1 <- data.table(a=rep(1:2,each=3), b=1:6, key=c("a", "b"))
d2 <- data.table(a=0:1, bb=10:11, key="a")
d3 <- data.table(a=0:1, key="a")
d4 <- data.table(a=0:1, b=0:1, key=c("a", "b"))
merge(d1, d2)
merge(d2, d1)
merge(d1, d2, all=TRUE)
merge(d2, d1, all=TRUE)
merge(d3, d1)
merge(d1, d3)
merge(d1, d3, all=TRUE)
merge(d3, d1, all=TRUE)
merge(d1, d4)
merge(d1, d4, by="a", suffixes=c(".d1", ".d4"))
merge(d4, d1)
merge(d1, d4, all=TRUE)
merge(d4, d1, all=TRUE)
# setkey is automatic by default
set.seed(1L)
d1 <- data.table(a=sample(rep(1:3,each=2)), z=1:6)
d2 <- data.table(a=2:0, z=10:12)
merge(d1, d2, by="a")
merge(d1, d2, by="a", all=TRUE)
# using by.x and by.y
setnames(d2, "a", "b")
merge(d1, d2, by.x="a", by.y="b")
merge(d1, d2, by.x="a", by.y="b", all=TRUE)
merge(d2, d1, by.x="b", by.y="a")
# using incomparables values
d1 <- data.table(a=c(1,2,NA,NA,3,1), z=1:6)
d2 <- data.table(a=c(1,2,NA), z=10:12)
merge(d1, d2, by="a")
merge(d1, d2, by="a", incomparables=NA)
Run the code above in your browser using DataLab