Learn R Programming

dwtools (version 0.8.3.9)

joinbyv: Batch join multiple tables

Description

Perform batch join of multiple tables to one master table.

Usage

joinbyv(master, join, by, col.subset, row.subset, nomatch, allow.cartesian)

Arguments

master
data.table, optionally also a single data.table nested in the list.
join
list of data.tables which to join to master data.table.
by
list of character vectors. Default: lapply(join, key). Required when not all of join data.tables has key.
col.subset
list of character vectors. Default: lapply(join, names).
row.subset
list of expressions to be passed to corresponding join data.table i argument. Default: as.list(rep(TRUE,length(join))). To subset result master data.table, use row.subset together with corresponding nomatch argument equal to 0 (inner join). By default when providing row.subset list element the corresponding nomatch argument will be changed to 0 to perform inner join, otherwise it will be getOption("datatable.nomatch"). If you really want to do outer join to already filtered join data.table you need to override corresponding nomatch argument for NA. Cross table expressions and not supported inside joinbyv.
nomatch
list of integer scalars NA or 0 elements corresponding join data.tables. Default: lapply(row.subset, function(x) if(is.language(x)) 0 else getOption("datatable.nomatch")). Indicates outer join for NA and inner join for 0. When data.table extends allowed argument in the getOption("datatable.nomatch") then it should accept not only integer scalar but any value supported by data.table as nomatch argument.
allow.cartesian
list of logical scalar elements corresponding to join data.tables to define which of the joins are allowed to do cartesian product. Default: as.list(rep(getOption("datatable.allow.cartesian"),length(join))).

Value

data.table, denormalized master table joined to defined join data.tables. Column order according to col.subset, no key.

Details

Any NULL inside the lists provided to by, col.subset, row.subset, nomatch, allow.cartesian will be replaced by the default value for particular NULL element. Therefore it is possible to pass partially filled lists, length of each must match to length of join, example col.subset=list(NULL,c("join2_col1","join2_col2"),NULL,c("join4_col1")). Function do not allow cross tables row.subset filtering expressions. User should apply such filter after joinbyv, example: joinbyv(master, join)[join1_colA > join2_colA * 2]. Arguments nomatch, allow.cartesian by default will be setup according to data.table options. An exception case described in row.subset argument will by default override the nomatch argument to value 0 to perform inner join. Possibly the performance might be improved after implementing data.table FR #691 and #692.

See Also

build_hierarchy, dw.populate

Examples

Run this code
suppressPackageStartupMessages(library(dwtools))

X <- dw.populate() # scenario="star"
lapply(X, head)

DT <- joinbyv(master = X$SALES,
              join = list(time = X$TIME, product = X$PRODUCT, currency = X$CURRENCY),
              by = list("time_code","prod_code","curr_code"))
str(DT)

# joinbyv including timing
DT <- timing(
  joinbyv(master = X$SALES,
          join = list(time = X$TIME, product = X$PRODUCT, currency = X$CURRENCY),
          by = list("time_code","prod_code","curr_code"))
)
get.timing()

Run the code above in your browser using DataLab