A list with one entry per block. Each entry contains two vectors: one with the indices of the block members in dataset A,
and another with the indices of the block members in dataset B.
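To make this structure concrete, the following minimal sketch builds a list of the same shape by hand, using exact blocking on a single variable. The element names `dfA.inds` and `dfB.inds`, the `block.N` names, and the `gender` variable are assumptions made for illustration; they are not stated in the description above.

```r
# Toy datasets; 'gender' is a hypothetical blocking variable.
dfA <- data.frame(id = 1:5, gender = c("F", "M", "F", "M", "F"),
                  stringsAsFactors = FALSE)
dfB <- data.frame(id = 1:4, gender = c("M", "F", "F", "M"),
                  stringsAsFactors = FALSE)

# One block per value that appears in both datasets.
block_values <- intersect(unique(dfA$gender), unique(dfB$gender))

blocks <- lapply(block_values, function(v) {
  list(
    dfA.inds = which(dfA$gender == v),  # row indices of block members in A
    dfB.inds = which(dfB$gender == v)   # row indices of block members in B
  )
})
names(blocks) <- paste0("block.", seq_along(blocks))  # assumed naming scheme

# Subset each dataset to the members of the first block.
dfA_block1 <- dfA[blocks$block.1$dfA.inds, ]
dfB_block1 <- dfB[blocks$block.1$dfB.inds, ]
```

Each pair of subsets can then be matched separately, which is the usual reason for blocking in the first place.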
Arguments
dfA
Dataset A - to be matched to Dataset B
dfB
Dataset B - to be matched to Dataset A
varnames
A vector of variable names to use for blocking.
Each must be present in both dfA and dfB.
window.block
A vector of variable names to be blocked using window blocking.
Each must be present in varnames.
window.size
The size of the window for window blocking. Default is 1
(observations +/- 1 on the specified variable will be blocked together).
kmeans.block
A vector of variable names to be blocked using k-means blocking.
Each must be present in varnames.
nclusters
Number of clusters to create with k-means. The default is the
number of clusters that gives an average cluster size of 100,000 observations.
iter.max
Maximum number of iterations for the k-means algorithm to run. Default is 5000.
n.cores
Number of cores to parallelize over. Default is NULL.
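The two blocking strategies named in the arguments can be sketched in base R as follows. This is an illustration of the ideas only, not the function's actual implementation. The helpers `window_block`, `kmeans_block`, and `nclusters_for_size` are hypothetical, as are the `dfA.inds` and `dfB.inds` element names. The sketch assumes that window blocking pairs each distinct value in dataset A with every dataset B observation within `window.size` of it, that k-means is run on the pooled values of both datasets, and that the caller decides which observation count the 100,000 average refers to.

```r
# Window blocking (assumed behaviour): one block per distinct value in A,
# containing the B observations within +/- window.size of that value.
window_block <- function(a, b, window.size = 1) {
  out <- lapply(sort(unique(a)), function(v) {
    list(
      dfA.inds = which(a == v),
      dfB.inds = which(abs(b - v) <= window.size)
    )
  })
  # Drop blocks that have no members in B.
  Filter(function(blk) length(blk$dfB.inds) > 0, out)
}

# K-means blocking (assumed behaviour): cluster the pooled values so that
# A and B share cluster labels, then split the indices by cluster.
kmeans_block <- function(a, b, nclusters, iter.max = 5000) {
  km <- kmeans(c(a, b), centers = nclusters, iter.max = iter.max)
  clA <- km$cluster[seq_along(a)]
  clB <- km$cluster[length(a) + seq_along(b)]
  lapply(seq_len(nclusters), function(k) {
    list(dfA.inds = which(clA == k), dfB.inds = which(clB == k))
  })
}

# Hypothetical helper: number of clusters giving an average cluster size of
# at most target.size, with a minimum of one cluster.
nclusters_for_size <- function(n.obs, target.size = 100000) {
  max(1, ceiling(n.obs / target.size))
}

set.seed(1)  # kmeans() picks random starting centers
ageA <- c(20, 22, 61, 63)
ageB <- c(21, 60, 62)

win_blocks <- window_block(ageA, ageB, window.size = 1)
km_blocks  <- kmeans_block(ageA, ageB, nclusters = 2)
```

Under these assumptions, window blocks can overlap, because a dataset B observation may fall inside the window of more than one dataset A value, whereas k-means assigns every observation to exactly one cluster.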