With Crunch, you can add rows to a dataset by appending a second dataset to the bottom of the original. Crunch makes intelligent guesses to align the variables between the two datasets and to harmonize the categories and subvariables of variables, as appropriate.
appendDataset(dataset1, dataset2, upsert = FALSE)
Returns dataset1, updated with dataset2 (potentially filtered on rows and variables) appended to it.
dataset1: a CrunchDataset.
dataset2: another CrunchDataset, or possibly a data.frame. If dataset2
is not a Crunch dataset, it will be uploaded as a new
dataset before appending. If it is a CrunchDataset, it may be subsetted with
a filter expression on the rows and a selection of variables on the columns.
upsert: Logical: should the append instead "update" rows based on the
primary key variable and "insert" (append) where the primary key values are
new? Default is FALSE. Note that this upserting behavior requires a primary
key variable to have been set previously; see pk().
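The upsert behavior described above can be sketched as a small wrapper. This is only an illustration: upsert_by_key and its key_alias argument are hypothetical names, and the sketch assumes that pk() returns NULL when no primary key has been set and that the key variable exists in both datasets.

```r
# Hypothetical helper: make sure a primary key is set, then upsert.
# `key_alias` is assumed to be the alias of an ID variable in both datasets.
upsert_by_key <- function(ds, new_data, key_alias) {
    # Upserting requires a primary key on the target dataset; see pk()
    if (is.null(crunch::pk(ds))) {
        crunch::pk(ds) <- ds[[key_alias]]
    }
    # Rows with matching key values are updated; rows with new keys are appended
    crunch::appendDataset(ds, new_data, upsert = TRUE)
}

# Usage (requires an authenticated Crunch session; "respondent_id" is a
# made-up alias):
# ds <- crunch::loadDataset("Survey, 2016")
# corrections <- crunch::loadDataset("Survey, 2016 corrections")
# ds <- upsert_by_key(ds, corrections, "respondent_id")
```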
Variables are matched between datasets based on their aliases. Variables
present in only one of the two datasets are fine; they're handled by filling
in with missing values for the rows corresponding to the dataset where they
don't exist. For variables present in both datasets, you will have best
results if you ensure that the two datasets have the same variable names
and types, and that their categorical and array variables have consistent
categories. To preview how datasets will align when appended, see compareDatasets().
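To make the fill-with-missing behavior concrete, here is a base R illustration using plain data.frames. It is not how Crunch performs the append (that happens on the server and matches on aliases); align_and_stack is a hypothetical helper that only mimics the described outcome by matching on column names.

```r
# Hypothetical helper: stack two data.frames, matching columns by name and
# filling columns that exist in only one of them with NA.
align_and_stack <- function(a, b) {
    all_cols <- union(names(a), names(b))
    for (col in setdiff(all_cols, names(a))) a[[col]] <- NA
    for (col in setdiff(all_cols, names(b))) b[[col]] <- NA
    rbind(a[all_cols], b[all_cols])
}

# q1 exists only in wave1, q2 only in wave2
wave1 <- data.frame(id = 1:2, q1 = c("yes", "no"), stringsAsFactors = FALSE)
wave2 <- data.frame(id = 3:4, q2 = c(5, 7))

stacked <- align_and_stack(wave1, wave2)
print(stacked)
```

In the stacked result, q1 is missing for the rows that came from wave2 and q2 is missing for the rows that came from wave1, which mirrors what the text above describes for variables present in only one dataset.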
Particularly if you're appending to datasets that are already shared with
others, you may want to use the fork-edit-merge workflow when appending
datasets. This allows you to verify your changes before releasing them to
the other viewers of the dataset. To do this, fork the dataset with
forkDataset(), append the new data to the fork, ensure that the append
worked as expected, and then merge the fork back to the original dataset
with mergeFork(). For more, see vignette("fork-and-merge", package = "crunch").
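The fork-edit-merge steps above might be wrapped up as follows. This is a sketch under assumptions: append_via_fork and the looks_ok callback are hypothetical names, and what counts as "worked as expected" is left to a check you supply yourself.

```r
# Hypothetical helper: append to a fork, verify, then merge back.
# `looks_ok` is a user-supplied function that takes the fork and returns TRUE
# when the append looks right.
append_via_fork <- function(ds, new_data, looks_ok) {
    # Work on a fork so other viewers of `ds` do not see unverified changes
    fork <- crunch::forkDataset(ds)
    fork <- crunch::appendDataset(fork, new_data)
    if (!isTRUE(looks_ok(fork))) {
        stop("Append to the fork did not pass the check; original dataset not modified.")
    }
    # Merge the verified changes back into the original dataset
    crunch::mergeFork(ds, fork)
}

# Usage (requires an authenticated Crunch session):
# ds <- crunch::loadDataset("Survey, 2016")
# new_wave <- crunch::loadDataset("Survey, 2017")
# ds <- append_via_fork(ds, new_wave, function(fork) nrow(fork) > nrow(ds))
```

If the check fails, the fork is left in place so you can inspect it; deleting it afterwards is up to you.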
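if (FALSE) {
# Load the existing dataset and the new wave by name
ds <- loadDataset("Survey, 2016")
new_wave <- loadDataset("Survey, 2017")
# Append the rows of new_wave to the bottom of ds
ds <- appendDataset(ds, new_wave)
}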