sdf_bind_rows() and sdf_bind_cols() implement the common pattern of do.call(rbind, sdfs) or do.call(cbind, sdfs) for binding many Spark DataFrames into one.
sdf_bind_rows(..., id = NULL)
sdf_bind_cols(...)
sdf_bind_rows() and sdf_bind_cols() return a tbl_spark.
Spark tbls to combine.
Each argument can be either a Spark DataFrame or a list of Spark DataFrames.
When row-binding, columns are matched by name, and any missing columns will be filled with NA.
When column-binding, rows are matched by position, so all data frames must have the same number of rows.
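
The matching rules above can be sketched as follows. This is a minimal illustration, not taken from the original documentation: it assumes a local Spark installation reachable through spark_connect(master = "local"), and the table names bind_a and bind_b and the sample data are made up for the example.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical sample tables; names and contents are illustrative only.
a <- sdf_copy_to(sc, data.frame(x = 1:2, y = c("a", "b")),
                 "bind_a", overwrite = TRUE)
b <- sdf_copy_to(sc, data.frame(x = 3:4, z = c(TRUE, FALSE)),
                 "bind_b", overwrite = TRUE)

# Row-binding: columns are matched by name. y is absent from b and
# z is absent from a, so those cells are filled with NA.
rows <- sdf_bind_rows(a, b)

# The same inputs can be passed as a single list.
rows_from_list <- sdf_bind_rows(list(a, b))

# Column-binding: rows are matched by position, so both inputs must
# have the same number of rows (two here). Only z is taken from b to
# avoid a duplicate x column.
cols <- sdf_bind_cols(a, select(b, z))

# Bring the results into R before closing the connection.
rows_local <- collect(rows)
rows_from_list_local <- collect(rows_from_list)
cols_local <- collect(cols)

spark_disconnect(sc)
```

Printing rows_local and cols_local shows the combined tables as ordinary local data frames.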
Data frame identifier.
When id is supplied, a new column of identifiers is created to link each row to its original Spark DataFrame. The labels are taken from the named arguments to sdf_bind_rows(). When a list of Spark DataFrames is supplied, the labels are taken from the names of the list. If no names are found, a numeric sequence is used instead.
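
A short sketch of the three labelling cases described above. It assumes a local Spark installation; the table names id_a and id_b, the labels first and second, and the identifier column name source are hypothetical choices for illustration.

```r
library(sparklyr)
library(dplyr)

# Assumption: Spark is installed locally (for example via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical sample tables; names and contents are illustrative only.
a <- sdf_copy_to(sc, data.frame(x = 1:2), "id_a", overwrite = TRUE)
b <- sdf_copy_to(sc, data.frame(x = 3:4), "id_b", overwrite = TRUE)

# Labels taken from the named arguments.
by_args <- sdf_bind_rows(first = a, second = b, id = "source")

# Labels taken from the names of a list.
by_list <- sdf_bind_rows(list(first = a, second = b), id = "source")

# No names supplied: a numeric sequence identifies each input.
by_position <- sdf_bind_rows(a, b, id = "source")

# Bring the results into R before closing the connection.
by_args_local <- collect(by_args)
by_list_local <- collect(by_list)
by_position_local <- collect(by_position)

spark_disconnect(sc)
```

In each result, the source column records which input a given row came from.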
The output of sdf_bind_rows() will contain a column if that column appears in any of the inputs.