ddR (version 0.1.2)

dmapply: Distributed version of mapply. Similar to R's 'mapply', it allows a multivariate function, FUN, to be applied to several inputs. Unlike standard mapply, it always returns a distributed object.

Description

Though dmapply is modeled after mapply, there are several important differences, as evident in the parameters described below.

Usage

dmapply(FUN, ..., MoreArgs = list(), output.type = c("dlist", "dframe", "darray", "sparse_darray"), nparts = NULL, combine = c("default", "c", "rbind", "cbind"))

Arguments

FUN
function to apply, found via 'match.fun'.
...
arguments to vectorize over (vectors or lists of strictly positive length, or all of zero length). These may also be distributed objects, such as dlists, darrays, and dframes.
MoreArgs
a list of other arguments to 'FUN'.
output.type
the output type of the distributed object. The default value of "dlist" means that the result of dmapply will be stored in a distributed list. "darray" will make dmapply return a darray, just as "dframe" will make it return a dframe. "sparse_darray" results in a special version of darray where the elements are sparse.
nparts
a numeric vector of length one or two that specifies how the output should be partitioned. dlists have only one-dimensional partitioning, whereas darrays and dframes have two-dimensional partitioning (the number of partitions across the vertical and horizontal dimensions).
combine
specifies how the results of dmapply are combined within each partition when a partition contains more than one result. For dframes and darrays: if "rbind", the results are stitched together using rbind; if "cbind", cbind is used; if "c", the results are flattened into one column, as is the case with simplify2array(). For dlists, "c" first attempts to unlist each element of the dmapply result and then expands these items within the partition of the dlist. One may think of combine as the function that is invoked, via 'do.call', on the list of results produced by dmapply. The default value is "default", which for darrays and dframes behaves identically to "c". For dlists, "default" means that no combining function is called.
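
As a rough illustration of the 'do.call' analogy in the combine description, the following base R sketch applies each combining function to a plain list of results. It does not use ddR: the `results` list is a hypothetical stand-in for the results held in one partition, and ddR's actual output shapes may differ (for example, the description says "c" yields a single column for darrays and dframes, whereas base `c` returns a bare vector).

```r
# Hypothetical stand-in for the results that land in one partition:
# two 2x2 integer matrices, as FUN = function(x) matrix(x, 2, 2) would return.
results <- lapply(1:2, function(x) matrix(x, 2, 2))

by_rows <- do.call(rbind, results)  # stacked vertically: 4 rows, 2 columns
by_cols <- do.call(cbind, results)  # placed side by side: 2 rows, 4 columns
flat    <- do.call(c, results)      # dimensions dropped: vector of length 8

dim(by_rows)
dim(by_cols)
length(flat)
```

The choice between "rbind" and "cbind" therefore determines the shape of each partition before the partitions themselves are arranged according to nparts.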

Value

A dlist, darray, or dframe (depending on the value of output.type), with the number of partitions equal to nparts.

References

Prasad, S., Fard, A., Gupta, V., Martinez, J., LeFevre, J., Xu, V., Hsu, M., and Roy, I. (2015) Large scale predictive analytics in Vertica: Fast data transfer, distributed model creation and in-database prediction. _SIGMOD 2015_, 1657-1668.

Venkataraman, S., Bodzsar, E., Roy, I., AuYoung, A., and Schreiber, R. (2013) Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices. _EuroSys 2013_, 197-210.

Homepage: https://github.com/vertica/ddR

Examples

## Not run: 
library(ddR)

## A dlist created by adding two input vectors
a <- dmapply(function(x, y) x + y, 1:5, 2:6, nparts = 3)
collect(a)

## Create a darray with 4 partitions. Partitions are stitched in a 2x2 grid,
## so the overall dimensions of the darray will be 4x4.
b <- dmapply(function(x) matrix(x, 2, 2), 1:4, output.type = "darray",
             combine = "rbind", nparts = c(2, 2))
collect(b, 1)  # First partition only
collect(b)     # The whole darray
## End(Not run)
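
The Arguments section describes `...` and `MoreArgs` in the same terms as base R's mapply, so the vectorizing behaviour can be previewed locally without a ddR backend. The sketch below uses base `mapply` only; the `scale` argument and the `add_scaled` helper are hypothetical names chosen for illustration, and the commented-out dmapply call is an assumed equivalent based on the Usage signature, not a tested one.

```r
# Hypothetical helper: x and y are vectorized over, scale is held constant.
add_scaled <- function(x, y, scale) (x + y) * scale

# Base R: vectorize over 1:5 and 2:6, pass scale once via MoreArgs.
# SIMPLIFY = FALSE keeps the result as a list, loosely mirroring a dlist.
res <- mapply(add_scaled, 1:5, 2:6,
              MoreArgs = list(scale = 10), SIMPLIFY = FALSE)
unlist(res)

# Assumed distributed counterpart (requires ddR; not run here):
# d <- dmapply(add_scaled, 1:5, 2:6, MoreArgs = list(scale = 10), nparts = 3)
# collect(d)
```

Arguments in `...` are walked element by element, while everything in `MoreArgs` is passed unchanged to every call of FUN.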
