ddR (version 0.1.2)

darray: Creates a distributed array with the specified partitioning and contents.

Description

Creates a distributed array with the specified partitioning and contents.

Usage

darray(nparts = NULL, dim = NULL, psize = NULL, data = 0, sparse = FALSE)
DArray(nparts = NULL, dim = NULL, psize = NULL, data = 0, sparse = FALSE)

Arguments

nparts
vector specifying the number of partitions along rows and columns. If missing, 'psize' and 'dim' must be provided.
dim
the dim attribute for the array to be created. A vector specifying number of rows and columns.
psize
size of each partition, as a vector specifying the number of rows and columns. Must be provided together with 'dim'.
data
initial value of all elements in the array. Default is 0.
sparse
If TRUE, the output darray will be of type sparse_darray. The default value is FALSE.

Value

Returns a distributed array with the specified dimensions. Its data may reside as partitions on remote nodes.

Details

Array partitions are stored internally as dense matrices. The last set of partitions may have fewer rows or columns if the array size is not an integer multiple of the partition size. For example, the distributed array 'darray(dim=c(5,5), psize=c(2,5))' has three partitions. The first two partitions have two rows each, but the last partition has only one row. All three partitions have five columns.
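The partition arithmetic in this example can be checked without a ddR backend. The following is a minimal base-R sketch, not ddR's implementation; the helper name 'partition_sizes' is hypothetical and exists only for illustration. It assumes a two-dimensional array and lists the partitions in row-major order.

```r
# Hypothetical helper (not part of ddR): row and column counts of every
# partition for an array of size `dim` split into blocks of at most `psize`.
partition_sizes <- function(dim, psize) {
  # Number of partitions along rows and along columns (ceiling division).
  grid <- ceiling(dim / psize)
  # Length of each block along one axis; the last block takes what is left.
  block_len <- function(n, p, k) pmin(p, n - (seq_len(k) - 1) * p)
  rows <- block_len(dim[1], psize[1], grid[1])
  cols <- block_len(dim[2], psize[2], grid[2])
  # One row per partition, listed left to right, then top to bottom.
  data.frame(
    nrow = rep(rows, each = grid[2]),
    ncol = rep(cols, times = grid[1])
  )
}

# The example from the text: a 5x5 array with 2x5 partitions.
sizes <- partition_sizes(dim = c(5, 5), psize = c(2, 5))
print(sizes)
```

Each row of 'sizes' describes one partition, so the number of rows in the data frame is the total number of partitions.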

Distributed arrays can also be defined by specifying just the number of partitions, without their sizes. This flexibility is useful when the size of an array is not known a priori. For example, 'darray(nparts=c(5,1))' is a dense array with five partitions. Each partition can contain any number of rows, but all partitions must have the same number of columns for the result to be a well-formed array.
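The column-count requirement can be illustrated with ordinary matrices. This is a base-R sketch under stated assumptions, not ddR code: the list 'parts' merely stands in for the five partitions of 'darray(nparts=c(5,1))', and the row counts are arbitrary illustrative values.

```r
# Five row-wise blocks with different row counts but the same column count,
# standing in for the partitions of a darray created with nparts = c(5, 1).
row_counts <- c(2, 4, 1, 3, 2)  # arbitrary, illustrative values
n_cols <- 3
parts <- lapply(row_counts, function(r) matrix(0, nrow = r, ncol = n_cols))

# The blocks form a well-formed array only if every column count matches.
same_cols <- length(unique(vapply(parts, ncol, integer(1)))) == 1

# Stacking the blocks top to bottom gives the full array.
full <- do.call(rbind, parts)
print(same_cols)
print(dim(full))
```

If one block had a different number of columns, 'same_cols' would be FALSE and 'rbind' would fail, which mirrors why the partitions must agree on column count.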

Distributed arrays can be fetched at the master using collect. The number of partitions can be obtained with nparts, and the dimensions of each partition with psize. Partitions are numbered from left to right and then top to bottom, i.e., in row-major order.
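The row-major numbering can be made concrete with a small base-R sketch. The helper 'partition_position' is hypothetical (it is not a ddR function) and assumes 1-based partition indices on a two-dimensional partition grid.

```r
# Hypothetical helper (not part of ddR): grid position of partition `k`
# when partitions are numbered left to right, then top to bottom.
partition_position <- function(k, nparts) {
  n_col_blocks <- nparts[2]
  c(row = (k - 1) %/% n_col_blocks + 1,
    col = (k - 1) %% n_col_blocks + 1)
}

# Positions of all six partitions in a 2 x 3 partition grid.
positions <- t(sapply(1:6, partition_position, nparts = c(2, 3)))
print(positions)
```

Under this numbering, partitions 1 to 3 fill the top row of the grid and partitions 4 to 6 fill the bottom row.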

References

Prasad, S., Fard, A., Gupta, V., Martinez, J., LeFevre, J., Xu, V., Hsu, M., and Roy, I. (2015) Large scale predictive analytics in Vertica: Fast data transfer, distributed model creation and in-database prediction. _SIGMOD 2015_, 1657-1668.

Venkataraman, S., Bodzsar, E., Roy, I., AuYoung, A., and Schreiber, R. (2013) Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices. _EuroSys 2013_, 197-210.

Homepage: https://github.com/vertica/ddR

See Also

collect, psize, dmapply

Examples

## Not run: 
library(ddR)
## A 9x9 darray in 9 partitions (each 3x3), with every element initialized to 5.
a <- darray(psize=c(3,3), dim=c(9,9), data=5)
collect(a)
## Same layout as 'a', but filled with 0s (the default for 'data').
b <- darray(psize=c(3,3), dim=c(9,9))
## An empty darray with 6 partitions, 2 per column and 3 per row.
## Named 'd' rather than 'c' to avoid masking base::c.
d <- darray(nparts=c(2,3))
## End(Not run)
