pbdDEMO (version 0.3-1)

read.csv.ddmatrix: A Simple Parallel CSV Reader

Description

Read in a table from a CSV file in parallel as a distributed matrix.

Usage

read.csv.ddmatrix(file, sep = ",", nrows, ncols, header = FALSE, bldim = 4, num.rdrs = 1, ICTXT = 0, exact.linecount = TRUE)

Arguments

file
csv file name.
sep
separator character.
nrows, ncols
dimensions of the CSV file. Either may be omitted from the function call, in which case it is determined from the file.
header
logical indicating presence/absence of character header for file.
bldim
the blocking dimension for block-cyclically distributing the matrix across the process grid.
num.rdrs
number of processes to be used to read in the table.
ICTXT
BLACS context number of the returned distributed matrix.
exact.linecount
In the event that nrows is missing, this determines whether the exact number of rows should be determined (which requires a full read of the file) or an estimate should be used. The default is TRUE, meaning that the file will be scanned.
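
The trade-off behind exact.linecount can be illustrated in base R. The sketch below is not pbdDEMO's implementation, and how the package actually forms its estimate is not documented here. It only contrasts a full scan with one plausible estimate, extrapolated from the file size and the byte length of the first few lines. The helper names exact_rows and estimate_rows are hypothetical.

```r
# Hypothetical helpers for illustration; not part of pbdDEMO.

# Exact count: reads every line of the file.
exact_rows <- function(path, header = FALSE) {
  length(readLines(path)) - as.integer(header)
}

# Estimate: average byte length of the first `n` lines, scaled to the file size.
estimate_rows <- function(path, header = FALSE, n = 100L) {
  head_lines <- readLines(path, n = n)
  # Add 1 byte per line for the newline (assumes "\n" line endings).
  bytes_per_line <- mean(nchar(head_lines, type = "bytes") + 1)
  round(file.size(path) / bytes_per_line) - as.integer(header)
}

# Small demonstration on a temporary file whose rows all have the same width.
tmp <- tempfile(fileext = ".csv")
write.table(matrix(11:60, nrow = 10), tmp, sep = ",",
            row.names = FALSE, col.names = FALSE)
n_exact <- exact_rows(tmp)
n_est <- estimate_rows(tmp)
```

The estimate touches only the first few lines, so its cost does not grow with the file, but it is only as good as the assumption that those lines are representative of the rest.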

Value

Returns a distributed matrix.

Details

The function reads in data from a csv file into a distributed matrix. This function sits somewhere between scan() and read.csv(), but for parallel reads into a distributed matrix.
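
A minimal sketch of a call, assuming the script is saved as demo.r and launched in SPMD fashion (for example with mpiexec -np 2 Rscript demo.r). The file name x.csv and its 10 x 10 shape with a header row are assumptions made for illustration. init.grid() and finalize() are assumed to be provided by the pbdR packages loaded here.

```r
# Run with, e.g.:  mpiexec -np 2 Rscript demo.r
suppressMessages(library(pbdMPI, quietly = TRUE))
suppressMessages(library(pbdDMAT, quietly = TRUE))
suppressMessages(library(pbdDEMO, quietly = TRUE))

# Set up the BLACS process grids (assumed to come from the pbdR stack).
init.grid()

# "x.csv" is a hypothetical file with a header row and 10 x 10 numeric entries.
dx <- read.csv.ddmatrix("x.csv", sep = ",", nrows = 10, ncols = 10,
                        header = TRUE, bldim = 4, num.rdrs = 2, ICTXT = 0)

# Print the distributed matrix object.
print(dx)

# Shut down MPI cleanly.
finalize()
```

Supplying nrows and ncols here avoids the extra pass over the file, and num.rdrs = 2 matches the two processes the script is launched with.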

The arguments nrows= and ncols= are optional. If they are omitted, the dimensions are determined from the file. However, doing so is costly, so supplying the dimensions beforehand can greatly improve performance.
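
When the dimensions are not known in advance, they can be computed once and reused across runs. The following base R sketch shows one way to do that; csv_dims is a hypothetical helper, not a pbdDEMO function, and it assumes every data line has the same number of fields as the first line.

```r
# Hypothetical helper for illustration; not part of pbdDEMO.
csv_dims <- function(path, sep = ",", header = FALSE) {
  # Columns: count the fields on the first line only.
  first <- scan(path, what = "", sep = sep, nlines = 1L, quiet = TRUE)
  ncols <- length(first)

  # Rows: stream the file in chunks so it is never held in memory at once.
  con <- file(path, open = "r")
  on.exit(close(con))
  nrows <- 0L
  repeat {
    chunk <- readLines(con, n = 10000L)
    if (length(chunk) == 0L) break
    nrows <- nrows + length(chunk)
  }

  # The header line, if present, is not a data row.
  c(nrows = nrows - as.integer(header), ncols = ncols)
}

# Demonstration on a temporary 10 x 5 file with a header row.
m <- matrix(11:60, nrow = 10, dimnames = list(NULL, paste0("c", 1:5)))
tmp <- tempfile(fileext = ".csv")
write.table(m, tmp, sep = ",", row.names = FALSE, col.names = TRUE)
dims <- csv_dims(tmp, header = TRUE)
```

The resulting values can then be passed as nrows= and ncols= to read.csv.ddmatrix(), so the cost of scanning the file is paid only once rather than on every parallel read.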

Frankly, though, the performance-minded should not be using CSV files in the first place. Consider using the pbdNCDF4 package for managing data.