parDosa: Parallel wrapper function to call from within a function

Description

parDosa is a wrapper function around many functionalities of the parallel package. It is designed to work closely with MCMC fitting functions, e.g. it can easily be called from inside of a function.

Usage

parDosa(cl, seq, fun, cldata,
    lib = NULL, dir = NULL, evalq=NULL,
    size = 1, balancing = c("none", "load", "size", "both"),
    rng.type = c("none", "RNGstream"),
    cleanup = TRUE, unload = FALSE, iseed=NULL, ...)

Value

Usually a list with results returned by the cluster.

Arguments

cl: A cluster object created by makeCluster, or an integer. It can also be NULL, see Details.
seq: A vector to split.
fun: A function or character string naming a function.
cldata: A list containing data. This list is then exported to the cluster by clusterExport. It is stored in a hidden environment. Data in cldata can be used by fun.
lib: Character, name of package(s). Optionally packages can be loaded onto the cluster. More than one package can be specified as character vector. Packages already loaded are skipped.
dir: Working directory to use. If NULL (default), the working directory is not set on the workers. Can be a vector to set different directories on different workers.
evalq: Character, expressions to evaluate, e.g. for changing global options (passed to clusterEvalQ). More than one expression can be specified as a character vector.
balancing: Character, type of balancing to perform (see Details).
size: Vector of problem sizes (or relative performance information) corresponding to elements of seq (recycled if needed). The default 1 indicates equality of problem sizes.
rng.type: Character. "none" does not set any seeds on the workers. "RNGstream" selects the "L'Ecuyer-CMRG" RNG and then distributes streams to the members of a cluster, optionally setting the seed of the streams by set.seed(iseed) (otherwise the streams are derived from the current seed of the master process, after selecting the L'Ecuyer generator). See clusterSetRNGStream. When forking (e.g. when cl is an integer), only the logical value !(rng.type == "none") is used.
cleanup: logical, if cldata should be removed from the workers after applying fun. If TRUE, the effects of the dir argument are also cleaned up.
unload: logical, if the packages given in lib should be unloaded after applying fun.
iseed: integer or NULL, passed to clusterSetRNGStream to be supplied to set.seed on the workers, or NULL not to set reproducible seeds.
...: Other arguments of fun that are simple values rather than objects. (Arguments passed as objects should be specified in cldata; otherwise they are not exported to the cluster by this function.)
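The rng.type and iseed arguments rely on clusterSetRNGStream from the parallel package. The minimal R sketch below assumes only base parallel (it does not call parDosa) and illustrates the reproducibility that rng.type = "RNGstream" with a fixed iseed is meant to provide: with the same iseed and a deterministic split, two runs on fresh clusters return identical draws. The helper run_draws is hypothetical.

```r
library(parallel)

# Hypothetical helper (not part of dclone): one L'Ecuyer-CMRG stream per worker,
# seeded reproducibly from iseed, followed by a deterministic split of the work.
run_draws <- function(iseed) {
  cl <- makeCluster(2)                     # snow-type (PSOCK) cluster
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = iseed)   # selects L'Ecuyer-CMRG streams on the workers
  # clusterSplit() gives the same chunks every run, and clusterApply() sends
  # chunk j to worker j, so each worker always draws from the same stream
  unlist(clusterApply(cl, clusterSplit(cl, 1:4), function(i) runif(length(i))))
}

a <- run_draws(1234)
b <- run_draws(1234)
identical(a, b)  # TRUE: same iseed, same streams, same split
```

With load balancing the assignment of items to workers (and so to streams) can change between runs, which is why the deterministic split matters here.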

Author

Peter Solymos, solymos@ualberta.ca

Details

The function uses 'snow' type clusters when cl is a cluster object, and 'multicore' type forking (shared memory) when cl is an integer. If cl is NULL, the value from getOption("mc.cores") is used.
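The dispatch on the type of cl can be sketched with base parallel functions. The helper apply_on below is hypothetical (it is not part of dclone); it assumes parLapply for cluster objects and mclapply for integers, whereas parDosa may call other functions depending on balancing. The fallback of 2 cores when mc.cores is unset is also an assumption of this sketch, and mclapply cannot fork on Windows.

```r
library(parallel)

# Hypothetical helper mirroring the dispatch described above:
# cluster object -> snow-type workers; integer -> forked workers;
# NULL -> number of cores taken from getOption("mc.cores").
apply_on <- function(cl, X, FUN, ...) {
  if (is.null(cl))
    cl <- getOption("mc.cores", 2L)   # fallback of 2 is an assumption of this sketch
  if (inherits(cl, "cluster")) {
    parLapply(cl, X, FUN, ...)        # data are serialized to separate R processes
  } else {
    # forked children share the master's memory (not available on Windows)
    mclapply(X, FUN, ..., mc.cores = as.integer(cl))
  }
}

r1 <- apply_on(2L, 1:4, function(x) x^2)   # forked workers
cl <- makeCluster(2)
r2 <- apply_on(cl, 1:4, function(x) x^2)   # snow-type workers
stopCluster(cl)
```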

The function sets the random seeds, loads packages lib onto the cluster, sets the working directory as dir, exports cldata and evaluates fun on seq.
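The sequence of steps above can be sketched as an ordinary R function built on base parallel. This is not dclone's code: fit_chunks is hypothetical, it skips the dir step, and it exports cldata into the workers' global environment, whereas parDosa stores cldata in a hidden environment. The point it illustrates is that, when called from inside a function, objects must be exported from a local environment, because clusterExport looks in .GlobalEnv by default.

```r
library(parallel)

# Hypothetical sketch of the steps described above (not dclone's implementation).
fit_chunks <- function(cl, seq, fun, cldata, lib = NULL, iseed = NULL) {
  clusterSetRNGStream(cl, iseed = iseed)                  # 1. random seeds
  for (pkg in lib)                                        # 2. packages on the workers
    clusterCall(cl, library, pkg, character.only = TRUE)
  # 3. working directory (dir) is omitted in this sketch
  env <- list2env(cldata)                                 # 4. export cldata from a local
  clusterExport(cl, names(cldata), envir = env)           #    environment, not .GlobalEnv
  # cleanup: remove the exported objects from the workers when the function exits
  on.exit(clusterCall(cl, function(v) rm(list = v, envir = .GlobalEnv), names(cldata)))
  parLapply(cl, seq, fun)                                 # 5. evaluate fun on seq
}

cl <- makeCluster(2)
# fun refers to y, which exists on the workers only because it was exported via cldata
res <- fit_chunks(cl, 1:3, function(i) i * mean(y), cldata = list(y = c(2, 4)), iseed = 1)
stopCluster(cl)
unlist(res)  # 3 6 9
```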

No balancing (balancing = "none") means that the problem is split into roughly equal subsets, without respect to size (see clusterSplit). This splitting is deterministic (reproducible).
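The equal-size split can be inspected without starting any workers: in base parallel, clusterSplit(cl, seq) cuts seq according to splitIndices(length(seq), length(cl)). The short R sketch below uses splitIndices directly, with 10 items and 3 workers chosen only for illustration.

```r
library(parallel)

# splitIndices() cuts 1:n into contiguous, roughly equal chunks,
# ignoring any notion of problem size.
chunks <- splitIndices(10, 3)
chunks
identical(chunks, splitIndices(10, 3))  # TRUE: the split is deterministic
```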

Load balancing (balancing = "load") means that the problem is not split into subsets a priori; instead, each subsequent item is sent to the next worker that becomes free (see clusterApplyLB for load balancing). This splitting is non-deterministic (might not be reproducible).
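The non-determinism of load balancing can be observed with clusterApplyLB from base parallel. In the R sketch below the task durations are made up for illustration, and each task records the process ID of the worker that ran it. The results come back in input order, but which worker handled which task depends on timing and can change between runs.

```r
library(parallel)

cl <- makeCluster(2)
# clusterApplyLB() hands the next element to whichever worker finishes first,
# so uneven task durations decide the worker-to-task assignment.
res <- clusterApplyLB(cl, c(0.3, 0.1, 0.1, 0.1), function(s) {
  Sys.sleep(s)                          # simulated work of uneven length
  c(task_time = s, pid = Sys.getpid())  # record which worker ran the task
})
stopCluster(cl)
do.call(rbind, res)  # rows keep the input order; the pid column may vary between runs
```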

Size balancing (balancing = "size") means that the problem is split into subsets with respect to size (see clusterSplitSB and parLapplySB). In size balancing, the problem is re-ordered from largest to smallest, and then subsets are determined by minimizing the total approximate processing time. This splitting is deterministic (reproducible).
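The idea of size balancing can be illustrated with a greedy largest-first assignment. The R sketch below is hypothetical: split_by_size is not clusterSplitSB, and its greedy rule (give each item, largest first, to the worker with the smallest running total) is an assumption that may differ from dclone's exact algorithm. The sizes are made up for illustration.

```r
# Hypothetical greedy sketch of size balancing (not necessarily clusterSplitSB's algorithm).
split_by_size <- function(seq, size, ncl) {
  size <- rep(size, length.out = length(seq))  # recycle sizes, as the size argument does
  ord <- order(size, decreasing = TRUE)        # largest to smallest
  load <- numeric(ncl)                         # running total size per worker
  groups <- vector("list", ncl)
  for (i in ord) {
    w <- which.min(load)                       # least-loaded worker (first one on ties)
    groups[[w]] <- c(groups[[w]], seq[i])
    load[w] <- load[w] + size[i]
  }
  list(groups = groups, load = load)
}

s <- split_by_size(letters[1:6], size = c(5, 1, 4, 2, 3, 3), ncl = 2)
s$load  # 9 9: both workers receive the same total size
```

Because the ordering and the tie-breaking rule are fixed, the same inputs always give the same subsets, which matches the reproducibility noted above.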