snowfall-calculation: Parallel calculation functions

Description

Parallel calculation functions. Execution is distributed automatically over the cluster.
Most of this functions are wrappers for snow functions, but all can be used directly in sequential mode.

Usage

sfClusterApply( x, fun, ... )
sfClusterApplyLB( x, fun, ... )
sfClusterApplySR( x, fun, ..., name="default", perUpdate=NULL, restore=sfRestore() )
sfClusterMap( fun, ..., MoreArgs = NULL, RECYCLE = TRUE )
sfLapply( x, fun, ... )
sfSapply( x, fun, ..., simplify = TRUE, USE.NAMES = TRUE )
sfApply( x, margin, fun, ... )
sfRapply( x, fun, ... )
sfCapply( x, fun, ... )
sfMM( a, b )
sfRestore()

Arguments

x: vary depending on function. See function details below.
fun: function to call
margin: vector speficying the dimension to use
...: additional arguments to pass to standard function
simplify: logical; see sapply
USE.NAMES: logical; see sapply
a: matrix
b: matrix
RECYCLE: see snow documentation
MoreArgs: see snow documentation
name: a character string indicating the name of this parallel execution. Naming is only needed if there are more than one call to sfClusterApplySR in a program.
perUpdate: a numerical value indicating the progress printing. Values range from 1 to 100 (no printing). Value means: any X percent of progress status is printed. Default (on given value ‘NULL’) is 5).
restore: logical indicating whether results from previous runs should be restored or not. Default is coming from sfCluster. If running without sfCluster, default is FALSE, if yes, it is set to the value coming from the external program.

Details

sfClusterApply calls each index of a given list on a seperate node, so length of given list must be smaller than nodes. Wrapper for snow function clusterApply.

sfClusterApplyLB is a load balanced version of sfClusterApply. If a node finished it's list segment it immidiately starts with the next segment. Use this function in infrastructures with machines with different speed. Wrapper for snow function clusterApplyLB.

sfClusterApplySR saves intermediate results and is able to restore them on a restart. Use this function on very long calculations or it is (however) foreseeable that cluster will not be able to finish it's calculations (e.g. because of a shutdown of a node machine). If your program use more than one parallised part, argument name must be given with a unique name for each loop. Intermediate data is saved depending on R-filename, so restore of data must be explicit given for not confusing changes on your R-file (it is recommended to only restore on fully tested programs). If restores, sfClusterApplySR continues calculation after the first non-null value in the saved list. If your parallized function can return null values, you probably want to change this.

sfLapply, sfSapply and sfApply are parallel versions of lapply, sapply and apply. The first two use an list or vector as argument, the latter an array.

parMM is a parallel matrix multiplication. Wrapper for snow function parMM.

sfRapply and sfCapply are not implemented atm.

Examples

Run this code

if (FALSE) {
  restoreResults <- TRUE

  sfInit(parallel=FALSE)

  ## Execute in cluster or sequential.
  sfLapply(1:10, exp)

  ## Execute with intermediate result saving and restore on wish.
  sfClusterApplySR(1:100, exp, name="CALC_EXP", restore=restoreResults)
  sfClusterApplySR(1:100, sum, name="CALC_SUM", restore=restoreResults)

  sfStop()

  ##
  ## Small bootstrap example.
  ##
  sfInit(parallel=TRUE, cpus=2)

  require(mvna)
  data(sir.adm)

  sfExport("sir.adm", local=FALSE)
  sfLibrary(cmprsk)

  wrapper <- function(a) {
    index <- sample(1:nrow(sir.adm), replace=TRUE)
    temp <- sir.adm[index, ]
    fit <- crr(temp$time, temp$status, temp$pneu, failcode=1, cencode=0)
    return(fit$coef)
  }

  result <- sfLapply(1:100, wrapper)

  mean( unlist( rbind( result ) ) )
  sfStop()
}

Run the code above in your browser using DataLab

Description

Usage

Arguments

Details

See Also

Examples