hpc: Parallelized ‘lapply’

Description

Parallelize applying a function over a list or vector according to the registered parallelization engine.

Usage

tm_parLapply(X, FUN, ...)
tm_parLapply_engine(new)

Value

A list the length of X, with the result of applying FUN together with the ... arguments to each element of X.

Arguments

X: a vector (atomic or list), or other objects suitable for the engine in use.
FUN: the function to be applied to each element of X.
...: optional arguments to FUN.
new: an object inheriting from class cluster as created by makeCluster() from package parallel, or a function with formals X, FUN and ..., or NULL corresponding to the default of using no parallelization engine.

Details

Parallelization can be employed to speed up some of the embarrassingly parallel computations performed in package tm, specifically tm_index(), tm_map() on a non-lazy-mapped VCorpus, and TermDocumentMatrix() on a VCorpus or PCorpus.

Functions tm_parLapply() and tm_parLapply_engine() can be used to customize parallelization according to the available resources.

tm_parLapply_engine() is used for getting (with no arguments) or setting (with argument new) the parallelization engine employed (see below for examples).

If the engine is set to an object inheriting from class cluster, tm_parLapply() calls parLapply() with this cluster and the given arguments. If it is set to a function, tm_parLapply() calls that function with the given arguments. Otherwise (engine NULL), it simply calls lapply().
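The three cases can be illustrated with a small stand-alone sketch. The helper dispatch_sketch() below is hypothetical and is not part of tm; it only mirrors the behavior described above under the assumption that the engine is passed in explicitly rather than read from the registered setting.

```r
# Hypothetical helper, NOT tm's implementation: mirrors the three
# documented cases (cluster object, function, no engine).
dispatch_sketch <- function(engine, X, FUN, ...) {
    if (inherits(engine, "cluster")) {
        # Cluster object: delegate to parallel::parLapply() on that cluster.
        parallel::parLapply(engine, X, FUN, ...)
    } else if (is.function(engine)) {
        # Function engine: call it with the given arguments.
        engine(X, FUN, ...)
    } else {
        # No engine (NULL): fall back to sequential lapply().
        lapply(X, FUN, ...)
    }
}

square <- function(x) x^2

# No engine: plain lapply().
squares_seq <- dispatch_sketch(NULL, 1:3, square)

# Function engine with formals X, FUN and ...; here a thin lapply() wrapper.
squares_fun <- dispatch_sketch(function(X, FUN, ...) lapply(X, FUN, ...),
                               1:3, square)
```

Both calls return the same list, since the function engine used here simply forwards to lapply().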

Hence, parallelization via parLapply() and a default cluster registered via setDefaultCluster() can be achieved via

  tm_parLapply_engine(function(X, FUN, ...)
      parallel::parLapply(NULL, X, FUN, ...))

or re-registering the cluster, say cl, using

  tm_parLapply_engine(cl)

(note that since R version 3.5.0, one can use getDefaultCluster() to get the registered default cluster). Using

  tm_parLapply_engine(function(X, FUN, ...)
      parallel::parLapplyLB(NULL, X, FUN, ...))

or

  tm_parLapply_engine(function(X, FUN, ...)
      parallel::parLapplyLB(cl, X, FUN, ...))

gives load-balancing parallelization with the registered default or given cluster, respectively. To achieve parallelization via forking (on Unix-alike platforms), one can use the above with clusters created by makeForkCluster(), or use

  tm_parLapply_engine(parallel::mclapply)

or

  tm_parLapply_engine(function(X, FUN, ...)
      parallel::mclapply(X, FUN, ..., mc.cores = n))

to use mclapply() with the default or given number n of cores.
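As a minimal end-to-end sketch of the cluster-based setup, the following registers a cluster as the engine, runs tm_parLapply(), and then restores the previous engine. The worker count of 2 and the toy squaring function are assumptions chosen for illustration; packages tm and parallel are assumed to be installed.

```r
library(tm)
library(parallel)

# Two local workers; an illustrative choice, adjust to the available cores.
cl <- makeCluster(2)

# Calling with no arguments gets the current engine, so it can be restored.
old_engine <- tm_parLapply_engine()

# Register the cluster; tm_parLapply() now goes through parLapply().
tm_parLapply_engine(cl)
res <- tm_parLapply(1:4, function(x) x^2)

# Restore the previous engine (NULL by default) and shut the workers down.
tm_parLapply_engine(old_engine)
stopCluster(cl)
```

Restoring the engine before stopping the cluster avoids leaving a stopped cluster registered for later calls to tm_map() or TermDocumentMatrix().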
