pforeach - Easy to parallel processing in R

Koji MAKIYAMA

Overview

There is a parallel processing code using the foreach package:

library(doParallel)
cl <- makeCluster(detectCores())
registerDoParallel(cl)
foreach(i = 1:3, .combine=c) %dopar% {
  i**2
}
stopCluster(cl)

## [1] 1 4 9

It can be simplified using pforeach() instead of foreach():

library(pforeach)
pforeach(i = 1:3)({
  i**2
})

## [1] 1 4 9

How to install

install.packages("devtools") # if you have not installed "devtools" package
devtools::install_github("hoxo-m/pforeach")

To simplify!

With other packages

Using foreach():

library(doParallel)
library(dplyr) # With other package
cl <- makeCluster(detectCores())
registerDoParallel(cl)
# You must indicate .packages parameter.
foreach(i = 1:3, .combine=c, .packages="dplyr") %dopar% {
  iris[i, ] %>% select(-Species) %>% sum
}
stopCluster(cl)

## [1] 10.2  9.5  9.4

Using pforeach():

library(pforeach)
library(dplyr) # With other package
# You need not to mind that.
pforeach(i = 1:3)({
  iris[i, ] %>% select(-Species) %>% sum
})

## [1] 10.2  9.5  9.4

With enclosed variables

Using foreach():

# You must indicate .export parameter. 
library(doParallel)
square <- function(x) x**2
execute <- function() {
  cl <- makeCluster(detectCores())
  registerDoParallel(cl)
  result <- foreach(i = 1:3, .combine=c, .export="square") %dopar% {
    square(i)
  }
  stopCluster(cl)
  result
}
execute()

## [1] 1 4 9

Using pforeach():

# Need not to mind!
library(pforeach)
square <- function(x) x**2
execute <- function() {
  pforeach(i = 1:3)({
    square(i)
  })
}
execute()

## [1] 1 4 9

Rows and Columns

Iterations for data frame can simplify using rows() and cols() instead of iterators::iter().

Using iter():

library(doParallel)
cl <- makeCluster(detectCores())
registerDoParallel(cl)
data <- iris[1:5, ]
foreach(row = iter(data, by="row"), .combine=c) %dopar% {
  sum(row[-5])
}
stopCluster(cl)

## [1] 10.2  9.5  9.4  9.4 10.2

Using rows():

library(pforeach)
data <- iris[1:5, ]
pforeach(row = rows(data))({
  sum(row[-5])
})

## [1] 10.2  9.5  9.4  9.4 10.2

Using cols():

library(pforeach)
data <- iris[, 1:4]
pforeach(col = cols(data))({
  mean(col)
})

## [1] 5.843 3.057 3.758 1.199

Options

Cores

You can indicate number of cores for parallel processing with .cores parameter.

pforeach(i = 1:3, .cores = 2)({
  i**2
})

If you set minus value to .cores for example .cores = -1, it means .cores = detectCores() - 1.

pforeach(i = 1:3, .cores = -1)({
  i**2
})

Fix random seed

If you want to fix random seed, set .seed parameter:

library(pforeach)
pforeach(i = 1:3, .seed = 12345)({
  rnorm(1)
})

## [1]  0.74543 -0.06048 -0.40347

Do not parallel

If you want to change a parallel code to the non-parallel code with pforeach, you just only add one character "n".

library(pforeach)
npforeach(i = 1:3)({
  i**2
})

## [1] 1 4 9

Application

Parallelized random forest code with foreach is below:

library(doParallel)
library(randomForest)
library(kernlab)

data(spam)
cores <- detectCores()

cl <- makePSOCKcluster(cores)
registerDoParallel(cl)

fit.rf <- foreach(ntree=rep(250, cores), .combine=combine, 
                  .export="spam", .packages="randomForest") %dopar% {
  randomForest(type ~ ., data = spam, ntree = ntree)
}

stopCluster(cl)

Using pforeach:

library(pforeach)
library(randomForest)
library(kernlab)

data(spam)

fit.rf <- pforeach(ntree=rep(250, .cores), .c=combine)({
  randomForest(type ~ ., data = spam, ntree = ntree)
})

pforeach - Easy to parallel processing in R

Overview

How to install

To simplify!

With other packages

With enclosed variables

Rows and Columns

Options

Cores

Fix random seed

Do not parallel

Application

Copy Link

Version

Version

License

Maintainer

Last Published

Functions in pforeach (1.3)