
ChemmineR (version 2.24.2)

parBatchByIndex: Parallel Batch By Index


Takes an index set, breaks it into batches and runs the given function on each batch in parallel using the given cluster. See batchByIndex for the non-parallel version.

When doing a select where the condition is a large number of ids, it is not always possible to include them all in a single SQL statement. This function breaks the list of ids into chunks so that the indexProcessor only has to deal with a small number of ids at a time.
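The chunking described above can be sketched in base R. This is only an illustration of the idea, not ChemmineR's implementation: the helper names `splitIntoBatches` and `buildInClause`, and the table name `compounds`, are hypothetical.

```r
# Hypothetical helper: split a vector of ids into consecutive batches
# of at most batchSize elements (the last batch may be smaller).
splitIntoBatches <- function(ids, batchSize) {
  groups <- ceiling(seq_along(ids) / batchSize)
  unname(split(ids, groups))
}

# Hypothetical helper: build one SELECT statement for a single batch.
# The table and column names are placeholders.
buildInClause <- function(batch) {
  paste0("SELECT weight FROM compounds WHERE id IN (",
         paste(batch, collapse = ","), ")")
}

batches <- splitIntoBatches(1:10, batchSize = 4)
lengths(batches)             # sizes of the batches: 4 4 2
buildInClause(batches[[3]])  # query text for the last, smaller batch
```

Each statement then contains at most batchSize ids, which keeps it within whatever length limit the database imposes.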


Usage

parBatchByIndex(allIndices, indexProcessor, reduce, cl, batchSize = 1e+05)


Arguments

allIndices: A vector of values that will be broken into batches and passed as an argument to the indexProcessor function.

indexProcessor: A function that takes one batch of indices. It is called once for each batch, possibly in parallel. The return values of this function are collected into a list and passed to the reduce function after all jobs have finished.

reduce: A function that is run after all jobs have finished. It is called with a list of the return values from the indexProcessor runs. The order of batches is maintained. The return value of the reduce function is then returned. The idea is that this function merges all the results together into one result.

cl: A SNOW cluster to run jobs on.

batchSize: The size of each batch. The last batch may be smaller than this value.
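The contract between the arguments can be sketched with a sequential stand-in. This is a minimal sketch under stated assumptions, not ChemmineR's code: `batchReduceSketch`, `sumBatch` and `addUp` are hypothetical names, and `lapply()` takes the place of dispatching the batches to the cluster's workers.

```r
# Hypothetical sequential stand-in for the batch / process / reduce pattern.
# A parallel version would send each batch to a worker in the cluster
# instead of calling lapply().
batchReduceSketch <- function(allIndices, indexProcessor, reduce,
                              batchSize = 1e5) {
  groups  <- ceiling(seq_along(allIndices) / batchSize)
  batches <- unname(split(allIndices, groups))
  results <- lapply(batches, indexProcessor)  # one result per batch, in order
  reduce(results)                             # merge into a single result
}

# Toy indexProcessor: sum the ids in one batch (no database involved).
sumBatch <- function(indexBatch) sum(indexBatch)

# Toy reduce: add up the per-batch sums.
addUp <- function(results) sum(unlist(results))

total <- batchReduceSketch(1:10000, sumBatch, addUp, batchSize = 1000)
total == sum(1:10000)  # the batched result matches the direct computation
```

Because the results list keeps the order of the batches, a reduce function that depends on order, such as one that concatenates rows, sees the batches in the same sequence as the original indices.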


Value

The return value of the reduce function.

See Also

batchByIndex (the non-parallel version)


Examples

## Not run:
# cl = makeCluster(2)  # create a SNOW cluster
#
# # dbConnection is assumed to be a database connection usable inside the job;
# # "table" stands in for the real table name
# # function to run a query for each batch of indices
# job = function(indexBatch)
#     dbGetQuery(dbConnection,
#                paste("SELECT weight FROM table WHERE id IN (",
#                      paste(indexBatch, collapse = ","), ")"))
#
# # function to combine all the results, in this case by summing them up
# reduce = function(results) sum(unlist(results))
#
# indices = 1:10000
# # run the queries in parallel and then sum the results
# totalWeight = parBatchByIndex(indices, job, reduce, cl, 1000)
#
# stopCluster(cl)  # release the workers when done
## End(Not run)
