
simFrame (version 0.1.2)

clusterRunSimulation: Run a simulation experiment on a snow cluster

Description

Generic function for running a simulation experiment on a snow cluster.

Usage

clusterRunSimulation(cl, x, setup, nrep, control, 
                     contControl = NULL, NAControl = NULL, 
                     design = character(), fun, ..., 
                     SAE = FALSE)

Arguments

cl
a snow cluster.
x
a data.frame (for design-based simulation or simulation based on real data) or a control object for data generation inheriting from "VirtualDataControl" (for model-based simulation).
setup
an object of class "SampleSetup", containing previously set up samples, or a control object for setting up samples inheriting from "VirtualSampleControl".
nrep
a non-negative integer giving the number of repetitions of the simulation experiment (for model-based simulation or simulation based on real data).
control
a control object of class "SimControl".
contControl
an object of a class inheriting from "VirtualContControl", controlling contamination in the simulation experiment.
NAControl
an object of a class inheriting from "VirtualNAControl", controlling the insertion of missing values in the simulation experiment.
design
a character vector specifying the variables (columns) to be used for splitting the data into domains. The simulations, including contamination and the insertion of missing values (unless SAE=TRUE), are then performed on every domain.
fun
a function to be applied in each simulation run.
...
additional arguments to be passed to fun.
SAE
a logical indicating whether small area estimation will be used in the simulation.
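As a minimal sketch of how the ... argument is meant to be used, the simulation function below takes an extra argument trim (both the function name simTrim and the argument trim are hypothetical, chosen for illustration). The commented call assumes a cluster cl and a data control object dc set up as in the Examples section.

```r
# Hypothetical simulation function with an extra argument 'trim'
simTrim <- function(x, trim) {
    c(mean = mean(x$value),
      trimmed = mean(x$value, trim = trim))
}

# Arguments given to clusterRunSimulation() in '...' are passed on to
# 'fun' in every simulation run, e.g. (assuming 'cl' and 'dc' exist):
# results <- clusterRunSimulation(cl, dc, nrep = 100,
#     fun = simTrim, trim = 0.2)

# Calling the function directly on a small data set
simTrim(data.frame(value = c(1, 2, 3, 4, 100)), trim = 0.2)
```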

Value

  • An object of class "SimResults".

Details

Statistical simulation is embarrassingly parallel, so computational performance can be improved by parallel computing. In simFrame, parallel computing is implemented using the package snow. Note that all objects and packages required for the computations (including simFrame) need to be made available on every worker process. In order to prevent problems with random numbers and to ensure reproducibility, random number streams should be used. In R, the packages rlecuyer and rsprng are available for creating random number streams, which are supported by snow via the function clusterSetupRNG.

There are some requirements for slot fun of the control object control. The function must return a numeric vector or an object of class "SimResult", which consists of a slot values (a numeric vector) and a slot add (additional results of any class, e.g., statistical models). Note that returning a "SimResult" object is computationally more expensive than returning a numeric vector. Returning a list with components values and add is also accepted and slightly faster than using a "SimResult" object.

A data.frame is passed to fun in every simulation run, and the corresponding argument must be called x. If comparisons with the original data need to be made, e.g., for evaluating the quality of imputation methods, the function should have an argument called orig. If different domains are used in the simulation, the indices of the current domain can be passed to the function via an argument called domain.

For small area estimation, the following points have to be kept in mind:

  • The slot design of control for splitting the data must be supplied, and the slot SAE must be set to TRUE.
  • The data are not actually split into the specified domains. Instead, the whole data set (sample) is passed to fun.
  • Contamination and missing values are likewise added to the whole data (sample).
  • The function must have a domain argument so that the current domain can be extracted from the whole data (sample).
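The following sketch shows simulation functions that follow the conventions described above: returning a numeric vector, returning a list with components values and add, using an orig argument, and using a domain argument for small area estimation. The function names and the specific estimators are hypothetical examples, and the sketch assumes that domain holds row indices of the current domain, as the text states.

```r
# All function names below are hypothetical examples.

# 1. Return a numeric vector (the cheapest option)
simVector <- function(x) {
    c(mean = mean(x$value), median = median(x$value))
}

# 2. Return a list with components 'values' (numeric) and 'add'
#    (additional results of any class, here a fitted linear model)
simList <- function(x) {
    fit <- lm(value ~ group, data = x)
    list(values = coef(fit), add = fit)
}

# 3. Use 'orig' to compare with the original data, e.g. to assess
#    a simple mean imputation of missing values
simImpute <- function(x, orig) {
    imputed <- x$value
    imputed[is.na(imputed)] <- mean(imputed, na.rm = TRUE)
    c(mean = mean(imputed), bias = mean(imputed) - mean(orig$value))
}

# 4. Small area estimation (SAE = TRUE): the whole sample is passed
#    as 'x', and 'domain' is used to extract the current domain
simSAE <- function(x, domain) {
    c(mean = mean(x$value[domain]))
}

# usage on a small data set
d <- data.frame(group = c(1, 1, 2, 2), value = c(1, 3, 5, 7))
simVector(d)
simSAE(d, domain = which(d$group == 2))
```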
In every simulation run, fun is evaluated using try. Hence the results of the other runs are not lost if the computations fail in some of the simulation runs.
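The base-R sketch below only illustrates why evaluating fun with try protects the remaining results; it is not simFrame's internal code, and the helper runWithTry and the function runMean are hypothetical. A failing run is recorded as NA while the other runs are kept.

```r
# Hypothetical helper: evaluate 'fun' on each data set with try(),
# recording NA for runs in which the computations fail
runWithTry <- function(datasets, fun) {
    results <- vector("list", length(datasets))
    for (i in seq_along(datasets)) {
        res <- try(fun(datasets[[i]]), silent = TRUE)
        results[[i]] <- if (inherits(res, "try-error")) NA_real_ else res
    }
    unlist(results)
}

# Hypothetical simulation function that fails without a 'value' column
runMean <- function(x) {
    if (is.null(x$value)) stop("column 'value' is missing")
    mean(x$value)
}

# the second data set triggers an error, the others are processed
datasets <- list(data.frame(value = 1:3),
                 data.frame(other = 1:3),
                 data.frame(value = 4:6))
runWithTry(datasets, runMean)
```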

References

L'Ecuyer, P., Simard, R., Chen, E. and Kelton, W. (2002) An object-oriented random-number package with many long streams and substreams. Operations Research, 50(6), 1073--1075.

Mascagni, M. and Srinivasan, A. (2000) Algorithm 806: SPRNG: a scalable library for pseudorandom number generation. ACM Transactions on Mathematical Software, 26(3), 436--461.

Rossini, A., Tierney, L. and Li, N. (2007) Simple parallel statistical computing in R. Journal of Computational and Graphical Statistics, 16(2), 399--420.

Tierney, L., Rossini, A. and Li, N. (2009) snow: A parallel computing framework for the R system. International Journal of Parallel Programming, 37(1), 78--90.

See Also

makeCluster, clusterSetupRNG, runSimulation, SimControl, SimResults, simBwplot, simDensityplot, simXyplot

Examples

# this example requires at least a dual-core processor

# load packages and start snow cluster
library(simFrame)
library(snow)
cl <- makeCluster(2, type = "SOCK")

# load package on workers
clusterEvalQ(cl, library(simFrame))

# set up random number streams (the default type "RNGstream"
# requires the rlecuyer package)
clusterSetupRNG(cl, seed = rep(1234, 6))

# function for generating data
grnorm <- function(n, means) {
    group <- sample(1:2, n, replace=TRUE)
    data.frame(group=group, value=rnorm(n) + means[group])
}

# control objects for data generation and contamination
means <- c(0, 0.5)
dc <- DataControl(size = 500, distribution = grnorm, 
    dots = list(means = means))
cc <- DCARContControl(target = "value", 
    epsilon = 0.1, dots = list(mean = 10))

# function for simulation runs
sim <- function(x) {
    c(mean = mean(x$value), 
        trimmed = mean(x$value, trim = 0.1), 
        median = median(x$value))
}

# export objects to workers
clusterExport(cl, c("grnorm", "means", "dc", "cc", "sim"))

# run simulation
results <- clusterRunSimulation(cl, dc, nrep = 100, 
    contControl = cc, design = "group", fun = sim)

# plot results
plot(results, true = means)

# shut down the cluster
stopCluster(cl)
