Learn R Programming

simFrame (version 0.5.4)

setup: Set up multiple samples

Description

Generic function for setting up multiple samples.

Usage

setup(x, control, …)

# S4 method for data.frame,SampleControl setup(x, control)

Arguments

x

the data to sample from.

control

a control object inheriting from the virtual class "VirtualSampleControl" or a character string specifying such a control class (the default being "SampleControl").

if control is a character string or missing, the slots of the control object may be supplied as additional arguments. See "'>SampleControl" for details on the slots.

Value

An object of class "SampleSetup".

Methods

x = "data.frame", control = "character"

set up multiple samples using a control class specified by the character string control. The slots of the control object may be supplied as additional arguments.

x = "data.frame", control = "missing"

set up multiple samples using a control object of class "SampleControl". Its slots may be supplied as additional arguments.

x = "data.frame", control = "SampleControl"

set up multiple samples as defined by the control object control.

Details

A fundamental design principle of the framework in the case of design-based simulation studies is that the sampling procedure is separated from the simulation procedure. Two main advantages arise from setting up all samples in advance.

First, the repeated sampling reduces overall computation time dramatically in certain situations, since computer-intensive tasks like stratification need to be performed only once. This is particularly relevant for large population data. In close-to-reality simulation studies carried out in research projects in survey statistics, often up to 10000 samples are drawn from a population of millions of individuals with stratified sampling designs. For such large data sets, stratification takes a considerable amount of time and is a very memory-intensive task. If the samples are taken on-the-fly, i.e., in every simulation run one sample is drawn, the function to take the stratified sample would typically split the population into the different strata in each of the 10000 simulation runs. If all samples are drawn in advance, on the other hand, the population data need to be split only once and all 10000 samples can be taken from the respective strata together.

Second, the samples can be stored permanently, which simplifies the reproduction of simulation results and may help to maximize comparability of results obtained by different partners in a research project. In particular, this is useful for large population data, when complex sampling techniques may be very time-consuming. In research projects involving different partners, usually different groups investigate different kinds of estimators. If the two groups use not only the same population data, but also the same previously set up samples, their results are highly comparable.

The control class "SampleControl" is highly flexible and allows stratified sampling as well as sampling of whole groups rather than individuals with a specified sampling method. Hence it is often sufficient to implement the desired sampling method for the simple non-stratified case to extend the existing framework. See "'>SampleControl" for some restrictions on the argument names of such a function, which should return a vector containing the indices of the sampled observations.

Nevertheless, for very complex sampling procedures, it is possible to define a control class "MySampleControl" extending "'>VirtualSampleControl", and the corresponding method setup(x, control) with signature 'data.frame, MySampleControl'. In order to optimize computational performance, it is necessary to efficiently set up multiple samples. Thereby the slot k of "VirtualSampleControl" needs to be used to control the number of samples, and the resulting object must be of class "'>SampleSetup".

References

Alfons, A., Templ, M. and Filzmoser, P. (2010) An Object-Oriented Framework for Statistical Simulation: The R Package simFrame. Journal of Statistical Software, 37(3), 1--36. 10.18637/jss.v037.i03.

See Also

simSample, draw, "'>SampleControl", "'>TwoStageControl", "'>VirtualSampleControl", "'>SampleSetup"

Examples

Run this code
# NOT RUN {
set.seed(12345)  # for reproducibility
data(eusilcP)    # load data

## simple random sampling
srss <- setup(eusilcP, size = 20, k = 4)
summary(srss)
draw(eusilcP[, c("id", "eqIncome")], srss, i = 1)

## group sampling
gss <- setup(eusilcP, grouping = "hid", size = 10, k = 4)
summary(gss)
draw(eusilcP[, c("hid", "id", "eqIncome")], gss, i = 2)

## stratified simple random sampling
ssrss <- setup(eusilcP, design = "region",
    size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(ssrss)
draw(eusilcP[, c("id", "region", "eqIncome")], ssrss, i = 3)

## stratified group sampling
sgss <- setup(eusilcP, design = "region",
    grouping = "hid", size = c(2, 5, 5, 3, 4, 5, 3, 5, 2), k = 4)
summary(sgss)
draw(eusilcP[, c("hid", "id", "region", "eqIncome")], sgss, i = 4)
# }

Run the code above in your browser using DataLab