DesignSurvey: Survey design

Description

A wrapper for the svydesign function from the survey package, used to define one of the following survey designs: two-stage cluster, simple (systematic) or stratified. For two-stage cluster designs, weights are calculated assuming sampling with probability proportional to size and with replacement at the first stage, and simple random sampling at the second stage. The finite population correction is specified as the population size at each level of sampling.
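As a rough illustration of the two-stage weighting described above, the base-R sketch below computes the textbook weights by hand: PPS with replacement at the first stage and simple random sampling at the second. The frame, the sample and the names psu_frame, smp, size, n_draws, M0, Mi and mi are hypothetical, and this is not DesignSurvey's internal code; it also assumes no PSU was drawn twice.

```r
# Hypothetical sampling frame: 4 PSUs (e.g. census tracts) and their sizes
# (e.g. number of households, which are the SSUs)
psu_frame <- data.frame(psu = c("A", "B", "C", "D"),
                        size = c(100, 300, 400, 200),
                        stringsAsFactors = FALSE)

# Hypothetical sample: PSUs B and C were drawn, 5 SSUs interviewed in each.
# Assumes no PSU was drawn twice, so draws = distinct PSUs.
smp <- data.frame(psu = rep(c("B", "C"), each = 5),
                  ssu = 1:10,
                  stringsAsFactors = FALSE)

n_draws <- length(unique(smp$psu))                    # first-stage draws
M0 <- sum(psu_frame$size)                             # SSUs in the population
Mi <- psu_frame$size[match(smp$psu, psu_frame$psu)]   # size of each row's PSU
mi <- ave(smp$ssu, smp$psu, FUN = length)             # SSUs sampled in that PSU

# Stage 1, PPS with replacement: each draw selects PSU i with
# probability Mi / M0, so the first-stage weight is M0 / (n_draws * Mi)
w1 <- M0 / (n_draws * Mi)
# Stage 2, SRS within the PSU: each SSU is selected with probability mi / Mi
w2 <- Mi / mi

smp$weight <- w1 * w2
smp$weight          # all equal to 100 = M0 / (n_draws * mi) = 1000 / (2 * 5)
sum(smp$weight)     # estimated population size of SSUs, 1000 here
```

Because w1 * w2 reduces to M0 / (n_draws * mi), taking the same number of SSUs in every sampled PSU gives a self-weighting sample.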

Usage

DesignSurvey(sample = NULL, psu.ssu = NULL, psu.col = NULL,
  ssu.col = NULL, cal.col = NULL, N = NULL, strata = NULL,
  cal.N = NULL, ...)

Arguments

sample

data.frame with sample observations. For two-stage cluster designs, one column must contain unique identifiers for Primary Sampling Units (PSU) and another column must contain unique identifiers for Secondary Sampling Units (SSU).

psu.ssu

data.frame with all Primary Sampling Units (PSU). The first column contains unique PSU identifiers and the second column contains numeric PSU sizes. Used only for two-stage cluster designs.

psu.col

the column of sample containing the PSU identifiers. Used only for two-stage cluster designs.

ssu.col

the column of sample containing the SSU identifiers. Used only for two-stage cluster designs.

cal.col

the column of sample containing the variable used to calibrate the estimates. It must be used together with cal.N.

N

For simple designs, a numeric value giving the total number of sampling units in the population. For stratified designs, a column of sample indicating, for each observation, the total number of sampling units in its stratum. N is ignored in two-stage cluster designs.
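To illustrate the two forms N can take, the sketch below computes the expansion weights implied by the standard estimators: N/n for a simple random sample, and N_h/n_h within each stratum when N is supplied per observation. It is plain base R with made-up data; the names (w_simple, strat_smp, strat_size, n_h) are hypothetical and this is not DesignSurvey's internal code.

```r
# Simple design: N is a single number; each of the n sampled units gets weight N / n
N <- 1000                          # hypothetical population size
y <- c(3, 5, 2, 4)                 # hypothetical sampled values
w_simple <- rep(N / length(y), length(y))
total_simple <- sum(w_simple * y)  # 250 * 14 = 3500

# Stratified design: N is a column giving, for each row, the size of its stratum
strat_smp <- data.frame(strat      = c("urban", "urban", "urban", "rural"),
                        strat_size = c(950, 950, 950, 50),
                        y          = c(3, 5, 2, 4),
                        stringsAsFactors = FALSE)
# n_h: number of sampled rows in each row's stratum
n_h <- ave(seq_len(nrow(strat_smp)), strat_smp$strat, FUN = length)
strat_smp$weight <- strat_smp$strat_size / n_h   # N_h / n_h
total_strat <- sum(strat_smp$weight * strat_smp$y)
```

In both cases the weights add up to the population size (1000 here), which is why N must describe the whole population or the whole stratum rather than the sample.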

strata

For stratified designs, a column of sample indicating the stratum membership of each observation.

cal.N

population total of the variable used to calibrate the estimates. It must be used together with cal.col.
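Calibrating to a single population total can be illustrated with the simplest adjustment: a ratio factor that rescales every design weight so that the weighted total of the calibration variable equals cal.N. This is a hedged sketch with hypothetical values and names (w, x, cal_N, g); DesignSurvey may use a different calibration method (for example one of the options of survey::calibrate), so treat it only as an illustration of the idea.

```r
w <- c(100, 100, 100, 100)   # hypothetical design weights
x <- c(2, 3, 4, 1)           # hypothetical calibration variable (e.g. persons per household)
cal_N <- 1200                # hypothetical known population total of x

x_hat <- sum(w * x)          # design-based estimate of the total: 1000
g <- cal_N / x_hat           # ratio adjustment factor: 1.2
w_cal <- w * g               # calibrated weights: 120 each

sum(w_cal * x)               # reproduces cal_N (up to floating-point rounding)
```

Estimates of other variables would then use w_cal, which usually improves precision when those variables are correlated with x.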

...

further arguments passed to the svydesign function.

Value

An object of class survey.design.

Details

For two-stage cluster designs, a PSU appearing in both psu.ssu and sample must have the same identifier in both. Each SSU must have a unique identifier, which can appear more than once if there is more than one observation per SSU. The sample argument must contain only the variables to be estimated plus the variables required to define the design (two-stage cluster or stratified). cal.col and cal.N are needed only if the estimates will be calibrated; the calibration is based on a known population total.
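The input requirements above can be checked before calling DesignSurvey. The helper below, check_two_stage_inputs, is hypothetical (it is not part of capm) and only tests the conditions stated in this section: PSU identifiers in sample must match those in psu.ssu, and an SSU identifier may repeat only within a single PSU. The example data are made up.

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
check_two_stage_inputs <- function(sample, psu.ssu, psu.col, ssu.col) {
  problems <- character(0)
  # psu.ssu: first column = PSU identifiers, second column = numeric PSU sizes
  if (anyDuplicated(psu.ssu[[1]]) > 0)
    problems <- c(problems, "PSU identifiers in psu.ssu are not unique")
  if (!is.numeric(psu.ssu[[2]]))
    problems <- c(problems, "PSU sizes in psu.ssu are not numeric")
  # The design columns must exist in sample before they can be checked
  absent <- setdiff(c(psu.col, ssu.col), names(sample))
  if (length(absent) > 0)
    return(c(problems, paste("columns not found in sample:",
                             paste(absent, collapse = ", "))))
  # Every PSU in sample must appear, with the same identifier, in psu.ssu
  missing_psu <- setdiff(sample[[psu.col]], psu.ssu[[1]])
  if (length(missing_psu) > 0)
    problems <- c(problems, paste("PSUs in sample but not in psu.ssu:",
                                  paste(missing_psu, collapse = ", ")))
  # An SSU identifier may repeat (several observations per SSU),
  # but only within one PSU
  pairs <- unique(sample[, c(psu.col, ssu.col)])
  if (anyDuplicated(pairs[[ssu.col]]) > 0)
    problems <- c(problems, "some SSU identifiers appear in more than one PSU")
  problems
}

psu_frame <- data.frame(id = c("t1", "t2", "t3"), hh = c(120, 80, 200),
                        stringsAsFactors = FALSE)
smp <- data.frame(tract = c("t1", "t1", "t2", "t9"),
                  house = c("h1", "h1", "h2", "h3"),
                  dogs  = c(1, 0, 2, 1),
                  stringsAsFactors = FALSE)
check_two_stage_inputs(smp, psu_frame, "tract", "house")
# returns "PSUs in sample but not in psu.ssu: t9"
```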

References

Lumley, T. (2011). Complex surveys: A guide to analysis using R (Vol. 565). Wiley.

Baquero, O. S., Marconcin, S., Rocha, A., & Garcia, R. D. C. M. (2018). Companion animal demography and population management in Pinhais, Brazil. Preventive Veterinary Medicine.

http://oswaldosantos.github.io/capm

Examples


# NOT RUN {
data("cluster_sample")
data("psu_ssu")

## Calibrated two-stage cluster design
design <- DesignSurvey(na.omit(cluster_sample),
                       psu.ssu = psu_ssu,
                       psu.col = "census_tract_id",
                       ssu.col = "interview_id",
                       cal.col = "number_of_persons",
                       cal.N = 129445)

## Simple design
# Assume the data in cluster_sample came from a simple design:
design <- DesignSurvey(na.omit(cluster_sample),
                       N = sum(psu_ssu$hh),
                       cal.col = "number_of_persons",
                       cal.N = 129445)

## Stratified design
# Simulate strata and assume that the data in cluster_sample came
# from a stratified design
cluster_sample$strat <- sample(c("urban", "rural"),
                               nrow(cluster_sample),
                               prob = c(.95, .05),
                               replace = TRUE)
cluster_sample$strat_size <- round(sum(psu_ssu$hh) * .95)
cluster_sample$strat_size[cluster_sample$strat == "rural"] <-
  round(sum(psu_ssu$hh) * .05)
design <- DesignSurvey(cluster_sample,
                       N = "strat_size",
                       strata = "strat",
                       cal.col = "number_of_persons",
                       cal.N = 129445)

# }
