svrepdesign: Specify survey design with replicate weights

Description

Some recent large-scale surveys specify replication weights rather than the sampling design (partly for privacy reasons). This function specifies the data structure for such a survey.

Usage

svrepdesign(variables , repweights , weights, data, degf=NULL,...)
# S3 method for default
svrepdesign(variables = NULL, repweights = NULL, weights = NULL, 
   data = NULL, degf=NULL, type = c("BRR", "Fay", "JK1","JKn","bootstrap",
   "ACS","successive-difference","JK2","other"),
   combined.weights=TRUE, rho = NULL, bootstrap.average=NULL,
   scale=NULL, rscales=NULL,fpc=NULL, fpctype=c("fraction","correction"),
   mse=getOption("survey.replicates.mse"),...)
# S3 method for imputationList
svrepdesign(variables=NULL, repweights, weights, data, degf=NULL,
   mse=getOption("survey.replicates.mse"),...)
# S3 method for character
svrepdesign(variables=NULL, repweights=NULL, weights=NULL, data=NULL,
   degf=NULL, type=c("BRR","Fay","JK1","JKn","bootstrap","ACS",
   "successive-difference","JK2","other"),
   combined.weights=TRUE, rho=NULL, bootstrap.average=NULL,
   scale=NULL, rscales=NULL, fpc=NULL, fpctype=c("fraction","correction"),
   mse=getOption("survey.replicates.mse"),
   dbtype="SQLite", dbname,...)
degf(design)<-value
# S3 method for svyrep.design
image(x, ...,
   col=grey(seq(.5,1,length=30)), type.=c("rep","total"))

Value

Object of class svyrep.design, with methods for print, summary, weights, image.

Arguments

variables: formula or data frame specifying variables to include in the design (default is all)
repweights: formula or data frame specifying replication weights, or character string specifying a regular expression that matches the names of the replication weight variables
weights: sampling weights
data: data frame to look up variables in formulas, or character string giving name of database table
degf: Design degrees of freedom; use NULL to have the function work this out for you
type: Type of replication weights
combined.weights: TRUE if the repweights already include the sampling weights. This is usually the case.
rho: Shrinkage factor for weights in Fay's method
bootstrap.average: For type="bootstrap", if the bootstrap weights have been averaged, gives the number of iterations averaged over
scale, rscales: Scaling constant for variance, see Details below
fpc,fpctype: Finite population correction information
mse: If TRUE, compute variances based on sum of squares around the point estimate, rather than the mean of the replicates
dbname: name of database, passed to DBI::dbConnect()
dbtype: Database driver: see Details
x: survey design with replicate weights
...: Other arguments to image
col: Colors
type.: "rep" for only the replicate weights, "total" for the replicate and sampling weights combined.
design: replicate-weight design
value: new degrees of freedom to assign

Details

In the BRR method, the dataset is split into halves, and the difference between halves is used to estimate the variance. In Fay's method, rather than removing observations from half the sample, they are given weight rho in one half-sample and 2-rho in the other. The ideal BRR analysis is restricted to a design where each stratum has two PSUs; however, it has been used in a much wider class of surveys. The scale and rscales arguments will be ignored (with a warning) if they are specified.
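As a minimal sketch of how the half-sample factors differ between the two methods, the base R code below builds BRR and Fay factors from a 0/1 half-sample indicator matrix. The matrix is the one used in the Examples section; the row ordering (two consecutive PSUs per stratum) and the value of rho are assumptions made purely for illustration.

```r
# Half-sample indicators: 1 = PSU kept in the half-sample, 0 = PSU dropped.
# Rows are six PSUs (assumed ordered as two per stratum); columns are replicates.
half <- cbind(c(1,0,1,0,1,0), c(1,0,0,1,0,1),
              c(0,1,1,0,0,1), c(0,1,0,1,1,0))

# Standard BRR: kept PSUs get factor 2, dropped PSUs get factor 0.
brr_factors <- 2 * half

# Fay's method: factor 2 - rho for the kept PSU and rho for the other one.
rho <- 0.3   # illustrative value only
fay_factors <- ifelse(half == 1, 2 - rho, rho)

# Within each stratum the two factors sum to 2 in every replicate.
stratum <- rep(1:3, each = 2)
rowsum(fay_factors, stratum)
```

Setting rho to 0 in this sketch reproduces the BRR factors, which is one way to see Fay's method as a shrunken version of BRR.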

The JK1 and JKn types are both jackknife estimators deleting one cluster at a time. JKn is designed for stratified and JK1 for unstratified designs.
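The delete-one-cluster idea behind JK1 can be sketched in base R. The data values below are made up, each cluster is assumed to contribute a single equally weighted observation, and the factor n/(n-1) and multiplier (n-1)/n are the textbook unstratified jackknife choices rather than anything read from the package source.

```r
# Hypothetical cluster-level values (one observation per cluster, equal weights)
x <- c(12, 15, 9, 20, 14)
n <- length(x)

# JK1 replicate factors: column i drops cluster i and scales the rest by n/(n-1)
jk1_factors <- (1 - diag(n)) * n / (n - 1)

# Replicate estimates of the mean, one per deleted cluster
theta_rep <- colSums(jk1_factors * x) / colSums(jk1_factors)

# Jackknife variance: (n-1)/n times the sum of squared deviations
v_jk1 <- (n - 1) / n * sum((theta_rep - mean(theta_rep))^2)

# For a simple mean this agrees with the usual var(x)/n
all.equal(v_jk1, var(x) / n)
```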

The successive-difference weights in the American Community Survey automatically use scale = 4/ncol(repweights) and rscales=rep(1, ncol(repweights)). This can be specified as type="ACS" or type="successive-difference". The scale and rscales arguments will be ignored (with a warning) if they are specified. The American Community Survey recommends mse-style standard error estimates; if you do not specify mse explicitly, mse=TRUE will be set with a message, overriding getOption("survey.replicates.mse"). If you explicitly specify mse=FALSE, there will be a warning, but your choice will be respected.

JK2 weights (type="JK2"), as in the California Health Interview Survey, automatically use scale=1, rscales=rep(1, ncol(repweights)). The scale and rscales arguments will be ignored (with a warning) if they are specified.

Averaged bootstrap weights ("mean bootstrap") are used for some surveys from Statistics Canada. Yee et al (1999) describe their construction and use for one such survey.

The variance is computed as the sum of squared deviations of the replicates from their mean (or from the point estimate when mse=TRUE). This may be rescaled: scale is an overall multiplier and rscales is a vector of replicate-specific multipliers for the squared deviations. That is, rscales should have one entry for each column of repweights. If the replication weights incorporate the sampling weights (combined.weights=TRUE) or for type="other", these must be specified; otherwise they can be guessed from the weights.
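The roles of scale, rscales and mse in this formula can be illustrated with a small base R helper. The function rep_variance is hypothetical (it is not part of the package), and the replicate estimates are made-up numbers; the sketch only mirrors the description above.

```r
# Hypothetical helper: scale * sum(rscales * squared deviations)
rep_variance <- function(theta_reps, scale,
                         rscales = rep(1, length(theta_reps)),
                         theta_hat = NULL, mse = FALSE) {
  stopifnot(length(rscales) == length(theta_reps))
  stopifnot(!mse || !is.null(theta_hat))
  # mse = TRUE centres on the point estimate; otherwise on the replicate mean
  centre <- if (mse) theta_hat else mean(theta_reps)
  scale * sum(rscales * (theta_reps - centre)^2)
}

theta_reps <- c(10.2, 9.8, 10.5, 9.9)   # made-up replicate estimates
R <- length(theta_reps)

# Deviations from the replicate mean, with an assumed overall multiplier 1/R
v_mean <- rep_variance(theta_reps, scale = 1 / R)

# mse-style version centred on a made-up point estimate of 10
v_mse <- rep_variance(theta_reps, scale = 1 / R, theta_hat = 10, mse = TRUE)

c(v_mean, v_mse)
```

The mse-style value is never smaller than the mean-centred one, because the replicate mean minimises the sum of squared deviations when all rscales are equal.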

A finite population correction may be specified for type="other", type="JK1" and type="JKn". fpc must be a vector with one entry for each replicate. To specify sampling fractions, use fpctype="fraction"; to specify the correction directly, use fpctype="correction".

The design degrees of freedom are returned by degf. By default they are computed from the numerical rank of the repweights. This is slow for very large data sets, so you can specify a value instead. The specified value is not modified when you subset the object; to change it, use the degf<- assignment method.
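To give a feel for what "numerical rank of the repweights" means, the base R sketch below computes the rank of the small BRR weight matrix from the Examples section using qr(). This is only one way to obtain a numerical rank; the tolerance the package uses and how it converts the rank into degrees of freedom are not shown here and should not be inferred from this sketch.

```r
# Replicate weight matrix from the Examples section (6 PSUs, 4 replicates)
repw <- 2 * cbind(c(1,0,1,0,1,0), c(1,0,0,1,0,1),
                  c(0,1,1,0,0,1), c(0,1,0,1,1,0))

# Numerical rank via a QR decomposition (base R)
rank_repw <- qr(repw)$rank
rank_repw
```

For a matrix with many rows and columns this decomposition is the expensive step, which is why supplying degf directly can save time.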

repweights may be a character string giving a regular expression for the replicate weight variables. For example, in the California Health Interview Survey public-use data, the sampling weights are "rakedw0" and the replicate weights are "rakedw1" to "rakedw80". The regular expression "rakedw[1-9]" matches the replicate weight variables (and not the sampling weight variable).
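The behaviour of this pattern can be checked with base R's grep. The variable names below are generated to mimic the naming described above and are not read from the actual CHIS file.

```r
# Names mimicking the CHIS layout: rakedw0 (sampling) and rakedw1..rakedw80
vars <- paste0("rakedw", 0:80)

# Keep the names that match the pattern
matched <- grep("rakedw[1-9]", vars, value = TRUE)

length(matched)            # number of replicate weight variables found
"rakedw0" %in% matched     # is the sampling weight among them?
```

The pattern is unanchored, so it would also match any other variable whose name contains "rakedw" followed by a nonzero digit; a stricter alternative such as "^rakedw[1-9][0-9]*$" avoids that if the data set has similarly named columns.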

data may be a character string giving the name of a table or view in a relational database that can be accessed through the DBI interface. For DBI interfaces, dbtype should be the name of the database driver and dbname should be the name by which the driver identifies the specific database (e.g., the file name for SQLite).

The appropriate database interface package must already be loaded (e.g., RSQLite for SQLite). The survey design object will contain the replicate weights, but actual variables will be loaded from the database only as needed. Use close to close the database connection and open to reopen it, e.g., after loading a saved object.

The database interface does not attempt to modify the underlying database and so can be used with read-only permissions on the database.

To generate your own replicate weights, either use as.svrepdesign on a survey.design object, or see brrweights, bootweights, jk1weights and jknweights.
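A short sketch of the first route, converting an ordinary design object with as.svrepdesign, is given below. It assumes the survey package is installed and uses the api example data shipped with it; the type of replicate weights is left to the package's default choice.

```r
library(survey)
data(api)

# An ordinary one-stage cluster design from the api example data
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Convert to a replicate-weight design (replicate type chosen by default)
rclus1 <- as.svrepdesign(dclus1)

svymean(~api00, rclus1)
```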

The model.frame method extracts the observed data.

References

Levy and Lemeshow. "Sampling of Populations". Wiley.

Shao and Tu. "The Jackknife and Bootstrap." Springer.

Yee et al (1999). Bootstrap Variance Estimation for the National Population Health Survey. Proceedings of the ASA Survey Research Methodology Section. https://web.archive.org/web/20151110170959/http://www.amstat.org/sections/SRMS/Proceedings/papers/1999_136.pdf

Examples

library(survey)
data(scd)
# use BRR replicate weights from Levy and Lemeshow
repweights <- 2*cbind(c(1,0,1,0,1,0), c(1,0,0,1,0,1), c(0,1,1,0,0,1),
                      c(0,1,0,1,1,0))
scdrep <- svrepdesign(data=scd, type="BRR", repweights=repweights,
                      combined.weights=FALSE)
svyratio(~alive, ~arrests, scdrep)


if (FALSE) {
## Needs RSQLite
library(RSQLite)
db_rclus1 <- svrepdesign(weights=~pw, repweights="wt[1-9]+", type="JK1",
   scale=(1-15/757)*14/15, data="apiclus1rep", dbtype="SQLite",
   dbname=system.file("api.db",package="survey"), combined.weights=FALSE)
svymean(~api00+api99,db_rclus1)

summary(db_rclus1)

## closing and re-opening a connection
close(db_rclus1)
db_rclus1
try(svymean(~api00+api99,db_rclus1))
db_rclus1<-open(db_rclus1)
svymean(~api00+api99,db_rclus1)
}
