Last chance! 50% off unlimited learning
Sale ends in
Fit a generalised linear model to data from a complex survey design, with inverse-probability weighting and design-based standard errors.
# S3 method for survey.design
svyglm(formula, design, subset=NULL,
family=stats::gaussian(),start=NULL, rescale=TRUE, ..., deff=FALSE,
influence=FALSE)
# S3 method for svyrep.design
svyglm(formula, design, subset=NULL,
family=stats::gaussian(),start=NULL, rescale=NULL, ..., rho=NULL,
return.replicates=FALSE, na.action,multicore=getOption("survey.multicore"))
# S3 method for svyglm
summary(object, correlation = FALSE, df.resid=NULL,
...)
# S3 method for svyglm
predict(object,newdata=NULL,total=NULL,
type=c("link","response","terms"),
se.fit=(type != "terms"),vcov=FALSE,...)
# S3 method for svrepglm
predict(object,newdata=NULL,total=NULL,
type=c("link","response","terms"),
se.fit=(type != "terms"),vcov=FALSE,
return.replicates=!is.null(object$replicates),...)
svyglm
returns an object of class svyglm
. The
predict
method returns an object of class svystat
Model formula
Survey design from svydesign
or svrepdesign
. Must contain all variables
in the formula
Expression to select a subpopulation
family
object for glm
Starting values for the coefficients (needed for some uncommon link/family combinations)
Rescaling of weights, to improve numerical stability. The default
rescales weights to sum to the sample size. Use FALSE
to not
rescale weights. For replicate-weight designs, use TRUE
to
rescale weights to sum to 1, as was the case before version 3.34.
Other arguments passed to glm
or
summary.glm
For replicate BRR designs, to specify the parameter for
Fay's variance method, giving weights of rho
and 2-rho
Return the replicates as the replicates
component of the
result? (for predict
, only possible if they
were computed in the svyglm
fit)
Estimate the design effects
Return influence functions
A svyglm
object
Include the correlation matrix of parameters?
Handling of NAs
Use the multicore
package to distribute
replicates across processors?
Optional denominator degrees of freedom for Wald tests
new data frame for prediction
population size when predicting population total
linear predictor (link
) or response
if TRUE
, return variances of predictions
if TRUE
and se=TRUE
return full
variance-covariance matrix of predictions
Thomas Lumley
For binomial and Poisson families use family=quasibinomial()
and family=quasipoisson()
to avoid a warning about non-integer
numbers of successes. The `quasi' versions of the family objects give
the same point estimates and standard errors and do not give the
warning.
If df.resid
is not specified the df for the null model is
computed by degf
and the residual df computed by
subtraction. This is recommended by Korn and Graubard and is correct
for PSU-level covariates but is potentially very conservative for
individual-level covariates. To get tests based on a Normal distribution
use df.resid=Inf
, and to use number of PSUs-number of strata,
specify df.resid=degf(design)
.
Parallel processing with multicore=TRUE
is helpful only for
fairly large data sets and on computers with sufficient memory. It may
be incompatible with GUIs, although the Mac Aqua GUI appears to be safe.
predict
gives fitted values and sampling variability for specific new
values of covariates. When newdata
are the population mean it
gives the regression estimator of the mean, and when newdata
are
the population totals and total
is specified it gives the
regression estimator of the population total. Regression estimators of
mean and total can also be obtained with calibrate
.
Lumley T, Scott A (2017) "Fitting Regression Models to Survey Data" Statistical Science 32: 265-278
glm
, which is used to do most of the work.
regTermTest
, for multiparameter tests
calibrate
, for an alternative way to specify regression
estimators of population totals or means
svyttest
for one-sample and two-sample t-tests.
data(api)
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
dclus2<-svydesign(id=~dnum+snum, weights=~pw, data=apiclus2)
rstrat<-as.svrepdesign(dstrat)
rclus2<-as.svrepdesign(dclus2)
summary(svyglm(api00~ell+meals+mobility, design=dstrat))
summary(svyglm(api00~ell+meals+mobility, design=dclus2))
summary(svyglm(api00~ell+meals+mobility, design=rstrat))
summary(svyglm(api00~ell+meals+mobility, design=rclus2))
## use quasibinomial, quasipoisson to avoid warning messages
summary(svyglm(sch.wide~ell+meals+mobility, design=dstrat,
family=quasibinomial()))
## Compare regression and ratio estimation of totals
api.ratio <- svyratio(~api.stu,~enroll, design=dstrat)
pop<-data.frame(enroll=sum(apipop$enroll, na.rm=TRUE))
npop <- nrow(apipop)
predict(api.ratio, pop$enroll)
## regression estimator is less efficient
api.reg <- svyglm(api.stu~enroll, design=dstrat)
predict(api.reg, newdata=pop, total=npop)
## same as calibration estimator
svytotal(~api.stu, calibrate(dstrat, ~enroll, pop=c(npop, pop$enroll)))
## svyglm can also reproduce the ratio estimator
api.reg2 <- svyglm(api.stu~enroll-1, design=dstrat,
family=quasi(link="identity",var="mu"))
predict(api.reg2, newdata=pop, total=npop)
## higher efficiency by modelling variance better
api.reg3 <- svyglm(api.stu~enroll-1, design=dstrat,
family=quasi(link="identity",var="mu^3"))
predict(api.reg3, newdata=pop, total=npop)
## true value
sum(apipop$api.stu)
Run the code above in your browser using DataLab