svy2lme: Linear mixed models by pairwise likelihood

Description

Fits linear mixed models to survey data by maximimising the profile pairwise composite likelihood.

Usage

svy2lme(formula, design, sterr=TRUE,  return.devfun=FALSE,
method=c("general","nested"), all.pairs=FALSE, subtract.margins=FALSE)
# S3 method for svy2lme
coef(object,...,random=FALSE)

Value

svy2lme returns an object with print, coef, and

vcov methods.

The coef method with random=TRUE returns a two-element list: the first element is the estimated residual variance, the second is the matrix of estimated variances and covariances of the random effects.

Arguments

formula: Model formula as in the lme4 package
design: A survey design object produced by survey::svydesign. The pairwise weights will be computed from this design, which must have separate probabilities or weights for each stage of sampling.
sterr: Estimate standard errors for fixed effects? Set to FALSE for greater speed when using resampling to get standard errors. Also, a PPS-without-replacement survey design can't get sandwich standard errors (because fourth-order sampling probabilities would be needed)
return.devfun: If TRUE, return the deviance function as a component of the object. This will increase the memory use substantially, but allows for bootstrapping.
method: "nested" requires the model clusters to have a single grouping variable that is the same as the primary sampling unit. It's faster.
all.pairs: Only with method="general", use all pairs rather than just correlated pairs?
subtract.margins: If TRUE and all.pairs=TRUE, compute with all pairs by the faster algorithm involving subtraction from the marginal likelihood
object: svy2lme object
...: for method compatibility
random: if TRUE, the variance components rather than the fixed effects

Author

Thomas Lumley

Details

The population pairwise likelihood would be the sum of the loglikelihoods for a pair of observations, taken over all pairs of observations from the same cluster. This is estimated by taking a weighted sum over pairs in the sample, with the weights being the reciprocals of pairwise sampling probabilities. The advantage over standard weighted pseudolikelihoods is that there is no large-cluster assumption needed and no rescaling of weights. The disadvantage is some loss of efficiency and simpler point estimation.

With method="nested" we have the method of Yi et al (2016). Using method="general" relaxes the conditions on the design and model.

The code uses lme4::lmer to parse the formula and produce starting values, profiles out the fixed effects and residual variance, and then uses minqa::bobyqa to maximise the resulting profile deviance.

As with lme4::lmer, the default is to estimate the correlations of the random effects, since there is typically no reason to assume these are zero. You can force two random effects to be independent by entering them in separate terms, eg (1|g)+(-1+x|g) in the model formula asks for a random intercept and a random slope with no random intercept, which will be uncorrelated.

The internal parametrisation of the variance components uses the Cholesky decomposition of the relative variance matrix (the variance matrix divided by the residual variance), as in lme4::lmer. The component object$s2 contains the estimated residual variance and the component object$opt$par contains the elements of the Cholesky factor in column-major order, omitting any elements that are forced to be zero by requiring indepedent random effects.

Standard errors of the fixed effects are currently estimated using a "with replacement" approximation, valid when the sampling fraction is small and superpopulation (model, process) inference is intended. Tthe influence functions are added up within cluster, centered within strata, the residuals added up within strata, and then the crossproduct is taken within each stratum. The stratum variance is scaled by ni/(ni-1) where ni is the number of PSUs in the stratum, and then added up across strata. When the sampling and model structure are the same, this is the estimator of Yi et al, but it also allows for there to be sampling stages before the stages that are modelled, and for the model and sampling structures to be different.

The return.devfun=TRUE option is useful if you want to examine objects that aren't returned as part of the output. For example, get("ij", environment(object$devfun)) is the set of pairs used in computation.

References

J.N.K. Rao, François Verret and Mike A. Hidiroglou "A weighted composite likelihood approach to inference for two-level models from survey data" Survey Methodology, December 2013 Vol. 39, No. 2, pp. 263-282

Grace Y. Yi , J. N. K. Rao and Haocheng Li "A WEIGHTED COMPOSITE LIKELIHOOD APPROACH FOR ANALYSIS OF SURVEY DATA UNDER TWO-LEVEL MODELS" Statistica Sinica Vol. 26, No. 2 (April 2016), pp. 569-587

Examples

Run this code


data(api, package="survey")

# one-stage cluster sample
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
# two-stage cluster sample
dclus2<-svydesign(id=~dnum+snum, fpc=~fpc1+fpc2, data=apiclus2)

svy2lme(api00~(1|dnum)+ell+mobility+api99, design=dclus1)
svy2lme(api00~(1|dnum)+ell+mobility+api99, design=dclus2)
svy2lme(api00~(1|dnum)+ell+mobility+api99, design=dclus2,method="nested")

lme4::lmer(api00~(1|dnum)+ell+mobility+api99, data=apipop)

## Simulated

set.seed(2000-2-29)

df<-data.frame(x=rnorm(1000*20),g=rep(1:1000,each=20), t=rep(1:20,1000), id=1:20000)
df$u<-with(df, rnorm(1000)[g])

df$y<-with(df, x+u+rnorm(1000,s=2))

## oversample extreme `u` to bias random-intercept variance
pg<-exp(abs(df$u/2)-2.2)[df$t==1]

in1<-rbinom(1000,1,pg)==1
in2<-rep(1:5, length(in1))

sdf<-subset(df, (g %in% (1:1000)[in1]) & (t %in% in2))

p1<-rep(pg[in1],each=5)
N2<-rep(20,nrow(sdf))

## Population values
lme4::lmer(y~x+(1|g), data=df)

## Naive estimator: higher intercept variance
lme4::lmer(y~x+(1|g), data=sdf)

##pairwise estimator
sdf$w1<-1/p1
sdf$w2<-20/5

design<-survey::svydesign(id=~g+id, data=sdf, weights=~w1+w2)
pair<-svy2lme(y~x+(1|g),design=design,method="nested")
pair

pair_g<-svy2lme(y~x+(1|g),design=design,method="general")
pair_g

all.equal(coef(pair), coef(pair_g))
all.equal(SE(pair), SE(pair_g))

Run the code above in your browser using DataLab