bootMer: Model-based (Semi-)Parametric Bootstrap for Mixed Models

Description

Perform model-based (Semi-)parametric bootstrap for mixed models.

Usage

bootMer(x, FUN, nsim = 1, seed = NULL, use.u = FALSE, re.form=NA,
	type = c("parametric", "semiparametric"),
	verbose = FALSE, .progress = "none", PBargs = list(),
	parallel = c("no", "multicore", "snow"),
	ncpus = getOption("boot.ncpus", 1L), cl = NULL)

Arguments

a fitted merMod object: see lmer, glmer, etc.

FUN

a function taking a fitted merMod object as input and returning the statistic of interest, which must be a (possibly named) numeric vector.

nsim

number of simulations, positive integer; the bootstrap \(B\) (or \(R\)).

seed

optional argument to set.seed.

use.u

logical, indicating whether the spherical random effects should be simulated / bootstrapped as well. If TRUE, they are not changed, and all inference is conditional on these values. If FALSE, new normal deviates are drawn (see Details).

re.form

formula, NA (equivalent to use.u=FALSE), or NULL (equivalent to use.u=TRUE): alternative to use.u for specifying which random effects to incorporate. See simulate.merMod for details.

type

character string specifying the type of bootstrap, "parametric" or "semiparametric"; partial matching is allowed.

verbose

logical indicating if progress should print output

.progress

character string - type of progress bar to display. Default is "none"; the function will look for a relevant *ProgressBar function, so "txt" will work in general; "tk" is available if the tcltk package is loaded; or "win" on Windows systems. Progress bars are disabled (with a message) for parallel operation.

PBargs

a list of additional arguments to the progress bar function (the package authors like list(style=3)).

parallel

The type of parallel operation to be used (if any). If missing, the default is taken from the option "boot.parallel" (and if that is not set, "no").

ncpus

integer: number of processes to be used in parallel operation: typically one would choose this to be the number of available CPUs.

An optional parallel or snow cluster for use if parallel = "snow". If not supplied, a cluster on the local machine is created for the duration of the boot call.

Value

an object of S3 class "boot", compatible with boot package's boot() result.

Details

The semi-parametric variant is only partially implemented, and we only provide a method for lmer and glmer results. The working name for bootMer() was “simulestimate()”, as it is an extension of simulate (see simulate.merMod), but we want to emphasize its potential for valid inference.

If use.u is FALSE and type is "parametric", each simulation generates new values of both the “spherical” random effects \(u\) and the i.i.d. errors \(\epsilon\), using rnorm() with parameters corresponding to the fitted model x.
If use.u is TRUE and type=="parametric", only the i.i.d. errors (or, for GLMMs, response values drawn from the appropriate distributions) are resampled, with the values of \(u\) staying fixed at their estimated values.
If use.u is TRUE and type=="semiparametric", the i.i.d. errors are sampled from the distribution of (response) residuals. (For GLMMs, the resulting sample will no longer have the same properties as the original sample, and the method may not make sense; a warning is generated.) The semiparametric bootstrap is currently an experimental feature, and therefore may not be stable.
The case where use.u is FALSE and type=="semiparametric" is not implemented; Morris (2002) suggests that resampling from the estimated values of \(u\) is not good practice.

References

Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and Their Application. Cambridge University Press. Morris, J. S. (2002). The BLUPs Are Not ‘best’ When It Comes to Bootstrapping. Statistics & Probability Letters 56(4): 425--430. doi:10.1016/S0167-7152(02)00041-X.

Examples

Run this code

fm01ML <- lmer(Yield ~ 1|Batch, Dyestuff, REML = FALSE)
## see ?"profile-methods"
mySumm <- function(.) { s <- sigma(.)
    c(beta =getME(., "beta"), sigma = s, sig01 = unname(s * getME(., "theta"))) }
(t0 <- mySumm(fm01ML)) # just three parameters
## alternatively:
mySumm2 <- function(.) {
   c(beta=fixef(.),sigma=sigma(.), sig01=sqrt(unlist(VarCorr(.))))
}

set.seed(101)
## 3.8s (on a 5600 MIPS 64bit fast(year 2009) desktop "AMD Phenom(tm) II X4 925"):
system.time( boo01 <- bootMer(fm01ML, mySumm, nsim = 100) )

## to "look" at it
require("boot") ## a recommended package, i.e. *must* be there
boo01
## note large estimated bias for sig01
## (~30% low, decreases _slightly_ for nsim = 1000)

## extract the bootstrapped values as a data frame ...
head(as.data.frame(boo01))

## ------ Bootstrap-based confidence intervals ------------

## warnings about "Some ... intervals may be unstable" go away
##   for larger bootstrap samples, e.g. nsim=500

## intercept
(bCI.1 <- boot.ci(boo01, index=1, type=c("norm", "basic", "perc")))# beta

## Residual standard deviation - original scale:
(bCI.2  <- boot.ci(boo01, index=2, type=c("norm", "basic", "perc")))
## Residual SD - transform to log scale:
(bCI.2L <- boot.ci(boo01, index=2, type=c("norm", "basic", "perc"),
                   h = log, hdot = function(.) 1/., hinv = exp))

## Among-batch variance:
(bCI.3 <- boot.ci(boo01, index=3, type=c("norm", "basic", "perc"))) # sig01


## Copy of unexported stats:::format.perc helper function
format.perc <- function(probs, digits) {
    paste(format(100 * probs, trim = TRUE,
                 scientific = FALSE, digits = digits),
          "%")
}

## Extract all CIs (somewhat awkward)
bCI.tab <- function(b,ind=length(b$t0), type="perc", conf=0.95) {
    btab0 <- t(sapply(as.list(seq(ind)),
                    function(i)
                    boot.ci(b,index=i,conf=conf, type=type)$percent))
    btab <- btab0[,4:5]
    rownames(btab) <- names(b$t0)
    a <- (1 - conf)/2
    a <- c(a, 1 - a)
    pct <- format.perc(a, 3)
    colnames(btab) <- pct
    return(btab)
}
bCI.tab(boo01)

## Graphical examination:
plot(boo01,index=3)

## Check stored values from a longer (1000-replicate) run:
(load(system.file("testdata","boo01L.RData", package="lme4")))# "boo01L"
plot(boo01L, index=3)
mean(boo01L$t[,"sig01"]==0) ## note point mass at zero!

Run the code above in your browser using DataLab