spaMM_boot: Parametric bootstrap

Description

spaMM_boot simulates samples from a fit object inheriting from class "HLfit", as produced by spaMM's fitting functions, and applies a given function to each simulated sample. Parallelization is supported (see Details).

spaMM2boot is similar except that it assumes that the original model is refitted on the simulated data, and the given function is applied to the refitted model, and the value is in a format directly usable as input for boot::boot.ci.

Both of these functions can be used to apply standard parametric bootstrap procedures. spaMM_boot is suitable for more diverse applications, e.g. to fit by one model some samples simulated under another model (see Example).

Usage

spaMM_boot(object, simuland, nsim, nb_cores=NULL, seed=NULL,
           resp_testfn=NULL, control.foreach=list(),
           debug. = FALSE, type, fit_env=NULL, cluster_args=NULL,
           showpbar= eval(spaMM.getOption("barstyle")),
           boot_samples=NULL,
           ...)
spaMM2boot(object, statFUN, nsim, nb_cores=NULL, seed=NULL,
           resp_testfn=NULL, control.foreach=list(),
           debug. = FALSE, type="marginal", fit_env=NULL, 
           cluster_args=NULL, showpbar= eval(spaMM.getOption("barstyle")),
           boot_samples=NULL,
           ...)

Value

spaMM_boot returns a list, with the following element(s) (unless debug. is TRUE):

bootreps: nsim return values in the format returned either by apply or parallel::parApply or by foreach::`%dopar%` as controlled by control.foreach$.combine (which is here "rbind" by default).
RNGstate: (absent in the case the boot_samples argument was used to provide the new response values but not the RNGstate) the state of .Random.seed at the beginning of the sample simulation

spaMM2boot returns a list suitable for use by boot.ci, with elements:

t: nsim return values of the simulated statistic (in matrix format).
t0: nsim return the value of statFUN from the original fit.
sim: The simulation type ("parametric").
R: nsim
.Random.seed: the state of .Random.seed at the beginning of the sample simulation

(other elements of an object of class boot are currently not included.)

Arguments

object: The fit object to simulate from.
simuland: The function to apply to each simulated sample. See Details for requirements of this function.
statFUN: The function to apply to each fit object for each simulated sample. See Details for requirements of this function.
nsim: Number of samples to simulate and analyze.
nb_cores: Number of cores to use for parallel computation. The default is spaMM.getOption("nb_cores"), and 1 if the latter is NULL. nb_cores=1 prevents the use of parallelisation procedures.
seed: Passed to simulate.HLfit
resp_testfn: Passed to simulate.HLfit; NULL, or a function that tests a condition which simulated samples should satisfy. This function takes a response vector as argument and return a boolean (TRUE indicating that the sample satisfies the condition).
control.foreach: list of control arguments for foreach. These include in particular .combine (with default value "rbind"), and .errorhandling (with default value "remove", but "pass" is quite useful for debugging).
debug.: Boolean (or integer, interpreted as boolean). For debugging purposes, given that spaMM_boot does not stop when the fit of a bootstrap replicate fails. Subject to changes with no or little notice. In serial computation, debug.=2 will stop on an error. In parallel computation, this would be ignored. The effect of debug.=TRUE depends on what simuland does of it. The default simuland for likelihood ratio testing functions, eval_replicate, shows how debug. can be used to control a call to dump.frames (however, debugging user-defined functions by such a call does not require control by debug.).
type: Character: passed to simulate.HLfit. Defaults, with a warning, to type="marginal" in order to replicate the behaviour of previous versions of spaMM_boot. This is an appropriate default for various parametric bootstrpa analyses, but not necessarily the appropriate type for all possible uses. See Details of simulate.HLfit for other implemented options.
fit_env: An environment or list containing variables necessary to evaluate simuland on each sample, and not included in the fit object. E.g., use fit_env=list(phi_fix=phi_fix) if the fit assumed fixed=list(phi=phi_fix): the name in list(phi_fix=<.>) must be the name of the object that will be sought by the called process when interpreting fixed=list(phi=phi_fix) (if still unsure about the proper syntax, see the clusterExport documentation, as fit_env is used in the following context: parallel::clusterExport(cl=<cluster>, varlist=ls(fit_env), envir=fit_env)).
cluster_args: NULL or a list of arguments, passed to makeCluster.
showpbar: Controls display of progress bar. See barstyle option for details.
boot_samples: NULL, or precomputed bootstrap samples from the fitted model, provided as a matrix with one column per bootstrap replicate (the format of the result of simulate.HLfit), or as a list including a bootreps element with the same matrix format.
...: Further arguments passed to the simuland function.

Details

The simuland function must take as first argument a vector of response values, and may have other arguments including ‘...’. When required, these additional arguments must be passed through the ‘...’ arguments of spaMM_boot. Variables needed to evaluate them must be available from within the simuland function or otherwise provided as elements of fit_env.

The statFUN function must take as first argument (named refit) a fit object, and may have other arguments including ‘...’ handled as for simuland.

spaMM_boot handles parallel backends with different features. pbapply::pbapply has a very simple interface (essentially equivalent to apply) and provides progress bars, but (in version 1.4.0, at least) does not have efficient load-balancing. doSNOW also provides a progress bar and allows more efficient load-balancing, but its requires foreach. foreach handles errors differently from pbapply (which will simply stop if fitting a model to a bootstrap replicate fails): see the foreach documentation.

spaMM_boot calls simulate.HLfit on the fit object and applies simuland on each column of the matrix returned by this call. simulate.HLfit uses the type argument, which must be explicitly provided.

Examples

Run this code

if (spaMM.getOption("example_maxtime")>7) {
 data("blackcap")
 
 # Generate fits of null and full models:
 lrt <- fixedLRT(null.formula=migStatus ~ 1 + Matern(1|longitude+latitude),
                 formula=migStatus ~ means + Matern(1|longitude+latitude), 
                 method='ML',data=blackcap)

 # The 'simuland' argument: 
 myfun <- function(y, what=NULL, lrt, ...) { 
    data <- lrt$fullfit$data
    data$migStatus <- y ## replaces original response (! more complicated for binomial fits)
    full_call <- getCall(lrt$fullfit) ## call for full fit
    full_call$data <- data
    res <- eval(full_call) ## fits the full model on the simulated response
    if (!is.null(what)) res <- eval(what)(res=res) ## post-process the fit
    return(res) ## the fit, or anything produced by evaluating 'what'
  }
  # where the 'what' argument (not required) of myfun() allows one to control 
  # what the function returns without redefining the function.
  
  # Call myfun() with no 'what' argument: returns a list of fits 
  spaMM_boot(lrt$nullfit, simuland = myfun, nsim=1, lrt=lrt, 
             type ="marginal")[["bootreps"]] 
  
  # Return only a model coefficient for each fit: 
  spaMM_boot(lrt$nullfit, simuland = myfun, nsim=7,
             what=quote(function(res) fixef(res)[2L]), 
             lrt=lrt, type ="marginal")[["bootreps"]]       
  
  if (FALSE) {
    # Parametric bootstrap by spaMM2boot() and spaMM_boot():
    boot.ci_info <- spaMM2boot(lrt$nullfit, statFUN = function(refit) fixef(refit)[1], 
                               nsim=99, type ="marginal")
    boot::boot.ci(boot.ci_info, , type=c("basic","perc","norm"))
    
    nullfit <- lrt$nullfit
    boot_t <- spaMM_boot(lrt$nullfit, simuland = function(y, nullfit) {
      refit <- update_resp(nullfit, y)
      fixef(refit)[1]
    }, nsim=99, type ="marginal", nullfit=nullfit)$bootreps
    boot::boot.ci(list(R = length(boot_t), sim="parametric"), t0=fixef(nullfit)[1], 
                  t= t(boot_t), type=c("basic","perc","norm"))


  }           
}

Run the code above in your browser using DataLab