get_cPredVar: Estimation of prediction variance with bootstrap correction

Description

This function is similar to get_predVar except that is uses a bootstrap procedure to correct for bias in the evaluation of the prediction variance.

Usage

get_cPredVar(pred_object, newdata, nsim, seed, type = "residual", 
             variances=NULL, nb_cores = NULL, fit_env = NULL,
             sim_object=pred_object)

Value

A vector of prediction variances, with an attribute info which is an environment containing variables:

SEs: the standard errors of the estimates (which are those of the bootstrap replicates)
bias: the bias term
maive: B&H's “naive” $\nu_i$

Arguments

pred_object: an object of class HLfit, as returned by the fitting functions in spaMM.
newdata: passed to predict.HLfit (it thus represents a prediction design, not to be confused with the bootstrap samples)
nsim: passed to simulate.HLfit
seed: passed to simulate.HLfit
type: passed to simulate.HLfit
variances: NULL or list; variances["cov"] will be passed to predict.HLfit to control whether a covariance matrix is computed or not. Other elements are currently ignored.
nb_cores: integer: number of cores to use for parallel computation of bootstrap. The default is spaMM.getOption("nb_cores"), and 1 if the latter is NULL. nb_cores=1 prevents the use of parallelisation procedures.
fit_env: For parallel computations: an environment containing objects to be passed to the cores. They should have the same name in fit_env as in the environment they are passed from.
sim_object: an object of class HLfit, passed to simulate.HLfit as its object argument. Simulating from this object must produce response values that can be used as replacement to those of the original fitted pred_object. In standard usage, sim_object=pred_object (the default).

Details

The result provided by get_cPredVar is similar to the CMSEP (Conditional Mean Standard Error of Prediction) introduced by Booth and Hobert (1998; “B&H”). This paper is known for pointing the importance of using conditional variances when they differ from unconditional ones. This is hard to miss in spatial models, where the relevant prediction variance typically depends on the variance of random effects conditional on the data. Thus, the alternative function get_predVar already accounts for this and returns a prediction variance that depends on a joint covariance of fixed-effect estimates and of random effects given the data.

B&H also used a conditional bootstrap procedure to correct for some bias. get_cPredVar implements a similar procedure, in contrast to get_predVar. Their conditional bootstrap procedure is not applicable for autocorrelated random effects, and parametric bootstrapping of the residuals of the fitted model (as implied by the default value of argument type) is used instead here. Apart from this difference, the returned value includes exactly the same terms as those discussed by B&H: their “naive estimate” $\nu_i$ and its bootstrap correction $b_i$, their correction $\beta$ for uncertainty in fixed-effect coefficients, and their correction $\sigma^2$ for uncertainty in dispersion parameters.

This use of the bootstrap does not account for uncertainty in correlation parameters “outer-optimized” by fitme or corrHLfit, because the correlation parameters are fixed when the model is refitted on the bootstrap replicates. Even if it the correlation parameters were refitted, the full computation would not be sufficient to account for uncertainty in them. To account for uncertainty in correlation parameters, one should rather perform a parametric bootstrap of the full model (typically using spaMM_boot(., type="residual")), which may take much more time.

The “naive estimate” $\nu_i$ is not generally an estimate of anything uniquely defined by the model parameters: for correlated random effects, it depends on the “root” of the correlation matrix of the random effects, which is not unique. Thus $\nu_i$ is not unique, and may differ for example for equivalent fits by sparse-precision methods vs. other methods. Nevertheless, attr(cpredvar,"info")$naive does recover published values in the Examples below, as they involve no correlation matrix.

References

Booth, J.G., Hobert, J.P. (1998) Standard errors of prediction in generalized linear mixed models. J. Am. Stat. Assoc. 93: 262-272.

Examples

Run this code

if (FALSE) {
if(requireNamespace("rsae", quietly = TRUE)) {
  # LMM example from Booth & Hobert 1998 JASA
  data("landsat", package = "rsae")
  fitCorn <- fitme(HACorn ~ PixelsCorn + PixelsSoybeans + (1|CountyName),data=landsat[-33,])
  newXandZ <- unique(data.frame(PixelsCorn=landsat$MeanPixelsCorn,
                                PixelsSoybeans=landsat$MeanPixelsSoybeans,
                                CountyName=landsat$CountyName))
  (cpredvar <- get_cPredVar(fitCorn, newdata=newXandZ, nsim=200L, seed=123)) # serial computation
  (cpredvar <- get_cPredVar(fitCorn, newdata=newXandZ, nsim=200L, seed=123, 
        nb_cores=parallel::detectCores(logical=FALSE)-1L, 
        fit_env=list2env(list(newXandZ=newXandZ))))
}

# GLMM example from Booth & Hobert 1998 JASA
data(clinics)
fitClinics <- HLfit(cbind(npos,nneg)~treatment+(1|clinic),family=binomial(),data=clinics)
#
(get_cPredVar(fitClinics, newdata=clinics[1:8,], nsim=200L, seed=123))  # serial computation
(get_cPredVar(fitClinics, newdata=clinics[1:8,], nsim=200L, seed=123, 
      nb_cores=parallel::detectCores(logical=FALSE)-1L, 
      fit_env=list2env(list(clinics=clinics))))
}

Run the code above in your browser using DataLab