Bootstrap model effects (standardised coefficients) and optional SEM correlated errors.
bootEff(mod, data = NULL, ran.eff = NULL, cor.err = NULL,
R = 10000, seed = NULL, catch.err = TRUE, parallel = "snow",
ncpus = NULL, cl = NULL, bM.arg = NULL, ...)
A fitted model object of class "lm", "glm", or "merMod", or a list or
nested list of such objects.
An optional dataset used to first re-fit the model(s).
For mixed models with nested random effects, the name of the
variable comprising the highest-level random effect. For non-nested random
effects, specify "crossed". Omitting this argument when mod is (or contains)
a mixed model will result in an error.
Optional, names of SEM correlated errors to be bootstrapped.
Should be of the form: c("mod1 ~~ mod2", "mod3 ~~ mod4", ...)
(spaces optional), with names matching model names.
Number of bootstrap replicates to generate.
Seed for the random number generator. If not provided, a random five-digit integer is used (see Details).
Logical, should errors generated during model fitting or
estimation be caught and NA returned for estimates? If FALSE, any such
errors will cause the function to exit.
The type of parallel processing to use. Can be one of "snow", "multicore",
or "no" (for none).
Number of system cores to use for parallel processing. If NULL (default),
all available cores are used.
Optional cluster to use if parallel = "snow". If NULL (default), a local
cluster is created using the specified number of cores.
A named list of any additional arguments to bootMer.
Arguments to stdCoeff.
An object of class "boot" containing the bootstrapped effects,
or a list/nested list of such objects.
bootEff uses the boot function (primarily) to bootstrap effects from a
fitted model or list of models (i.e. standardised coefficients, calculated
using stdCoeff). Bootstrapping is typically nonparametric, i.e. coefficients
are calculated from data where the rows have been randomly sampled with
replacement. The number of replicates is set by default to 10,000, which
should provide accurate coverage for confidence intervals in most
situations. To ensure that data is resampled in the same way across
individual bootstrap operations within the same run (e.g. models in a list),
the same seed is set per operation, with the value saved as an attribute to
the bootstrapped values (for reproducibility). The seed can either be
user-supplied or a randomly-generated five-digit number (default), and is
always re-initialised on exit (i.e. set.seed(NULL)).
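The resampling and seeding scheme described above can be sketched roughly as follows. This is only an illustration, not the internals of bootEff: it bootstraps raw (unstandardised) lm coefficients rather than calling stdCoeff, and the names mods, coef_stat, and seed, the seed value, and the small R = 200 are all assumptions made for the example.

```r
library(boot)  # recommended package, shipped with standard R installations

# Illustrative models fitted to a dataset that ships with R
mods <- list(
  mpg  = lm(mpg ~ wt + hp, data = mtcars),
  qsec = lm(qsec ~ wt + hp, data = mtcars)
)

# Statistic: refit the model to the resampled rows, return its coefficients
# (raw coefficients here; bootEff itself works with standardised ones)
coef_stat <- function(data, idx, fit) {
  coef(update(fit, data = data[idx, , drop = FALSE]))
}

seed <- 13  # hypothetical seed value
boots <- lapply(mods, function(m) {
  set.seed(seed)  # same seed before each operation: same rows resampled
  b <- boot(mtcars, coef_stat, R = 200, fit = m)  # R kept small for speed
  attr(b, "seed") <- seed  # keep the seed with the result
  b
})
set.seed(NULL)  # re-initialise the random number generator afterwards

lapply(boots, function(b) head(b$t, 3))
```

Because the seed is reset before each call to boot, both models are refit to the same sequence of resampled datasets.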
Where weights are specified, bootstrapped effects will be a weighted
average across the set of candidate models for each response variable,
calculated after each model is first refit to the resampled dataset
(specifying weights = "equal" will use a simple average instead). If
no weights are specified and mod is a nested list of models, the
function will throw an error, as it will be expecting weights for a
presumed model averaging scenario. If instead the user wishes to bootstrap
each individual model, they should recursively apply the function using
rMapply (remember to set a seed).
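As a rough sketch of averaging within each bootstrap replicate: every candidate model is refit to the same resampled rows, and the coefficients are then combined using the model weights. The candidate models, the weights in wts, and the helper avg_stat are hypothetical, and treating a term that is absent from a model as zero is an assumption of this sketch (one common convention), not a statement about how stdCoeff averages.

```r
library(boot)

# Two hypothetical candidate models for the same response
cands <- list(
  m1 = lm(mpg ~ wt + hp, data = mtcars),
  m2 = lm(mpg ~ wt + hp + qsec, data = mtcars)
)
wts <- c(m1 = 0.7, m2 = 0.3)  # hypothetical model weights, summing to 1

# Statistic: refit every candidate to the resampled rows, then average
avg_stat <- function(data, idx, cands, wts) {
  d <- data[idx, , drop = FALSE]
  terms_all <- unique(unlist(lapply(cands, function(m) names(coef(m)))))
  est <- sapply(cands, function(m) {
    b <- coef(update(m, data = d))
    out <- setNames(numeric(length(terms_all)), terms_all)
    out[names(b)] <- b  # terms missing from a model stay at zero (assumption)
    out
  })
  drop(est %*% wts[colnames(est)])  # weighted average for each term
}

set.seed(13)
b <- boot(mtcars, avg_stat, R = 200, cands = cands, wts = wts)
head(b$t, 3)
```

Replacing wts with equal values (here c(m1 = 0.5, m2 = 0.5)) gives the simple average.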
Where names of models with correlated errors are specified to cor.err,
the function will also return bootstrapped Pearson correlated errors
(weighted.residuals) for those models. If weights are supplied and mod
is a nested list, residuals will first be averaged across candidate models.
If any two models (or candidate sets) with correlated errors were fit to
different subsets of data observations, both models/sets are first refit to
data containing only the observations in common.
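A minimal sketch of a bootstrapped correlated error, assuming two hypothetical models stored under the names used in the cor.err string. The parsing step and the helper res_cor are illustrative only; the sketch does not cover model averaging or models fit to different subsets of observations.

```r
library(boot)

# Two hypothetical SEM component models, stored under their model names
mods <- list(
  mpg  = lm(mpg ~ wt, data = mtcars),
  qsec = lm(qsec ~ wt, data = mtcars)
)
cor.err <- "mpg ~~ qsec"

# Split on "~~" and trim the (optional) surrounding spaces
pair <- trimws(strsplit(cor.err, "~~", fixed = TRUE)[[1]])

# Statistic: refit both models to the same resampled rows, then take the
# Pearson correlation of their weighted residuals
res_cor <- function(data, idx, models, pair) {
  d <- data[idx, , drop = FALSE]
  r <- lapply(models[pair], function(m) {
    weighted.residuals(update(m, data = d))
  })
  cor(r[[1]], r[[2]])
}

set.seed(13)
b <- boot(mtcars, res_cor, R = 200, models = mods, pair = pair)
b$t0        # correlation from the original data
head(b$t)   # bootstrapped correlations
```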
For mixed models with nested random effects, the highest-level random
effect (only) in the dataset is resampled, a procedure which should best
retain the hierarchical structure of the data (Davison & Hinkley 1997, Ren
et al. 2010). Lower-level groups or individual observations are not
themselves resampled, as these are not independent. The name of this random
effect must be supplied to ran.eff, matching the name in the data.
Incidentally, this form of resampling will result in different sized
datasets if observations are unbalanced across groups; however this should
not generally be an issue, as the number of independent units (groups), and
hence the 'degrees of freedom', remains unchanged.
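The group-level resampling step can be illustrated in base R as below. The simulated data, the site column, and the helper resample_groups are assumptions made for the example; in practice a mixed model would then be refit to each resampled dataset, which is omitted here.

```r
set.seed(42)

# Hypothetical unbalanced data: six sites with differing numbers of rows
n_per_site <- c(3, 5, 4, 6, 2, 5)
dat <- data.frame(
  site = rep(paste0("s", 1:6), times = n_per_site),
  x = rnorm(sum(n_per_site))
)
dat$y <- 0.5 * dat$x + rnorm(nrow(dat))

# Resample whole groups with replacement, keeping each group's rows together
resample_groups <- function(data, group) {
  ids <- unique(data[[group]])
  drawn <- sample(ids, length(ids), replace = TRUE)
  rows <- unlist(lapply(drawn, function(g) which(data[[group]] == g)))
  data[rows, , drop = FALSE]
}

boot_dat <- replicate(5, resample_groups(dat, "site"), simplify = FALSE)

# Six groups are drawn every time, but row counts can differ between
# replicates because the groups are unbalanced
sapply(boot_dat, nrow)
```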
For non-nested random effects however (i.e. "crossed"), group
resampling will not be appropriate, and (semi-)parametric bootstrapping is
performed instead via bootMer in the lme4 package. Users should think
carefully about whether their random effects are nested or not. NOTE: As
bootMer takes only a fitted model as its first argument, any model averaging
is calculated 'post-hoc' using the estimates in boot objects for each
candidate model, rather than during the bootstrapping process itself (i.e.
the default procedure via boot). Results are then returned in a new boot
object for each response variable or correlated error estimate.
Parallel processing is used by default via the parallel package and
option parallel = "snow" (and is generally recommended), but users
can specify the type of parallel processing to use, or none. If "snow",
a cluster of workers is created using makeCluster, and the user can specify
the number of system cores to incorporate in the cluster (defaults to all
available). bootEff then exports all required objects and functions to this
cluster using clusterExport, after performing a (rough) match of all objects
and functions in the current global environment to those referenced in the
model call(s). Users should load any required external packages prior to
calling the function.
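For orientation, the general "snow" pattern of creating a cluster, exporting objects, and running replicates on the workers looks roughly like this with the parallel package. It is a simplified sketch with explicitly named exports, not the matching logic bootEff uses; dat, fit, one_rep, the two-worker cluster, and the seed value are all assumptions made for the example.

```r
library(parallel)

dat <- mtcars
fit <- lm(mpg ~ wt + hp, data = dat)

# One bootstrap replicate: resample rows, refit, return coefficients
one_rep <- function(i) {
  d <- dat[sample(nrow(dat), replace = TRUE), , drop = FALSE]
  coef(update(fit, data = d))
}

cl <- makeCluster(2)                 # hypothetical: two local workers
clusterSetRNGStream(cl, iseed = 13)  # reproducible random streams
# Export the objects that one_rep() refers to
clusterExport(cl, c("dat", "fit"), envir = environment())
res <- parLapply(cl, 1:100, one_rep)
stopCluster(cl)

est <- do.call(rbind, res)  # 100 x 3 matrix of bootstrapped coefficients
head(est, 3)
```

Any object the worker function refers to but that is not exported (or any package not loaded on the workers) will cause an error on the workers.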
Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach (2nd ed.). New York: Springer-Verlag. Retrieved from https://www.springer.com/gb/book/9780387953649
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and their Application. Cambridge University Press.
Ren, S., Lai, H., Tong, W., Aminzadeh, M., Hou, X., & Lai, S. (2010). Nonparametric bootstrapping for hierarchical data. Journal of Applied Statistics, 37(9), 1487–1498. https://doi.org/dvfzcn
# NOT RUN {
## Bootstrap Shipley SEM (will take a while...)
## Set 'site' as random effect group for resampling - highest-level
system.time(
  Shipley.SEM.Boot <- bootEff(Shipley.SEM, ran.eff = "site", seed = 53908,
                              ncpus = 2)
)

## Original estimates
lapply(Shipley.SEM.Boot, "[[", 1)

## Bootstrapped estimates
lapply(Shipley.SEM.Boot, function(i) head(i$t))
# }