Run the same brms model on multiple datasets and then combine the results into one fitted model object. This is useful in particular for multiple missing value imputation, where the same model is fitted on multiple imputed data sets. Models can be run in parallel using the future package.
brm_multiple(
formula,
data,
family = gaussian(),
prior = NULL,
data2 = NULL,
autocor = NULL,
cov_ranef = NULL,
sample_prior = c("no", "yes", "only"),
sparse = NULL,
knots = NULL,
stanvars = NULL,
stan_funs = NULL,
silent = 1,
recompile = FALSE,
combine = TRUE,
fit = NA,
algorithm = getOption("brms.algorithm", "sampling"),
seed = NA,
file = NULL,
file_compress = TRUE,
file_refit = getOption("brms.file_refit", "never"),
...
)
If combine = TRUE, a brmsfit_multiple object, which inherits from class brmsfit and behaves essentially the same. If combine = FALSE, a list of brmsfit objects.
An object of class formula, brmsformula, or mvbrmsformula (or one that can be coerced to one of those classes): A symbolic description of the model to be fitted. The details of model specification are explained in brmsformula.
A list of data.frames, each of which will be used to fit a separate model. Alternatively, a mids object from the mice package.
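As a minimal sketch of the list form, the snippet below builds three toy data frames and passes them as data. The helper make_data, the simulated columns x and y, and the formula y ~ x are illustrative assumptions, not part of the package. The fit is only attempted when brms is installed, since it needs a working Stan toolchain.

```r
# Hypothetical toy data: three data frames with identical columns that
# stand in for three imputed versions of the same dataset.
set.seed(1)
make_data <- function(n = 50) {
  x <- rnorm(n)
  data.frame(x = x, y = 1 + 0.5 * x + rnorm(n))
}
datasets <- lapply(1:3, function(i) make_data())

# One model is fitted per list element; the results are combined by default.
if (requireNamespace("brms", quietly = TRUE)) {
  library(brms)
  fit <- brm_multiple(y ~ x, data = datasets, chains = 1)
  summary(fit)
}
```

Every data frame in the list should contain the variables used in the formula, because the same model is fitted to each of them.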
A description of the response distribution and link function to be used in the model. This can be a family function, a call to a family function, or a character string naming the family. Every family function has a link argument that specifies the link function to be applied on the response variable. If not specified, default links are used. For details of supported families see brmsfamily. By default, a linear gaussian model is applied. In multivariate models, family might also be a list of families.
One or more brmsprior objects created by set_prior or related functions and combined using the c method or the + operator. See also default_prior for more help.
A list of named lists, each of which will be used to fit a separate model. Each of the named lists contains objects representing data which cannot be passed via argument data (see brm for examples). The length of the outer list should match the length of the list passed to the data argument.
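The nesting of data2 is easy to get wrong, so here is a hedged sketch of its shape: one named list per dataset, collected in an outer list of the same length as data. The grouping variable g, the matrix A, and the toy data are assumptions made for illustration. A is an identity matrix here, which is statistically uninteresting but keeps the example small.

```r
# Hypothetical toy data: three datasets with a grouping factor g.
set.seed(2)
groups <- letters[1:5]
make_data <- function(n_per_group = 12) {
  data.frame(
    g = rep(groups, each = n_per_group),
    y = rnorm(length(groups) * n_per_group)
  )
}
datasets <- lapply(1:3, function(i) make_data())

# A (placeholder) covariance matrix whose row names are the levels of g.
A <- diag(length(groups))
dimnames(A) <- list(groups, groups)

# Outer list: one entry per dataset. Inner list: named objects for the model.
data2_list <- rep(list(list(A = A)), length(datasets))

if (requireNamespace("brms", quietly = TRUE)) {
  library(brms)
  fit <- brm_multiple(
    y ~ 1 + (1 | gr(g, cov = A)),
    data = datasets, data2 = data2_list, chains = 1
  )
  summary(fit)
}
```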
(Deprecated) An optional cor_brms object describing the correlation structure within the response variable (i.e., the 'autocorrelation'). See the documentation of cor_brms for a description of the available correlation structures. Defaults to NULL, corresponding to no correlations. In multivariate models, autocor might also be a list of autocorrelation structures. It is now recommended to specify autocorrelation terms directly within formula. See brmsformula for more details.
(Deprecated) A list of matrices that are proportional to the (within) covariance structure of the group-level effects. The names of the matrices should correspond to columns in data that are used as grouping factors. All levels of the grouping factor should appear as rownames of the corresponding matrix. This argument can be used, among other things, to model pedigrees and phylogenetic effects. It is now recommended to specify those matrices in the formula interface using the gr and related functions. See vignette("brms_phylogenetics") for more details.
Indicates whether draws from the priors should be drawn in addition to the posterior draws. Options are "no" (the default), "yes", and "only". Among other things, these draws can be used to calculate Bayes factors for point hypotheses via hypothesis. Please note that improper priors are not sampled, including the default improper priors used by brm. See set_prior on how to set (proper) priors. Please also note that prior draws for the overall intercept are not obtained by default for technical reasons. See brmsformula for how to obtain prior draws for the intercept. If sample_prior is set to "only", draws are drawn solely from the priors, ignoring the likelihood, which makes it possible, among other things, to generate draws from the prior predictive distribution. In this case, all parameters must have proper priors.
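The following sketch illustrates sample_prior = "yes" together with a proper prior on the regression coefficient, so that hypothesis can compare prior and posterior density at a point value. The toy data, the formula y ~ x, and the normal(0, 1) prior are assumptions chosen for illustration only.

```r
# Hypothetical toy data standing in for three imputed datasets.
set.seed(3)
make_data <- function(n = 50) {
  x <- rnorm(n)
  data.frame(x = x, y = 1 + 0.5 * x + rnorm(n))
}
datasets <- lapply(1:3, function(i) make_data())

if (requireNamespace("brms", quietly = TRUE)) {
  library(brms)
  # A proper prior is needed; improper (flat) priors are not sampled.
  priors <- set_prior("normal(0, 1)", class = "b")
  fit <- brm_multiple(
    y ~ x, data = datasets, prior = priors,
    sample_prior = "yes", chains = 1
  )
  # Point hypothesis on the coefficient of x; the evidence ratio reported
  # here relies on the prior draws requested above.
  print(hypothesis(fit, "x = 0"))
}
```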
(Deprecated) Logical; indicates whether the population-level design matrices should be treated as sparse (defaults to FALSE). For design matrices with many zeros, this can considerably reduce required memory. Sampling speed is currently not improved and may even decrease slightly. It is now recommended to use the sparse argument of brmsformula and related functions.
Optional list containing user-specified knot values to be used for basis construction of smoothing terms. See gamm for more details.
An optional stanvars object generated by function stanvar to define additional variables for use in Stan's program blocks.
(Deprecated) An optional character string containing self-defined Stan functions, which will be included in the functions block of the generated Stan code. It is now recommended to use the stanvars argument for this purpose instead.
Verbosity level between 0 and 2. If 1 (the default), most of the informational messages of compiler and sampler are suppressed. If 2, even more messages are suppressed. The actual sampling progress is still printed. Set refresh = 0 to turn this off as well. If using backend = "rstan" you can also set open_progress = FALSE to prevent opening additional progress bars.
Logical, indicating whether the Stan model should be recompiled for every imputed data set. Defaults to FALSE. If NULL, brm_multiple tries to determine internally whether recompilation is necessary, for example because data-dependent priors have changed. Using the default of no recompilation should be fine in most cases.
Logical; indicates if the fitted models should be combined into a single fitted model object via combine_models. Defaults to TRUE.
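To make the difference concrete, the sketch below requests the individual fits with combine = FALSE, inspects them one at a time, and then combines them by hand. The toy data and formula are assumptions, and passing check_data = FALSE to combine_models is an assumption about what suits fits that intentionally come from different datasets.

```r
# Hypothetical toy data standing in for three imputed datasets.
set.seed(4)
make_data <- function(n = 50) {
  x <- rnorm(n)
  data.frame(x = x, y = 1 + 0.5 * x + rnorm(n))
}
datasets <- lapply(1:3, function(i) make_data())

if (requireNamespace("brms", quietly = TRUE)) {
  library(brms)
  # A plain list with one brmsfit per dataset.
  fits <- brm_multiple(y ~ x, data = datasets, combine = FALSE, chains = 1)
  lapply(fits, summary)

  # Combine the separate fits afterwards; the data checks are switched off
  # because the datasets differ on purpose.
  combined <- combine_models(mlist = fits, check_data = FALSE)
  summary(combined)
}
```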
An instance of S3 class brmsfit_multiple derived from a previous fit; defaults to NA. If fit is of class brmsfit_multiple, the compiled model associated with the fitted result is re-used and all arguments modifying the model code or data are ignored. Using this argument directly is not recommended; call the update method instead.
Character string naming the estimation approach to use. Options are "sampling" for MCMC (the default), "meanfield" for variational inference with independent normal distributions, "fullrank" for variational inference with a multivariate normal distribution, "pathfinder" for the Pathfinder algorithm, "laplace" for the Laplace approximation, or "fixed_param" for sampling from fixed parameter values. Can be set globally for the current R session via the "brms.algorithm" option (see options).
The seed for random number generation to make results reproducible. If NA (the default), Stan will set the seed randomly.
Either NULL or a character string. In the latter case, the fitted model object is saved via saveRDS in a file named after the string supplied in file. The .rds extension is added automatically. If the file already exists, brm will load and return the saved model object instead of refitting the model. Unless you specify the file_refit argument as well, the existing files won't be overwritten; you have to manually remove the file in order to refit and save the model under an existing file name. The file name is stored in the brmsfit object for later usage.
Logical or a character string, specifying one of the compression algorithms supported by saveRDS. If the file argument is provided, this compression will be used when saving the fitted model object.
Modifies when the fit stored via the file argument is re-used. Can be set globally for the current R session via the "brms.file_refit" option (see options). For "never" (default), the fit is always loaded if it exists and fitting is skipped. For "always", the model is always refitted. If set to "on_change", brms will refit the model if model, data or algorithm as passed to Stan differ from what is stored in the file. This also covers changes in priors, sample_prior, stanvars, covariance structure, etc. If you believe there was a false positive, you can use brmsfit_needs_refit to see why a refit is deemed necessary. A refit will not be triggered for changes in additional parameters of the fit (e.g., initial values, number of iterations, control arguments, ...). A known limitation is that a refit will be triggered if within-chain parallelization is switched on or off.
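A short sketch of how file and file_refit work together: the first call fits and saves the model, and a later identical call loads the saved object instead of sampling again. The file location under tempdir(), the toy data, and the formula are assumptions for illustration.

```r
# Hypothetical toy data standing in for three imputed datasets.
set.seed(5)
make_data <- function(n = 50) {
  x <- rnorm(n)
  data.frame(x = x, y = 1 + 0.5 * x + rnorm(n))
}
datasets <- lapply(1:3, function(i) make_data())

# File name without extension; ".rds" is appended automatically.
fit_file <- file.path(tempdir(), "fit_multiple")

if (requireNamespace("brms", quietly = TRUE)) {
  library(brms)
  # First call: fits the models and saves the result to fit_multiple.rds.
  fit <- brm_multiple(
    y ~ x, data = datasets, chains = 1,
    file = fit_file, file_refit = "on_change"
  )
  # Second call with unchanged model and data: the stored fit is loaded.
  fit_again <- brm_multiple(
    y ~ x, data = datasets, chains = 1,
    file = fit_file, file_refit = "on_change"
  )
}
```

With the default file_refit = "never", the stored fit would be returned even if the formula or data had changed, so "on_change" is the more cautious choice while a model is still being developed.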
Further arguments passed to brm.
The combined model may issue false positive convergence warnings, as the MCMC chains corresponding to different datasets may not necessarily overlap, even if each of the original models did converge. To find out whether each of the original models converged, subset the draws belonging to the individual models and then run convergence diagnostics. See Examples below for details.
if (FALSE) {
library(mice)
m <- 5
imp <- mice(nhanes2, m = m)
# fit the model using mice and lm
fit_imp1 <- with(lm(bmi ~ age + hyp + chl), data = imp)
summary(pool(fit_imp1))
# fit the model using brms
fit_imp2 <- brm_multiple(bmi ~ age + hyp + chl, data = imp, chains = 1)
summary(fit_imp2)
plot(fit_imp2, variable = "^b_", regex = TRUE)
# investigate convergence of the original models
library(posterior)
draws <- as_draws_array(fit_imp2)
# every dataset has just one chain here
draws_per_dat <- lapply(1:m, \(i) subset_draws(draws, chain = i))
lapply(draws_per_dat, summarise_draws, default_convergence_measures())
# use the future package for parallelization
library(future)
plan(multisession, workers = 4)
fit_imp3 <- brm_multiple(bmi ~ age + hyp + chl, data = imp, chains = 1)
summary(fit_imp3)
}