brm(formula, data = NULL, family = gaussian(), prior = NULL,
autocor = NULL, nonlinear = NULL, partial = NULL,
threshold = c("flexible", "equidistant"), cov_ranef = NULL,
ranef = TRUE, sparse = FALSE, sample_prior = FALSE, stan_funs = NULL,
fit = NA, inits = "random", chains = 4, iter = 2000,
warmup = floor(iter/2), thin = 1, cluster = 1, cluster_type = "PSOCK",
control = NULL, algorithm = c("sampling", "meanfield", "fullrank"),
silent = TRUE, seed = 12345, save_model = NULL, ...)
as.data.frame
to a data frame) containing the variables in the model.
If not found in data, the variables are taken from environment(formula)
,
typically tgaussian
brmsprior
objects created by function
set_prior
and combined using the c
method.
A single brmsprior
object may be passed without
NULL
(the default)
formula
is treated as an ordinary formula.
If not NULL
, formula
is treated as a non-linear mo~expression
allowing to specify predictors with
category specific effects in non-cumulative ordinal models
(i.e. in families cratio
, sratio
, or acat
).
"flexible"
provides the standard unstructured thresholds and
"equidistant"
restricts the distance between
consecutive threshdata
that are used as grouping factors.
All levels of the grouping factor sTRUE
).
Set to FALSE
to save memory.
The argument has no impact on the model fitting itself.FALSE
).
For design matrices with many zeros, this can considerably
reduce required memory. For all models using multivariate syntax
(i.FALSE
). Among others, these samples can be used
to calculate Bayes factors for point hypotheses.
Alternatbrmsfit
derived from a previous fit;
defaults to NA
.
If fit
is of class brmsfit
, the compiled model associated
with the fitted result is re-used and all arguments
modifyin"random"
or "0"
.
If inits is "random"
(the default),
Stan will randomly generate initial values for parameters.
If it is "0"
, all parameters are initiliazed to zero.
This option is recommen.chains
.n.iter
.thin > 1
to save memory and computation time if iter
is large.
Default is 1, that is no thinning. A deprecated alias is n.thin
.n.cluster
. To use the built-in parallel execution
of cores
instead of cluster
.makeCluster
when sampling in parallel
(i.e. when cluster
is greater 1
).
Default is
list
of parameters to control the sampler's behavior.
It defaults to NULL
so all the default values are used.
The most important control parameters are discussed in the 'Details'
section below. For a comprehensive ov"sampling"
for MCMC (the default), "meanfield"
for
variational inference with independent normal distributions, or
"fullrank"
for variational infTRUE
, warning messages of the sampler are suppressed.set.seed
to make results reproducable.NULL
or a character string.
In the latter case, the model code is
saved in a file named after the string supplied in save_model
,
which may also contain the full path where to save the file.
If only a name is given, tbrmsfit
, which contains the posterior samples along
with many other useful information about the model.
Use methods(class = "brmsfit")
for an overview on available methods.formula
argument accepts formulas of the following syntax:
response | addition ~ fixed + (random | group)
The fixed
term contains effects that are assumend to be the
same across obervations. We call them 'population-level' effects
or (adopting frequentist vocabulary) 'fixed' effects. The optional
random
terms may contain effects that are assumed to vary
accross grouping variables specified in group
. We
call them 'group-level' effects or (adopting frequentist
vocabulary) 'random' effects, although the latter name is misleading
in a Bayesian context (for more details type vignette("brms")
).
Multiple grouping factors each with multiple group-level effects
are possible. Instead of | you may use || in grouping terms
to prevent correlations from being modeled. With two exceptions,
this is basically lme4
syntax.
The first exception is that fixed
may contain two non-standard types
of population-level effects namely monotonous and category specific effects,
which can be specified using terms of the form monotonous()
and cse()
respectively. The latter can only be applied in
ordinal models and is explained in more detail in the package's vignette
(type vignette("brms")
). The former effect type is explained here.
A monotonous predictor must either be integer valued or an ordered factor,
which is the first difference to an ordinary continuous predictor.
More importantly, predictor categories (or integers) are not assumend to be
equidistant with respect to their effect on the response variable.
Instead, the distance between adjacent predictor categories (or integers)
is estimated from the data and may vary across categories.
This is realized by parameterizing as follows:
One parameter takes care of the direction and size of the effect similar
to an ordinary regression parameter, while an additional parameter vector
estimates the normalized distances between consecutive predictor categories.
A main application of monotonous effects are ordinal predictors that
can this way be modeled without (falsely) treating them as continuous
or as unordered categorical predictors.
The second eception is the optional addition
term, which may contain
multiple terms of the form fun(variable)
seperated by +
each
providing special information on the response variable.
fun
can be replaced with either se
, weights
, disp
, trials
,
cat
, cens
, or trunc
. Their meanings are explained below.
For families gaussian
, student
, and cauchy
it is possible to specify
standard errors of the observation, thus allowing to perform meta-analysis.
Suppose that the variable yi
contains the effect sizes from the studies
and sei
the corresponding standard errors.
Then, fixed and random effects meta-analyses can be conducted
using the formulae yi | se(sei) ~ 1
and
yi | se(sei) ~ 1 + (1|study)
, respectively, where
study
is a variable uniquely identifying every study.
If desired, meta-regression can be performed via
yi | se(sei) ~ 1 + mod1 + mod2 + (1|study)
or
yi | se(sei) ~ 1 + mod1 + mod2 + (1 + mod1 + mod2|study)
, where
mod1
and mod2
represent moderator variables.
For all families, weighted regression may be performed using
weights
in the addition part. Internally, this is
implemented by multiplying the log-posterior values of each
observation by their corresponding weights.
Suppose that variable wei
contains the weights
and that yi
is the response variable.
Then, formula yi | weights(wei) ~ predictors
implements a weighted regression.
The addition argument disp
(short for dispersion) serves a
similar purpose than weight
. However, it has a different
implementation and is less general as it is only usable for the
families gaussian
, student
, cauchy
,
Gamma
, weibull
, and negbinomial
.
For the former three families, the residual standard deviation
sigma
is multiplied by the values given in
disp
, so that higher values lead to lower weights.
Contrariwise, for the latter three families, the parameter shape
is multiplied by the values given in disp
. As shape
can be understood as a precision parameter (inverse of the variance),
higher values will lead to higher weights in this case.
For families binomial
and zero_inflated_binomial
,
addition should contain a variable indicating the number of trials
underlying each observation. In lme4
syntax, we may write for instance
cbind(success, n - success)
, which is equivalent
to success | trials(n)
in brms
syntax. If the number of trials
is constant across all observation (say 10
),
we may also write success | trials(10)
.
For family categorical
and all ordinal families,
addition
may contain a term cat(number)
to
specify the number categories (e.g, cat(7)
).
If not given, the number of categories is calculated from the data.
With the expection of categorical
and ordinal families, left and right censoring
can be modeled through yi | cens(censored) ~ predictors
.
The censoring variable (named censored
in this example) should
contain the values 'left'
, 'none'
, and 'right'
(or equivalenty -1, 0, and 1) to indicate that the corresponding observation is
left censored, not censored, or right censored.
With the expection of categorical
and ordinal families, the response
distribution can be truncated using the trunc
function in the addition part.
If the response variable is truncated between, say, 0 and 100, we can specify this via
yi | trunc(lb = 0, ub = 100) ~ predictors
. Defining only one of the two arguments
in trunc
leads to one-sided truncation.
Mutiple addition
terms may be specified at the same time using
the +
operator, for instance formula = yi | se(sei) + cens(censored) ~ 1
for a censored meta-analytic model.
For families gaussian
, student
, and cauchy
multivariate models may be specified using cbind
notation.
Suppose that y1
and y2
are response variables
and x
is a predictor.
Then cbind(y1,y2) ~ x
specifies a multivariate model,
where x
has the same effect on y1
and y2
.
To indicate different effects on each response variable,
the factor trait
(which is reserved in multivariate models)
can be used as an additional predictor.
For instance, cbind(y1,y2) ~ 0 + trait + x:trait
leads to seperate effects
of x
on y1
and y2
as well as to separate intercepts.
In this case, trait
has two levels, namely "y1"
and "y2"
.
It may also be used within random effects terms, both as grouping factor or
as random effect within a grouping factor. Note that variable trait
is generated
internally and may not be specified in the data passed to brm
.
As of trait
differentiates between the response
categories. As in most other implementations of categorical models,
values of one category are fixed to identify the model.
Accordingly, trait
has K - 1
levels,
where K
is the number of categories.
Usually, it makes most sense to use terms of the structure
0 + trait + trait:()
.
Zero-inflated and hurdle families are bivariate and also make use
of the special internal variable trait
having two levels in this case.
However, only the actual response must be specified in formula
,
as the second response variable used for the zero-inflation / hurdle
(ZIH) part is internally generated.
A formula
for this type of models may, for instance, look like this:
y ~ 0 + trait + trait:(x1 + x2) + (0 + trait | g)
.
In this example, the predictors x1
and x1
influence the ZIH part
differentlythan the actual response part as indicated by their
interaction with trait
.
In addition, a group-level effect of trait
was added while
the group-level intercept was removed leading to the estimation of
two effects, one for the ZIH part and one for the actual response.
In the example above, the correlation between the two effects
will also be estimated.
Sometimes, predictors should only influence the ZIH part
but not the actual response (or vice versa). As this cannot be modeled
with the trait
variable, two other internally generated and
reserved (numeric) variables, namely main
and spec
, are supported.
main
is 1
for the response part and 0
for the
ZIH part of the model. For spec
it is the other way round.
Suppose that x1
should only influence the actual response,
and x2
only the ZIH process. We can write this as follows:
formula = y ~ 0 + main + spec + main:x1 + spec:x2
.
The main effects of main
or spec
serve as intercepts,
while the interaction terms main:x1
and spec:x2
ensure
that x1
and x2
only predict one part of the model, respectively.
Using the same syntax as for zero-inflated and hurdle models, it is
possible to specify multiplicative effects in family bernoulli
(make sure to set argument type
to "2PL"
;
see brmsfamily
for more details).
In Item Response Theory (IRT), these models are known as 2PL models.
Suppose that we have the variables item
and person
and
want to model fixed effects for items and random effects for persons.
The discriminality (multiplicative effect) should depend only on the items.
We can specify this by setting
formula = response ~ 0 + (main + spec):item + (0 + main|person)
.
The random term 0 + main
ensures that person
does
not influence discriminalities. Of course it is possible
to predict only discriminalities by using
variable spec
in the model formulation.
To identify the model, multiplicative effects are estimated on the log scale.
In addition, we strongly recommend setting proper priors
on fixed effects in this case to increase sampling efficiency
(for details on priors see set_prior
).
Parameterization of the population-level intercept
The population-level intercept (if incorporated) is estimated separately
and not as part of population-level parameter vector b
.
This has the side effect that priors on the intercept
also have to be specified separately
(see set_prior
for more details).
Furthermore, to increase sampling efficiency, the fixed effects
design matrix X
is centered around its column means
X_means
if the intercept is incorporated.
This leads to a temporary bias in the intercept equal to
, where <,>,>
is the scalar product.
The bias is corrected after fitting the model, but be aware
that you are effectively defining a prior on the temporary
intercept of the centered design matrix not on the real intercept.
This behavior can be avoided by using the reserved
(and internally generated) variable intercept
.
Instead of y ~ x
, you may write
y ~ -1 + intercept + x
. This way, priors can be
defined on the real intercept, directly. In addition,
the intercept is just treated as an ordinary fixed effect
and thus priors defined on b
will also apply to it.
Note that this parameterization may be a bit less efficient
than the default parameterization discussed above.
Formula syntax for non-linear multilevel models
Using the nonlinear
argument, it is possible to specify
non-linear models in nonlinear
should not contain the non-linear model itself
but rather information on the non-linear parameters.
The non-linear model will just be specified within the formula
argument. Suppose, that we want to predict the response y
through the predictor x
, where x
is linked to y
through y = alpha - beta * lambda^x
, with parameters
alpha
, beta
, and lambda
. This is certainly a
non-linear model being defined via
formula = y ~ alpha - beta * lambda^x
(addition arguments
can be added in the same way as for ordinary formulas).
Now we have to tell nonlinear
argument. Let's say we just want to estimate those three parameters
with no further covariates or random effects. Then we can write
nonlinear = alpha + beta + lambda ~ 1
or equivalently
(and more flexible) nonlinear = list(alpha ~ 1, beta ~ 1, lambda ~ 1)
.
This can, of course, be extended. If we have another predictor z
and
observations nested within the grouping factor g
, we may write for
instance
nonlinear = list(alpha ~ 1, beta ~ 1 + z + (1|g), lambda ~ 1)
.
The formula syntax described above applies here as well.
In this example, we are using z
and g
only for the
prediction of beta
, but we might also use them for the other
non-linear parameters (provided that the resulting model is still
scientifically reasonable).
Non-linear models may not be uniquely identified and / or show bad convergence.
For this reason it is mandatory to specify priors on the non-linear parameters.
For instructions on how to do that, see set_prior
.
Families and link functions
Family gaussian
with identity
link leads to linear regression.
Families student
, and cauchy
with identity
link leads to
robust linear regression that is less influenced by outliers.
Families poisson
, negbinomial
, and geometric
with log
link lead to regression models for count data.
Families binomial
and bernoulli
with logit
link leads to
logistic regression and family categorical
to multi-logistic regression
when there are more than two possible outcomes.
Families cumulative
, cratio
('contiuation ratio'),
sratio
('stopping ratio'), and acat
('adjacent category')
leads to ordinal regression. Families Gamma
, weibull
,
exponential
, and inverse.gaussian
can be used (among others)
for survival regression when combined with the log
link.
Families hurdle_poisson
, hurdle_negbinomial
, hurdle_gamma
,
zero_inflated_poisson
, and
zero_inflated_negbinomial
combined with the
log
link, and zero_inflated_binomial
with the logit
link,
allow to estimate zero-inflated and hurdle models. These models
can be very helpful when there are many zeros in the data that cannot be explained
by the primary distribution of the response. Family hurdle_gamma
is
especially useful, as a traditional Gamma
model cannot be reasonably
fitted for data containing zeros in the response.
In the following, we list all possible links for each family.
The families gaussian
, student
, and cauchy
accept the links (as names) identity
, log
, and inverse
;
families poisson
, negbinomial
, and geometric
the links
log
, identity
, and sqrt
;
families binomial
, bernoulli
, Beta
,
cumulative
, cratio
, sratio
, and acat
the links logit
, probit
, probit_approx
,
cloglog
, and cauchit
;
family categorical
the link logit
;
families Gamma
, weibull
, and exponential
the links log
, identity
, and inverse
;
family inverse.gaussian
the links 1/mu^2
,
inverse
, identity
and log
;
families hurdle_poisson
, hurdle_negbinomial
,
hurdle_gamma
, zero_inflated_poisson
, and
zero_inflated_negbinomial
the link log
.
The first link mentioned for each family is the default.
Please note that when calling the Gamma
family function, the default link will be inverse
not log
.
Also, the probit_approx
link cannot be used when calling the
binomial
family function.
The current implementation of inverse.gaussian
models has some
convergence problems and requires carefully chosen prior distributions
to work efficiently. For this reason, we currently do not recommend
to use the inverse.gaussian
family, unless you really feel
that your data requires exactly this type of model.
Prior distributions
As of set_prior
function.
Its documentation contains detailed information
on how to correctly specify priors. To find out on
which parameters or parameter classes priors can be defined,
use get_prior
.
Adjusting the sampling behavior of control
argument.
The most important reason to use control
is to decrease
(or eliminate at best) the number of divergent transitions
that cause a bias in the obtained posterior samples.
Whenever you see the warning
"There were x divergent transitions after warmup."
you should really think about increasing adapt_delta
.
To do this, write control = list(adapt_delta = )
,
where
should usually be value between 0.8
(current default) and 1
. Increasing adapt_delta
will slow down the sampler but will decrease the number of
divergent transitions threatening the validity of your
posterior samples.
Another problem arises when the depth of the tree being evaluated
in each iteration is exceeded. This is less common than having
divergent transitions, but may also bias the posterior samples.
When it happens, max_treedepth
, which can be accomplished by
writing control = list(max_treedepth = )
with a positive
integer
that should usually be larger than the current
default of 10
. For more details on the control
argument
see stan
.## Poisson regression for the number of seizures in epileptic patients
## using student_t priors for population-level effects
## and half cauchy priors for standard deviations of group-level effects
fit1 <- brm(count ~ log_Age_c + log_Base4_c * Trt_c
+ (1|patient) + (1|visit) + (1|obs),
data = epilepsy, family = poisson(),
prior = c(set_prior("student_t(5,0,10)", class = "b"),
set_prior("cauchy(0,2)", class = "sd")))
## generate a summary of the results
summary(fit1)
## plot the MCMC chains as well as the posterior distributions
plot(fit1, ask = FALSE)
## extract random effects standard devations and covariance matrices
VarCorr(fit1)
## extract group specific effects of each level
ranef(fit1)
## predict responses based on the fitted model
head(predict(fit1))
## plot marginal effects of each predictor
plot(marginal_effects(fit1), ask = FALSE)
## Ordinal regression modeling patient's rating of inhaler instructions
## category specific effects are estimated for variable 'treat'
fit2 <- brm(rating ~ period + carry + cse(treat),
data = inhaler, family = sratio("cloglog"),
prior = set_prior("normal(0,5)"), chains = 2)
summary(fit2)
plot(fit2, ask = FALSE)
## Survival regression modeling the time between the first
## and second recurrence of an infection in kidney patients.
fit3 <- brm(time | cens(censored) ~ age * sex + disease + (1|patient),
data = kidney, family = gaussian("log"))
summary(fit3)
plot(fit3, ask = FALSE)
plot(marginal_effects(fit3), ask = FALSE)
## Probit regression using the binomial family
n <- sample(1:10, 100, TRUE) # number of trials
success <- rbinom(100, size = n, prob = 0.4)
x <- rnorm(100)
fit4 <- brm(success | trials(n) ~ x,
family = binomial("probit"))
summary(fit4)
## Simple non-linear gaussian model
x <- rnorm(100)
y <- rnorm(100, mean = 2 - 1.5^x, sd = 1)
fit5 <- brm(y ~ a1 - a2^x, nonlinear = a1 + a2 ~ 1,
prior = c(set_prior("normal(0, 2)", nlpar = "a1"),
set_prior("normal(0, 2)", nlpar = "a2")))
summary(fit5)
plot(marginal_effects(fit5), ask = FALSE)
Run the code above in your browser using DataLab