brm(formula, data = NULL, family = c("gaussian", "identity"),
prior = list(), addition = NULL, autocor = NULL, partial = NULL,
threshold = "flexible", cov.ranef = NULL, ranef = TRUE, WAIC = FALSE,
predict = FALSE, fit = NA, n.chains = 2, n.iter = 2000,
n.warmup = 500, n.thin = 1, n.cluster = 1, inits = "random",
silent = FALSE, seed = 12345, save.model = NULL, ...)
as.data.frame
to a data frame) containing
the variables in the model. If not found in data, the variables are taken from environment(formula)
,
typically the e"gaussian"
, "student"
, "cauchy"
, se
for specifying standard errors for meta-analysis, weights
to fit weighted regression models,~partial.effects
specifing the predictors that can vary between categories in non-cumulative ordinal models
(i.e. in families "cratio"
, "sratio"
, or "acat"
)."flexible"
provides the standard unstructured thresholds and "equidistant"
restricts the distance between consecutive thresholds todata
that are used as grouping factors.
All levels of the grouping factor should TRUE
).
Set to FALSE
to save memory. The argument has no impact on the model fitting itself.brmsfit
derived from a previous fit; defaults to NA
. If fit
is of class brmsfit
, the compiled model associated
with the fitted result is re-used and the arguments formula<
n.
n.thin > 1
to save memory and computation time if n.iter
is large. Default is 1, that is no thinning.n.chains
elements; each element of the list is itself a list of starting values for the model, or a function creating (possibly random) initial values.
If inits is "random"
(the default), Stan will generate initial vTRUE
, most intermediate output from Stan is suppressed.set.seed
to make results reproducable.NULL
or a character string. In the latter case, the model code is
saved in a file named after the string supplied in save.model
, which may also contain the full path where to save the file.
If only a name is given, the fbrmsfit
, which contains the posterior samples along with many other useful information about the model.
If rstan is not installed, brmsfit
will not contain posterior samples."the current Metropolis proposal is about the be rejected ..."
.
These messages can be ignored in nearly all cases.
Use silent = TRUE
to stop these messages from being printed out.
Formula syntax
The formula
argument accepts formulas of the following syntax:
response | addition ~ fixed + (random | group)
Multiple grouping factors each with multiple random effects are possible. Instead of | you may use || in random effects terms
to prevent random effects correlations from being modeled.
With the exception of addition
, this is basically lme4
syntax.
The optional addition
term may contain multiple terms of the form fun(variable)
seperated by |
each providing
special information on the response variable. fun
can be replaced with either se
, weights
, trials
,
cat
, or cens
(their meanings are explained below). Using the addition
term in formula
is equivalent
to using argument addition
: Instead of writing fun(variable)
in formula
, we may use addition = list(fun = ~variable)
.
For families gaussian
, student
, and cauchy
it is possible to specify
standard errors of the observation, thus allowing to perform meta-analysis.
Suppose that the variable yi
contains the effect sizes from the studies and sei
the
corresponding standard errors. Then, fixed and random effects meta-analyses can be conducted
using the formulae yi | se(sei) ~ 1
and yi | se(sei) ~ 1 + (1|study)
, respectively, where
study
is a variable uniquely identifying every study.
If desired, meta-regressen can be performed via yi | se(sei) ~ 1 + mod1 + mod2 + (1|study)
or
yi | se(sei) ~ 1 + mod1 + mod2 + (1 + mod1 + mod2|study)
, where
mod1
and mod2
represent moderator variables.
For all families, weighted regression may be performed using
weights
in the addition part. Suppose that variable wei
contains the weights and that yi
is the response variable.
Then, formula yi | weights(wei) ~ predictors
implements a weighted regression.
For family binomial
, addition may contain a variable indicating the number of trials
underlying each observation. In lme4
syntax, we may write for instance
cbind(success, n - success)
, which is equivalent
to success | trials(n)
in brms
syntax. If the number of trials
is constant across all observation (say 10
), we may also write success | trials(10)
.
For family categorical
and all ordinal families, addition
may contain a term cat(categories)
to
specify the number categories for each observation, either with a variable name (e.g, categories
in this example) or a single number.
With the expection of categorical
and ordinal families, left and right censoring can be modeled through yi | cens(censored) ~ predictors
.
The censoring variable (named censored
in this example) should contain the values 'left'
, 'none'
, and 'right'
(or equivalenty -1, 0, and 1) to indicate that the corresponding observation is left censored, not censored, or right censored.
Mutiple addition
terms may be specified at the same time, for instance
formula = yi | se(sei) | cens(censored) ~ 1
for a censored meta-analytic model, equivalent to
formula = yi ~ 1
and addition = list(se = ~sei, cens = ~censored)
when using argument addition
.
Family multigaussian
allows to perform multivariate (normal) regression using cbind
notation. Suppose that
y1
and y2
are response variables and x
is a predictor, then cbind(y1,y2) ~ x
speficies a multivariate model, where
x
has the same effect on y1
and y2
.
To indicate different effects on each response variable, the word trait
(which is reserved in multigaussian
models) can be used
as an additional categorical predictor. For instance, cbind(y1,y2) ~ -1 + x:trait
leads to seperate effects
of x
on y1
and y2
. In this case, trait
has two levels, namely "y1"
and "y2"
.
By default, trait
is dummy-coded.
It may also be used within random effects terms, both as grouping factor or as random effect within a grouping factor.
Note that variable trait
is generated internally and may not be specified in the data passed to brm
.
Families and link functions
Family gaussian
with identity
link leads to linear regression. Families student
, and cauchy
with identity
link leads to robust linear regression that is less influenced by outliers. Family multigaussian
with link
identity
may be used for multivariate normal regression as explained above.
Families poisson
, negbinomial
, and geometric
with log
link lead to regression models for count data.
Families binomial
and bernoulli
with logit
link leads to logistic regression and family categorical
to
multi-logistic regression when there are more than two possible outcomes.
Families cumulative
, cratio
('contiuation ratio'), sratio
('stopping ratio'),
and acat
('adjacent category') leads to ordinal regression. Families gamma
, weibull
, and exponential
can be used (among others) for survival regression when combined with the log
link.
In the following, we list all possible links for each family.
The families gaussian
, student
, cauchy
, and multigaussian
accept the links (as names) identity
, log
, and inverse
;
families poisson
, negbinomial
, and geometric
the links log
, identity
, and sqrt
;
families binomial
, bernoulli
, cumulative
, cratio
, sratio
, and acat
the links logit
, probit
, probit_approx
, and cloglog
;
family categorical
the link logit
; families gamma
, weibull
, and exponential
the links log
, identity
, and inverse
.
The first link mentioned for each family is the default.
Prior distributions
Below, we describe the usage of the prior
argument and list some common prior distributions
for parameters in brms
models.
A complete overview on possible prior distributions is given in the Stan Reference Manual available at
brm
performs no checks if the priors are written in correct Stan language.
Instead, Stan will check their correctness when the model is parsed to C++ and returns an error if they are not.
Currently, there are five types of parameters in brms
models,
for which the user can specify prior distributions.
1. Fixed effects
Every fixed (and partial) effect has its corresponding regression parameter. These parameters are named as
b_(fixed)
, where (fixed)
represents the name of the corresponding fixed effect.
Suppose, for instance, that y
is predicted by x1
and x2
(i.e. y ~ x1+x2
in formula syntax).
Then, x1
and x2
have regression parameters b_x1
and b_x2
respectively.
The default prior for fixed effects parameters is an improper flat prior over the reals.
Other common options are normal priors or uniform priors over a finite interval.
If we want to have a normal prior with mean 0 and standard deviation 5 for b_x1
,
and a uniform prior between -10 and 10 for b_x2
,
we can specify this via
prior = list(b_x1 = "normal(0,5)", b_x2 = "uniform(-10,10)")
.
To put the same prior (e.g. a normal prior) on all fixed effects at once,
we may write as a shortcut prior =
list(b = "normal(0,5)")
. In addition, this
leads to faster sampling in Stan, because priors can be vectorized.
2. Autocorrelation parameters
The autocorrelation parameters currently implemented are named ar
(autoregression) and ma
(moving average).
The default prior for autocorrelation parameters is an improper flat prior over the reals. It should be noted that ar
will
only take one values between -1 and 1 if the response variable is wide-sence stationay, i.e. if there is no drift in the responses.
3. Standard deviations of random effects
Each random effect of each grouping factor has a standard deviation named
sd_(group)_(random)
. Consider, for instance, the formula y ~ x1+x2+(1+x1|z)
.
We see that the intercept as well as x1
are random effects nested in the grouping factor z
.
The corresponding standard deviation parameters are named as sd_z_Intercept
and sd_z_x1
respectively.
These parameters are restriced to be non-negative and, by default,
have a half cauchy prior with 'mean' 0 and 'standard deviation' 5.
We could make this explicit by writing prior = list(sd = "cauchy(0,5)")
.
One common alternative is a uniform prior over a positive interval.
4. Correlations of random effects
If there is more than one random effect per grouping factor, the correlations between those random
effects have to be estimated.
However, in brms
models, the corresponding correlation matrix $C$ does not have prior itself.
Instead, a prior is defined for the cholesky factor $L$ of $C$. They are related through the equation
$$L * L' = C.$$
The prior "lkj_corr_cholesky(eta)"
with eta > 0
is essentially the only prior for
cholesky factors of correlation matrices.
If eta = 1
(the default) all correlations matrices are equally likely a priori. If eta > 1
,
extreme correlations become less likely,
whereas 0 < eta < 1
results in higher probabilities for extreme correlations.
The cholesky factors in brms
models are named as
L_(group)
, (e.g., L_z
if z
is the grouping factor).
5. Parameters for specific families
Some families need additional parameters to be estimated.
Families gaussian
, student
, and cauchy
need the parameter sigma
to account for the standard deviation of the response variable around the regression line
(not to be confused with the standard deviations of random effects).
By default, sigma
has an improper flat prior over the positiv reals.
Furthermore, family student
needs the parameter nu
representing
the degrees of freedom of students t distribution.
By default, nu
has prior "uniform(1,60)"
.
Families gamma
and weibull
need the parameter shape
that has a "gamma(0.01,0.01)"
prior by default. For families cumulative
, cratio
, sratio
,
and acat
, and only if threshold = "equidistant"
, the parameter delta
is used to model the distance
between to adjacent thresholds. By default, delta
has an improper flat prior over the reals.## Poisson Regression for the number of seizures in epileptic patients
## using half cauchy priors for standard deviations of random effects
fit_e <- brm(count ~ log_Age_c + log_Base4_c * Trt_c + (1|patient) + (1|visit),
data = epilepsy, family = "poisson", prior = list(sd = "cauchy(0,2.5)"))
## generate a summary of the results
summary(fit_e)
## plot the MCMC chains as well as the posterior distributions
plot(fit_e)
## extract random effects standard devations, correlation and covariance matrices
VarCorr(fit_e)
## extract random effects for each level
ranef(fit_e)
## Ordinal regression (with family 'sratio') modeling patient's rating
## of inhaler instructions using normal priors for fixed effects parameters
fit_i <- brm(rating ~ treat + period + carry, data = inhaler,
family = "sratio", prior = list(b = "normal(0,5)"))
summary(fit_i)
plot(fit_i)
## Surivival Regression (with family 'weibull') modeling time between
## first and second recurrence of an infection in kidney patients
## time | cens indicates which values in variable time are right censored
fit_k <- brm(time | cens(censored) ~ age + sex + disease, data = kidney,
family = "weibull", silent = TRUE, inits = "0")
summary(fit_k)
plot(fit_k)
Run the code above in your browser using DataLab