brm(formula, data = NULL, family = "gaussian", prior = NULL,
addition = NULL, autocor = NULL, partial = NULL,
threshold = c("flexible", "equidistant"), cov.ranef = NULL,
ranef = TRUE, sample.prior = FALSE, fit = NA, inits = "random",
n.chains = 2, n.iter = 2000, n.warmup = 500, n.thin = 1,
n.cluster = 1, cluster_type = "PSOCK", silent = TRUE, seed = 12345,
save.model = NULL, ...)as.data.frame to a data frame) containing the variables in the model. If not found in data,
the variables are taken from environment(formula),
typically tgaussianbrmsprior objects created by function
set_prior and combined using the c method.
A single brmsprior object may be passed without formula. See 'Formula Syntax' under 'Details' for further information.~expression specifying the predictors with
category specific effects in non-cumulative ordinal models
(i.e. in families cratio, sratio, or acat)."flexible" provides the standard unstructured thresholds and
"equidistant" restricts the distance between consecutive thresholdsdata that are used as grouping factors.
All levels of the grouping factor shoulTRUE). Set to FALSE to save memory.
The argument has no impact on the model fitting itself.FALSE.brmsfit derived from a previous fit; defaults to NA.
If fit is of class brmsfit, the compiled model associated
with the fitted result is re-used and the arguments form"random" or "0". If inits is "random" (the default),
Stan will randomly generate initial values for parameters.
If it is "0", all parameters are initiliazed to zero.
This option is recommendn.thin > 1 to save memory and computation time if n.iter is large.
Default is 1, that is no thinning.makeCluster when sampling in parallel
(i.e. when n.cluster is greater 1). Default is TRUE, warning messages of the sampler are suppressed.set.seed to make results reproducable.NULL or a character string. In the latter case, the model code is
saved in a file named after the string supplied in save.model,
which may also contain the full path where to save the file.
If only a name is given, thebrmsfit, which contains the posterior samples along
with many other useful information about the model.
Use methods(class = "brmsfit") for an overview on available methods.formula argument accepts formulas of the following syntax:
response | addition ~ fixed + (random | group)
Multiple grouping factors each with multiple random effects are possible.
Instead of | you may use || in random effects terms
to prevent random effects correlations from being modeled.
With the exception of addition, this is basically lme4 syntax.
The optional addition term may contain multiple terms of the form fun(variable)
seperated by | each providing special information on the response variable.
fun can be replaced with either se, weights, trials,
cat, cens, or trunc. Their meanings are explained below.
For families gaussian, student, and cauchy it is possible to specify
standard errors of the observation, thus allowing to perform meta-analysis.
Suppose that the variable yi contains the effect sizes from the studies and sei the
corresponding standard errors. Then, fixed and random effects meta-analyses can be conducted
using the formulae yi | se(sei) ~ 1 and yi | se(sei) ~ 1 + (1|study), respectively, where
study is a variable uniquely identifying every study.
If desired, meta-regressen can be performed via yi | se(sei) ~ 1 + mod1 + mod2 + (1|study)
or
yi | se(sei) ~ 1 + mod1 + mod2 + (1 + mod1 + mod2|study), where
mod1 and mod2 represent moderator variables.
For all families, weighted regression may be performed using
weights in the addition part. Suppose that variable wei contains the weights
and that yi is the response variable. Then, formula yi | weights(wei) ~ predictors
implements a weighted regression.
For family binomial, addition may contain a variable indicating the number of trials
underlying each observation. In lme4 syntax, we may write for instance
cbind(success, n - success), which is equivalent
to success | trials(n) in brms syntax. If the number of trials
is constant across all observation (say 10), we may also write success | trials(10).
For family categorical and all ordinal families, addition may contain a term cat(number) to
specify the number categories (e.g, cat(7).
If not given, the number of categories is calculated from the data.
With the expection of categorical and ordinal families, left and right censoring
can be modeled through yi | cens(censored) ~ predictors.
The censoring variable (named censored in this example) should
contain the values 'left', 'none', and 'right'
(or equivalenty -1, 0, and 1) to indicate that the corresponding observation is
left censored, not censored, or right censored.
With the expection of categorical and ordinal families, the response
distribution can be truncated using the trunc function in the addition part.
If the response variable is truncated between, say, 0 and 100, we can specify this via
yi | trunc(lb = 0, ub = 100) ~ predictors. Defining only one of the two arguments
in trunc leads to one-sided truncation.
Mutiple addition terms may be specified at the same time, for instance
formula = yi | se(sei) | cens(censored) ~ 1 for a censored meta-analytic model.
For families gaussian, student, and cauchy
multivariate models may be specified using cbind notation.
Suppose that y1 and y2 are response variables
and x is a predictor.
Then cbind(y1,y2) ~ x specifies a multivariate model,
where x has the same effect on y1 and y2.
To indicate different effects on each response variable,
the variable trait (which is reserved in multivariate models)
can be used as an additional categorical predictor.
For instance, cbind(y1,y2) ~ 0 + x:trait leads to seperate effects
of x on y1 and y2.
In this case, trait has two levels, namely "y1" and "y2".
By default, trait is dummy-coded.
It may also be used within random effects terms, both as grouping factor or
as random effect within a grouping factor. Note that variable trait is generated
internally and may not be specified in the data passed to brm.
Zero-inflated and hurdle families are bivariate and also make use of the special internal
variable trait having two levels in this case.
However, only the actual response must be specified in formula,
as the second response variable used for the zero-inflation / hurdle part
is internally generated.
A formula for this type of models may, for instance, look like this:
y ~ 0 + trait * (x1 + x2) + (0 + trait | g). In this example, the fixed effects
x1 and x1 influence the zero-inflation / hurdle part differently
than the actual response part as indicated by their interaction with trait.
In addition, a random effect of trait was added while the random intercept
was removed leading to the estimation of two random effects,
one for the zero-inflation / hurdle part and one for the actual response.
In the example above, the correlation between the two random effects will also be estimated.
Families and link functions
Family gaussian with identity link leads to linear regression.
Families student, and cauchy with identity link leads to
robust linear regression that is less influenced by outliers.
Families poisson, negbinomial, and geometric with log link lead to
regression models for count data.
Families binomial and bernoulli with logit link leads to
logistic regression and family categorical to multi-logistic regression
when there are more than two possible outcomes.
Families cumulative, cratio ('contiuation ratio'), sratio ('stopping ratio'),
and acat ('adjacent category') leads to ordinal regression. Families gamma,
weibull, exponential, and inverse.gaussian can be used (among others)
for survival regression when combined with the log link.
Families hurdle_poisson, hurdle_negbinomial, hurdle_gamma,
zero_inflated_poisson, and
zero_inflated_negbinomial combined with the
log link allow to estimate zero-inflated and hurdle models. These models
can be very helpful when there are many zeros in the data that cannot be explained
by the primary distribution of the response. Family hurdle_gamma is
especially useful, as a traditional gamma model cannot be reasonably fitted for
data containing zeros in the response.
In the following, we list all possible links for each family.
The families gaussian, student, and cauchy accept the links (as names)
identity, log, and inverse;
families poisson, negbinomial, and geometric the links
log, identity, and sqrt;
families binomial, bernoulli, cumulative, cratio, sratio,
and acat the links logit, probit, probit_approx,
cloglog, and cauchit;
family categorical the link logit; families gamma, weibull,
and exponential the links log, identity, and inverse;
family inverse.gaussian the links 1/mu^2, inverse, identity
and log; families hurdle_poisson, hurdle_negbinomial,
hurdle_gamma, zero_inflated_poisson, and
zero_inflated_negbinomial the link log.
The first link mentioned for each family is the default.
Please note that when calling the Gamma family function,
the default link will be inverse not log.
Also, the probit_approx link cannot be used when calling the
binomial family function.
The current implementation of inverse.gaussian models has some
convergence problems and requires carefully chosen prior distributions
to work efficiently. For this reason, we currently do not recommend
to use the inverse.gaussian family, unless you really feel
that your data requires exactly this type of model.
Prior distributions
As of set_prior function.
Its documentation contains detailed information on how to correctly specify priors.
To find out on which parameters or parameter classes priors can be defined,
use get_prior.
Adjusting the sampling behavior of control argument (a named list),
which is passed directly to brm.
The most important reason to use control is to decrease
(or eliminate at best) the number of divergent transitions
that cause a bias in the obtained posterior samples.
Whenever you see the warning
"There were x divergent transitions after warmup.
Increasing adapt_delta may help."
you should really think about increasing adapt_delta.
To do this, write control = list(adapt_delta = ) , where
should usually be value between 0.95 (default) and 1.
Increasing adapt_delta will slow down the sampler but will
decrease the number of divergent transitions threatening
the validity of your posterior samples.
For more details on the control argument see
stan.## Poisson Regression for the number of seizures in epileptic patients
## using student_t priors for fixed effects
## and half cauchy priors for standard deviations of random effects
fit_e <- brm(count ~ log_Age_c + log_Base4_c * Trt_c + (1|patient) + (1|visit),
data = epilepsy, family = "poisson",
prior = c(set_prior("student_t(5,0,10)", class = "b"),
set_prior("cauchy(0,2)", class = "sd")))
## generate a summary of the results
summary(fit_e)
## plot the MCMC chains as well as the posterior distributions
plot(fit_e)
## extract random effects standard devations, correlation and covariance matrices
VarCorr(fit_e)
## extract random effects for each level
ranef(fit_e)
## Ordinal regression modeling patient's rating
## of inhaler instructions using normal priors for fixed effects parameters
fit_i <- brm(rating ~ treat + period + carry, data = inhaler,
family = sratio("cloglog"), prior = set_prior("normal(0,5)"))
summary(fit_i)
plot(fit_i)
## Surivival Regression (with family 'weibull') modeling time between
## first and second recurrence of an infection in kidney patients
## time | cens indicates which values in variable time are right censored
fit_k <- brm(time | cens(censored) ~ age + sex + disease + (1|patient),
data = kidney, family = "weibull", inits = "0")
summary(fit_k)
plot(fit_k)
## Logisitic Regression (with family 'binomial') to illustrate model specification
## variable n contains the number of trials, success contains the number of successes
n <- sample(1:10, 100, TRUE)
success <- rbinom(100, size = n, prob = 0.4)
x <- rnorm(100)
fit_b <- brm(success | trials(n) ~ x, family = binomial("probit"))
summary(fit_b)Run the code above in your browser using DataLab