set_prior(prior, class = "b", coef = "", group = "", nlpar = "", resp = NULL, lb = NULL, ub = NULL)
prior_string(prior, ...)
prior(prior, ...)
"b"
(fixed effects).
See 'Details' for other valid parameter classes.nlpar
."b"
, "ar"
, "ma"
, and "arr"
.
Defaults to NULL
, that is no restriction."b"
, "ar"
, "ma"
, and "arr"
.
Defaults to NULL
, that is no restriction.set_prior
.brmsprior
to be used in the prior
argument of brm
.
prior_string
: Alias of set_prior
. prior
: Alias of set_prior
allowing to specify
arguments without quotes ""
using non-standard evaluation.
set_prior
is used to define prior distributions for parameters
in brms models. The functions prior
and prior_string
are both aliases of set_prior
, the former allowing to specify
arguments without quotes ""
using non-standard evaluation.
Below, we explain its usage and list some common
prior distributions for parameters.
A complete overview on possible prior distributions is given
in the Stan Reference Manual available at http://mc-stan.org/.
To combine multiple priors, use c(...)
,
e.g., c(set_prior(...), set_prior(...))
.
brms does not check if the priors are written in correct Stan language.
Instead, Stan will check their syntactical correctness when the model
is parsed to C++
and returns an error if they are not.
This, however, does not imply that priors are always meaningful if they are
accepted by Stan. Although brms trys to find common problems
(e.g., setting bounded priors on unbounded parameters), there is no guarantee
that the defined priors are reasonable for the model.
Currently, there are seven types of parameters in brms models,
for which the user can specify prior distributions.
1. Population-level ('fixed') effects
Every Population-level effect has its own regression parameter
represents the name of the corresponding population-level effect.
Suppose, for instance, that y
is predicted by x1
and x2
(i.e., y ~ x1 + x2
in formula syntax).
Then, x1
and x2
have regression parameters
b_x1
and b_x2
respectively.
The default prior for population-level effects (including monotonic and
category specific effects) is an improper flat prior over the reals.
Other common options are normal priors or student-t priors.
If we want to have a normal prior with mean 0 and
standard deviation 5 for x1
, and a unit student-t prior with 10
degrees of freedom for x2
, we can specify this via
set_prior("normal(0,5)", class = "b", coef = "x1")
and
set_prior("student_t(10,0,1)", class = "b", coef = "x2")
.
To put the same prior on all population-level effects at once,
we may write as a shortcut set_prior("", class = "b")
.
This also leads to faster sampling, because priors can be vectorized in this case.
Both ways of defining priors can be combined using for instance
set_prior("normal(0,2)", class = "b")
and
set_prior("normal(0,10)", class = "b", coef = "x1")
at the same time. This will set a normal(0,10)
prior on
the effect of x1
and a normal(0,2)
prior
on all other population-level effects.
However, this will break vectorization and
may slow down the sampling procedure a bit.
In case of the default intercept parameterization
(discussed in the 'Details' section of
brmsformula
),
general priors on class "b"
will not affect the intercept.
Instead, the intercept has its own parameter class
named "Intercept"
and priors can thus be
specified via set_prior("", class = "Intercept")
.
Setting a prior on the intercept will not break vectorization
of the other population-level effects.
Note that technially, this prior is set on a temporary intercept
that results when internally centering all population-level predictors
around zero to improve sampling efficiency. To treat the intercept
as an ordinary population-level effect, use 0 + intercept
on the right-hand side of the model formula.
A special shrinkage prior to be applied on population-level effects
is the horseshoe prior (Carvalho et al., 2009).
It is symmetric around zero with fat tails and an infinitely large spike
at zero. This makes it ideal for sparse models that have
many regression coefficients, although only a minority of them is non-zero.
The horseshoe prior can be applied on all population-level effects at once
(excluding the intercept) by using set_prior("horseshoe(1)")
.
The 1
implies that the student-t prior of the local shrinkage
parameters has 1 degrees of freedom. This may, however, lead to an
increased number of divergent transition in Stan.
Accordingly, increasing the degrees of freedom to slightly higher values
(e.g., 3
) may often be a better option, although the prior
no longer resembles a horseshoe in this case.
Further, the scale of the global shrinkage parameter plays an important role
in amount of shrinkage applied. It defaults to 1
,
but this may result in too few shrinkage (Piironen & Vehtari, 2016).
It is thus possible to change the scale using argument scale_global
of the horseshoe prior, for instance horseshoe(1, scale_global = 0.5)
.
For recommendations how to properly set the global scale see
Piironen and Vehtari (2016).
To make sure that shrinkage can equally affect all coefficients,
predictors should be one the same scale.
Generally, models with horseshoe priors a more likely than other models
to have divergent transitions so that increasing adapt_delta
from 0.8
to values closer to 1
will often be necessary.
See the documentation of brm
for instructions
on how to increase adapt_delta
.
Another shrinkage prior is the so-called lasso prior.
It is the Bayesian equivalent to the LASSO method for performing
variable selection (Park & Casella, 2008).
With this prior, independent Laplace (i.e., double exponential) priors
are placed on the population-level effects.
The scale of the Laplace priors depends on a tuning parameter
that controls the amount of shrinkage. In brms, the inverse
of the tuning parameter is used so that smaller values imply
more shrinkage. The inverse tuning parameter has a chi-square distribution
and with degrees of freedom controlled via argument df
of function lasso
(defaults to 1
). For instance,
one can specify a lasso prior using set_prior("lasso(1)")
.
To make sure that shrinkage can equally affect all coefficients,
predictors should be one the same scale.
If you do not want to standarized all variables,
you can adjust the general scale of the lasso prior via argument
scale
, for instance, lasso(1, scale = 10)
.
In non-linear models, population-level effects are defined separately
for each non-linear parameter. Accordingly, it is necessary to specify
the non-linear parameter in set_prior
so that priors
we can be assigned correctly.
If, for instance, alpha
is the parameter and x
the predictor
for which we want to define the prior, we can write
set_prior("", coef = "x", nlpar = "alpha")
.
As a shortcut we can use set_prior("", nlpar = "alpha")
to set the same prior on all population-level effects of alpha
at once.
If desired, population-level effects can be restricted to fall only
within a certain interval using the lb
and ub
arguments
of set_prior
. This is often required when defining priors
that are not defined everywhere on the real line, such as uniform
or gamma priors. When defining a uniform(2,4)
prior,
you should write set_prior("uniform(2,4)", lb = 2, ub = 4)
.
When using a prior that is defined on the postive reals only
(such as a gamma prior) set lb = 0
.
In most situations, it is not useful to restrict population-level
parameters through bounded priors
(non-linear models are an important exception),
but if you really want to this is the way to go.
2. Standard deviations of group-level ('random') effects
Each group-level effect of each grouping factor has a standard deviation named
sd__
. Consider, for instance, the formula
y ~ x1 + x2 + (1 + x1 | g)
.
We see that the intercept as well as x1
are group-level effects
nested in the grouping factor g
.
The corresponding standard deviation parameters are named as
sd_g_Intercept
and sd_g_x1
respectively.
These parameters are restriced to be non-negative and, by default,
have a half student-t prior with 3 degrees of freedom and a
scale parameter that depends on the standard deviation of the response
after applying the link function. Minimally, the scale parameter is 10.
This prior is used (a) to be only very weakly informative in order to influence
results as few as possible, while (b) providing at least some regularization
to considerably improve convergence and sampling efficiency.
To define a prior distribution only for standard deviations
of a specific grouping factor,
use set_prior("", class = "sd", group = "")
.
To define a prior distribution only for a specific standard deviation
of a specific grouping factor, you may write
set_prior("", class = "sd", group = "", coef = "")
.
Recommendations on useful prior distributions for
standard deviations are given in Gelman (2006), but note that he
is no longer recommending uniform priors, anymore.
When defining priors on group-level parameters in non-linear models,
please make sure to specify the corresponding non-linear parameter
through the nlpar
argument in the same way as
for population-level effects.
3. Correlations of group-level ('random') effects
If there is more than one group-level effect per grouping factor,
the correlations between those effects have to be estimated.
The prior "lkj_corr_cholesky(eta)"
or in short
"lkj(eta)"
with eta > 0
is essentially the only prior for (Cholesky factors) of correlation matrices.
If eta = 1
(the default) all correlations matrices
are equally likely a priori. If eta > 1
, extreme correlations
become less likely, whereas 0 < eta < 1
results in
higher probabilities for extreme correlations.
Correlation matrix parameters in brms
models are named as
cor_
, (e.g., cor_g
if g
is the grouping factor).
To set the same prior on every correlation matrix,
use for instance set_prior("lkj(2)", class = "cor")
.
Internally, the priors are transformed to be put on the Cholesky factors
of the correlation matrices to improve efficiency and numerical stability.
The corresponding parameter class of the Cholesky factors is L
,
but it is not recommended to specify priors for this parameter class directly.
4. Standard deviations of smoothing terms
GAMMs are implemented in brms using the 'random effects'
formulation of smoothing terms (for details see
gamm
). Thus, each smoothing term
has its corresponding standard deviation modeling
the variability within this term. In brms, this
parameter class is called sds
and priors can
be specified via set_prior("", class = "sds",
coef = "")
. The default prior is the same as
for standard deviations of group-level effects.
5. Autocorrelation parameters
The autocorrelation parameters currently implemented are named
ar
(autoregression), ma
(moving average),
and arr
(autoregression of the response).
Priors can be defined by set_prior("", class = "ar")
for ar
and similar for ma
and arr
effects.
By default, ar
and ma
are bounded between -1
and 1
and arr
is unbounded (you may change this
by using the arguments lb
and ub
). The default
prior is flat over the definition area.
6. Distance parameters of monotonic effects
As explained in the details section of brm
,
monotonic effects make use of a special parameter vector to
estimate the 'normalized distances' between consecutive predictor
categories. This is realized in Stan using the simplex
parameter type and thus this class is also named "simplex"
in
brms. The only valid prior for simplex parameters is the
dirichlet prior, which accepts a vector of length K - 1
(K = number of predictor categories) as input defining the
'concentration' of the distribution. Explaining the dirichlet prior
is beyond the scope of this documentation, but we want to describe
how to define this prior syntactically correct.
If a predictor x
with K
categories is modeled as monotonic,
we can define a prior on its corresponding simplex via
set_prior("dirichlet()", class = "simplex", coef = "x")
.
For
, we can put in any R
expression
defining a vector of length K - 1
. The default is a uniform
prior (i.e. = rep(1, K-1)
) over all simplexes
of the respective dimension.
7. Parameters for specific families
Some families need additional parameters to be estimated.
Families gaussian
, student
, and cauchy
need the parameter sigma
to account for the residual standard deviation.
By default, sigma
has a half student-t prior that scales
in the same way as the group-level standard deviations.
Furthermore, family student
needs the parameter
nu
representing the degrees of freedom of students-t distribution.
By default, nu
has prior "gamma(2,0.1)"
and a fixed lower bound of 0
.
Families gamma
, weibull
, inverse.gaussian
, and
negbinomial
need a shape
parameter that has a
"gamma(0.01,0.01)"
prior by default.
For families cumulative
, cratio
, sratio
,
and acat
, and only if threshold = "equidistant"
,
the parameter delta
is used to model the distance between
two adjacent thresholds.
By default, delta
has an improper flat prior over the reals.
The von_mises
family needs the parameter kappa
, representing
the concentration parameter. By default, kappa
has prior
"gamma(2, 0.01)"
.
Every family specific parameter has its own prior class, so that
set_prior("", class = "")
is the right way to go.
All of these priors are chosen to be weakly informative,
having only minimal influence on the estimations,
while improving convergence and sampling efficiency. Often, it may not be immediately clear,
which parameters are present in the model.
To get a full list of parameters and parameter classes for which
priors can be specified (depending on the model)
use function get_prior
.
Gelman A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian analysis, 1(3), 515 -- 534. Park, T., & Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103(482), 681-686. Piironen J. & Vehtari A. (2016). On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior. https://arxiv.org/pdf/1610.05559v1.pdf
get_prior
## check which parameters can have priors
get_prior(rating ~ treat + period + carry + (1|subject),
data = inhaler, family = sratio(),
threshold = "equidistant")
## define some priors
prior <- c(set_prior("normal(0,10)", class = "b"),
set_prior("normal(1,2)", class = "b", coef = "treat"),
set_prior("cauchy(0,2)", class = "sd",
group = "subject", coef = "Intercept"),
set_prior("uniform(-5,5)", class = "delta"))
## verify that the priors indeed found their way into Stan's model code
make_stancode(rating ~ period + carry + cs(treat) + (1|subject),
data = inhaler, family = sratio(),
threshold = "equidistant",
prior = prior)
## use the horseshoe prior to model sparsity in population-level effects
make_stancode(count ~ log_Age_c + log_Base4_c * Trt_c,
data = epilepsy, family = poisson(),
prior = set_prior("horseshoe(3)"))
## alternatively use the lasso prior
make_stancode(count ~ log_Age_c + log_Base4_c * Trt_c,
data = epilepsy, family = poisson(),
prior = set_prior("lasso(1)"))
## use alias functions
(prior1 <- prior_string("cauchy(0, 1)", class = "sd"))
(prior2 <- prior(cauchy(0, 1), class = sd))
identical(prior1, prior2)
Run the code above in your browser using DataLab