brm: Fit Bayesian Generalized (Non-)Linear Multilevel Models

Description

Fit a Bayesian generalized (non-)linear Multilevel model using Stan

Usage

brm(formula, data = NULL, family = gaussian(), prior = NULL,
  autocor = NULL, nonlinear = NULL, partial = NULL,
  threshold = c("flexible", "equidistant"), cov_ranef = NULL,
  ranef = TRUE, sparse = FALSE, sample_prior = FALSE, stan_funs = NULL,
  fit = NA, inits = "random", chains = 4, iter = 2000,
  warmup = floor(iter/2), thin = 1, cluster = 1, cluster_type = "PSOCK",
  control = NULL, algorithm = c("sampling", "meanfield", "fullrank"),
  silent = TRUE, seed = 12345, save_model = NULL, ...)

Arguments

formula

An object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

data

An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically t

family

A description of the error distribution and link function to be used in the model. This can be a family function, a call to a family function or a character string naming the family. Currently, the following families are supported: gaussian

prior

One or more brmsprior objects created by function set_prior and combined using the c method. A single brmsprior object may be passed without

autocor

An optional cor_brms object describing the correlation structure within the response variable (i.e. the 'autocorrelation'). See the documentation of cor_brms

nonlinear

An optional list of formuluas, specifying linear models for non-linear parameters. If NULL (the default) formula is treated as an ordinary formula. If not NULL, formula is treated as a non-linear mo

partial

(Deprecated) A one sided formula of the form ~expression allowing to specify predictors with category specific effects in non-cumulative ordinal models (i.e. in families cratio, sratio, or acat).

threshold

A character string indicating the type of thresholds (i.e. intercepts) used in an ordinal model. "flexible" provides the standard unstructured thresholds and "equidistant" restricts the distance between consecutive thresh

cov_ranef

A list of matrices that are proportional to the (within) covariance structure of the random effects. The names of the matrices should correspond to columns in data that are used as grouping factors. All levels of the grouping factor s

ranef

A flag to indicate if random effects for each level of the grouping factor(s) should be saved (default is TRUE). Set to FALSE to save memory. The argument has no impact on the model fitting itself.

sparse

Logical; indicates whether the fixed effects design matrix should be treated as sparse (defaults to FALSE). For design matrices with many zeros, this can considerably reduce required memory. For all models using multivariate syntax (i.

sample_prior

A flag to indicate if samples from all specified proper priors should be drawn additionally to the posterior samples (defaults to FALSE). Among others, these samples can be used to calculate Bayes factors for point hypotheses. Alternat

stan_funs

An optional character string containing self-defined Stan functions, which will be included in the functions block of the generated Stan code.

fit

An instance of S3 class brmsfit derived from a previous fit; defaults to NA. If fit is of class brmsfit, the compiled model associated with the fitted result is re-used and all arguments modifyin

inits

Either "random" or "0". If inits is "random" (the default), Stan will randomly generate initial values for parameters. If it is "0", all parameters are initiliazed to zero. This option is recomme

chains

Number of Markov chains (defaults to 4). A deprecated alias is n.chains.

iter

Number of total iterations per chain (including warmup; defaults to 2000). A deprecated alias is n.iter.

warmup

A positive integer specifying number of warmup (aka burnin) iterations. This also specifies the number of iterations used for stepsize adaptation, so warmup samples should not be used for inference. The number of warmup should not be larger than

thin

Thinning rate. Must be a positive integer. Set thin > 1 to save memory and computation time if iter is large. Default is 1, that is no thinning. A deprecated alias is n.thin.

cluster

Number of clusters to use to run parallel chains. Default is 1. A deprecated alias is n.cluster. To use the built-in parallel execution of rstan, specify argument cores instead of cluster.

cluster_type

A character string specifying the type of cluster created by makeCluster when sampling in parallel (i.e. when cluster is greater 1). Default is

control

A named list of parameters to control the sampler's behavior. It defaults to NULL so all the default values are used. The most important control parameters are discussed in the 'Details' section below. For a comprehensive ov

algorithm

Character string indicating the estimation approach to use. Can be "sampling" for MCMC (the default), "meanfield" for variational inference with independent normal distributions, or "fullrank" for variational inf

silent

logical; If TRUE, warning messages of the sampler are suppressed.

seed

Positive integer. Used by set.seed to make results reproducable.

save_model

Either NULL or a character string. In the latter case, the model code is saved in a file named after the string supplied in save_model, which may also contain the full path where to save the file. If only a name is given, t

...

Further arguments to be passed to Stan.

Value

An object of class brmsfit, which contains the posterior samples along with many other useful information about the model. Use methods(class = "brmsfit") for an overview on available methods.

Details

Fit a generalized (non-)linear multilevel model, which incorporates both population-level parameters (also known as fixed-effects) and group-level parameters (also known as random effects) in a (non-)linear predictor via full Bayesian inference using Stan. Formula syntax for multilevel models The formula argument accepts formulas of the following syntax: response | addition ~ fixed + (random | group) The fixed term contains effects that are assumend to be the same across obervations. We call them 'population-level' effects or (adopting frequentist vocabulary) 'fixed' effects. The optional random terms may contain effects that are assumed to vary accross grouping variables specified in group. We call them 'group-level' effects or (adopting frequentist vocabulary) 'random' effects, although the latter name is misleading in a Bayesian context (for more details type vignette("brms")). Multiple grouping factors each with multiple group-level effects are possible. Instead of | you may use || in grouping terms to prevent correlations from being modeled. With two exceptions, this is basically lme4 syntax. The first exception is that fixed may contain two non-standard types of population-level effects namely monotonous and category specific effects, which can be specified using terms of the form monotonous() and cse() respectively. The latter can only be applied in ordinal models and is explained in more detail in the package's vignette (type vignette("brms")). The former effect type is explained here. A monotonous predictor must either be integer valued or an ordered factor, which is the first difference to an ordinary continuous predictor. More importantly, predictor categories (or integers) are not assumend to be equidistant with respect to their effect on the response variable. Instead, the distance between adjacent predictor categories (or integers) is estimated from the data and may vary across categories. This is realized by parameterizing as follows: One parameter takes care of the direction and size of the effect similar to an ordinary regression parameter, while an additional parameter vector estimates the normalized distances between consecutive predictor categories. A main application of monotonous effects are ordinal predictors that can this way be modeled without (falsely) treating them as continuous or as unordered categorical predictors. The second eception is the optional addition term, which may contain multiple terms of the form fun(variable) seperated by + each providing special information on the response variable. fun can be replaced with either se, weights, disp, trials, cat, cens, or trunc. Their meanings are explained below. For families gaussian, student, and cauchy it is possible to specify standard errors of the observation, thus allowing to perform meta-analysis. Suppose that the variable yi contains the effect sizes from the studies and sei the corresponding standard errors. Then, fixed and random effects meta-analyses can be conducted using the formulae yi | se(sei) ~ 1 and yi | se(sei) ~ 1 + (1|study), respectively, where study is a variable uniquely identifying every study. If desired, meta-regression can be performed via yi | se(sei) ~ 1 + mod1 + mod2 + (1|study) or yi | se(sei) ~ 1 + mod1 + mod2 + (1 + mod1 + mod2|study), where mod1 and mod2 represent moderator variables. For all families, weighted regression may be performed using weights in the addition part. Internally, this is implemented by multiplying the log-posterior values of each observation by their corresponding weights. Suppose that variable wei contains the weights and that yi is the response variable. Then, formula yi | weights(wei) ~ predictors implements a weighted regression. The addition argument disp (short for dispersion) serves a similar purpose than weight. However, it has a different implementation and is less general as it is only usable for the families gaussian, student, cauchy, Gamma, weibull, and negbinomial. For the former three families, the residual standard deviation sigma is multiplied by the values given in disp, so that higher values lead to lower weights. Contrariwise, for the latter three families, the parameter shape is multiplied by the values given in disp. As shape can be understood as a precision parameter (inverse of the variance), higher values will lead to higher weights in this case. For families binomial and zero_inflated_binomial, addition should contain a variable indicating the number of trials underlying each observation. In lme4 syntax, we may write for instance cbind(success, n - success), which is equivalent to success | trials(n) in brms syntax. If the number of trials is constant across all observation (say 10), we may also write success | trials(10). For family categorical and all ordinal families, addition may contain a term cat(number) to specify the number categories (e.g, cat(7)). If not given, the number of categories is calculated from the data. With the expection of categorical and ordinal families, left and right censoring can be modeled through yi | cens(censored) ~ predictors. The censoring variable (named censored in this example) should contain the values 'left', 'none', and 'right' (or equivalenty -1, 0, and 1) to indicate that the corresponding observation is left censored, not censored, or right censored. With the expection of categorical and ordinal families, the response distribution can be truncated using the trunc function in the addition part. If the response variable is truncated between, say, 0 and 100, we can specify this via yi | trunc(lb = 0, ub = 100) ~ predictors. Defining only one of the two arguments in trunc leads to one-sided truncation. Mutiple addition terms may be specified at the same time using the + operator, for instance formula = yi | se(sei) + cens(censored) ~ 1 for a censored meta-analytic model. For families gaussian, student, and cauchy multivariate models may be specified using cbind notation. Suppose that y1 and y2 are response variables and x is a predictor. Then cbind(y1,y2) ~ x specifies a multivariate model, where x has the same effect on y1 and y2. To indicate different effects on each response variable, the factor trait (which is reserved in multivariate models) can be used as an additional predictor. For instance, cbind(y1,y2) ~ 0 + trait + x:trait leads to seperate effects of x on y1 and y2 as well as to separate intercepts. In this case, trait has two levels, namely "y1" and "y2". It may also be used within random effects terms, both as grouping factor or as random effect within a grouping factor. Note that variable trait is generated internally and may not be specified in the data passed to brm. As of brms 0.9.0, categorical models use the same syntax as multivariate models, but in this case, trait differentiates between the response categories. As in most other implementations of categorical models, values of one category are fixed to identify the model. Accordingly, trait has K - 1 levels, where K is the number of categories. Usually, it makes most sense to use terms of the structure 0 + trait + trait:(). Zero-inflated and hurdle families are bivariate and also make use of the special internal variable trait having two levels in this case. However, only the actual response must be specified in formula, as the second response variable used for the zero-inflation / hurdle (ZIH) part is internally generated. A formula for this type of models may, for instance, look like this: y ~ 0 + trait + trait:(x1 + x2) + (0 + trait | g). In this example, the predictors x1 and x1 influence the ZIH part differentlythan the actual response part as indicated by their interaction with trait. In addition, a group-level effect of trait was added while the group-level intercept was removed leading to the estimation of two effects, one for the ZIH part and one for the actual response. In the example above, the correlation between the two effects will also be estimated. Sometimes, predictors should only influence the ZIH part but not the actual response (or vice versa). As this cannot be modeled with the trait variable, two other internally generated and reserved (numeric) variables, namely main and spec, are supported. main is 1 for the response part and 0 for the ZIH part of the model. For spec it is the other way round. Suppose that x1 should only influence the actual response, and x2 only the ZIH process. We can write this as follows: formula = y ~ 0 + main + spec + main:x1 + spec:x2. The main effects of main or spec serve as intercepts, while the interaction terms main:x1 and spec:x2 ensure that x1 and x2 only predict one part of the model, respectively. Using the same syntax as for zero-inflated and hurdle models, it is possible to specify multiplicative effects in family bernoulli (make sure to set argument type to "2PL"; see brmsfamily for more details). In Item Response Theory (IRT), these models are known as 2PL models. Suppose that we have the variables item and person and want to model fixed effects for items and random effects for persons. The discriminality (multiplicative effect) should depend only on the items. We can specify this by setting formula = response ~ 0 + (main + spec):item + (0 + main|person). The random term 0 + main ensures that person does not influence discriminalities. Of course it is possible to predict only discriminalities by using variable spec in the model formulation. To identify the model, multiplicative effects are estimated on the log scale. In addition, we strongly recommend setting proper priors on fixed effects in this case to increase sampling efficiency (for details on priors see set_prior). Parameterization of the population-level intercept The population-level intercept (if incorporated) is estimated separately and not as part of population-level parameter vector b. This has the side effect that priors on the intercept also have to be specified separately (see set_prior for more details). Furthermore, to increase sampling efficiency, the fixed effects design matrix X is centered around its column means X_means if the intercept is incorporated. This leads to a temporary bias in the intercept equal to , where <,> is the scalar product. The bias is corrected after fitting the model, but be aware that you are effectively defining a prior on the temporary intercept of the centered design matrix not on the real intercept. This behavior can be avoided by using the reserved (and internally generated) variable intercept. Instead of y ~ x, you may write y ~ -1 + intercept + x. This way, priors can be defined on the real intercept, directly. In addition, the intercept is just treated as an ordinary fixed effect and thus priors defined on b will also apply to it. Note that this parameterization may be a bit less efficient than the default parameterization discussed above. Formula syntax for non-linear multilevel models Using the nonlinear argument, it is possible to specify non-linear models in brms. Contrary to what the name might suggest, nonlinear should not contain the non-linear model itself but rather information on the non-linear parameters. The non-linear model will just be specified within the formula argument. Suppose, that we want to predict the response y through the predictor x, where x is linked to y through y = alpha - beta * lambda^x, with parameters alpha, beta, and lambda. This is certainly a non-linear model being defined via formula = y ~ alpha - beta * lambda^x (addition arguments can be added in the same way as for ordinary formulas). Now we have to tell brms the names of the non-linear parameters and specfiy a (linear mixed) model for each of them using the nonlinear argument. Let's say we just want to estimate those three parameters with no further covariates or random effects. Then we can write nonlinear = alpha + beta + lambda ~ 1 or equivalently (and more flexible) nonlinear = list(alpha ~ 1, beta ~ 1, lambda ~ 1). This can, of course, be extended. If we have another predictor z and observations nested within the grouping factor g, we may write for instance nonlinear = list(alpha ~ 1, beta ~ 1 + z + (1|g), lambda ~ 1). The formula syntax described above applies here as well. In this example, we are using z and g only for the prediction of beta, but we might also use them for the other non-linear parameters (provided that the resulting model is still scientifically reasonable). Non-linear models may not be uniquely identified and / or show bad convergence. For this reason it is mandatory to specify priors on the non-linear parameters. For instructions on how to do that, see set_prior. Families and link functions Family gaussian with identity link leads to linear regression. Families student, and cauchy with identity link leads to robust linear regression that is less influenced by outliers. Families poisson, negbinomial, and geometric with log link lead to regression models for count data. Families binomial and bernoulli with logit link leads to logistic regression and family categorical to multi-logistic regression when there are more than two possible outcomes. Families cumulative, cratio ('contiuation ratio'), sratio ('stopping ratio'), and acat ('adjacent category') leads to ordinal regression. Families Gamma, weibull, exponential, and inverse.gaussian can be used (among others) for survival regression when combined with the log link. Families hurdle_poisson, hurdle_negbinomial, hurdle_gamma, zero_inflated_poisson, and zero_inflated_negbinomial combined with the log link, and zero_inflated_binomial with the logit link, allow to estimate zero-inflated and hurdle models. These models can be very helpful when there are many zeros in the data that cannot be explained by the primary distribution of the response. Family hurdle_gamma is especially useful, as a traditional Gamma model cannot be reasonably fitted for data containing zeros in the response. In the following, we list all possible links for each family. The families gaussian, student, and cauchy accept the links (as names) identity, log, and inverse; families poisson, negbinomial, and geometric the links log, identity, and sqrt; families binomial, bernoulli, Beta, cumulative, cratio, sratio, and acat the links logit, probit, probit_approx, cloglog, and cauchit; family categorical the link logit; families Gamma, weibull, and exponential the links log, identity, and inverse; family inverse.gaussian the links 1/mu^2, inverse, identity and log; families hurdle_poisson, hurdle_negbinomial, hurdle_gamma, zero_inflated_poisson, and zero_inflated_negbinomial the link log. The first link mentioned for each family is the default. Please note that when calling the Gamma family function, the default link will be inverse not log. Also, the probit_approx link cannot be used when calling the binomial family function. The current implementation of inverse.gaussian models has some convergence problems and requires carefully chosen prior distributions to work efficiently. For this reason, we currently do not recommend to use the inverse.gaussian family, unless you really feel that your data requires exactly this type of model. Prior distributions As of brms 0.5.0, priors should be specified using the set_prior function. Its documentation contains detailed information on how to correctly specify priors. To find out on which parameters or parameter classes priors can be defined, use get_prior. Adjusting the sampling behavior of Stan In addition to choosing the number of iterations, warmup samples, and chains, users can control the behavior of the NUTS sampler, by using the control argument. The most important reason to use control is to decrease (or eliminate at best) the number of divergent transitions that cause a bias in the obtained posterior samples. Whenever you see the warning "There were x divergent transitions after warmup." you should really think about increasing adapt_delta. To do this, write control = list(adapt_delta = ), where should usually be value between 0.8 (current default) and 1. Increasing adapt_delta will slow down the sampler but will decrease the number of divergent transitions threatening the validity of your posterior samples. Another problem arises when the depth of the tree being evaluated in each iteration is exceeded. This is less common than having divergent transitions, but may also bias the posterior samples. When it happens, Stan will throw out a warning suggesting to increase max_treedepth, which can be accomplished by writing control = list(max_treedepth = ) with a positive integer that should usually be larger than the current default of 10. For more details on the control argument see stan.

Examples

Run this code

## Poisson regression for the number of seizures in epileptic patients
## using student_t priors for population-level effects 
## and half cauchy priors for standard deviations of group-level effects 
fit1 <- brm(count ~ log_Age_c + log_Base4_c * Trt_c 
            + (1|patient) + (1|visit) + (1|obs), 
            data = epilepsy, family = poisson(), 
            prior = c(set_prior("student_t(5,0,10)", class = "b"),
                      set_prior("cauchy(0,2)", class = "sd")))
## generate a summary of the results
summary(fit1)
## plot the MCMC chains as well as the posterior distributions
plot(fit1, ask = FALSE)
## extract random effects standard devations and covariance matrices
VarCorr(fit1)
## extract group specific effects of each level
ranef(fit1)
## predict responses based on the fitted model
head(predict(fit1))  
## plot marginal effects of each predictor
plot(marginal_effects(fit1), ask = FALSE)
 
## Ordinal regression modeling patient's rating of inhaler instructions 
## category specific effects are estimated for variable 'treat'
fit2 <- brm(rating ~ period + carry + cse(treat), 
            data = inhaler, family = sratio("cloglog"), 
            prior = set_prior("normal(0,5)"), chains = 2)
summary(fit2)
plot(fit2, ask = FALSE)    

## Survival regression modeling the time between the first 
## and second recurrence of an infection in kidney patients.
fit3 <- brm(time | cens(censored) ~ age * sex + disease + (1|patient), 
            data = kidney, family = gaussian("log"))
summary(fit3) 
plot(fit3, ask = FALSE)
plot(marginal_effects(fit3), ask = FALSE)   

## Probit regression using the binomial family
n <- sample(1:10, 100, TRUE)  # number of trials
success <- rbinom(100, size = n, prob = 0.4)
x <- rnorm(100)
fit4 <- brm(success | trials(n) ~ x, 
            family = binomial("probit"))
summary(fit4)

## Simple non-linear gaussian model
x <- rnorm(100)
y <- rnorm(100, mean = 2 - 1.5^x, sd = 1)
fit5 <- brm(y ~ a1 - a2^x, nonlinear = a1 + a2 ~ 1,
            prior = c(set_prior("normal(0, 2)", nlpar = "a1"),
                      set_prior("normal(0, 2)", nlpar = "a2")))
summary(fit5)
plot(marginal_effects(fit5), ask = FALSE)

Run the code above in your browser using DataLab