predict.brmsfit: Model Predictions of `brmsfit` Objects

Description

Predict responses based on the fitted model. Predictions can be made for the data used to fit the model (posterior predictive checks) or for new data. By definition, these predictions have higher variance than predictions of the fitted values (i.e., the 'regression line') computed by the fitted method, because they also incorporate the measurement error. The estimated means of both methods should, however, be very similar.
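The variance difference between predict and fitted can be illustrated without brms. The following is a minimal base-R sketch that assumes a Gaussian model and uses simulated (not real) posterior draws of the mean `mu` and the residual standard deviation `sigma` for a single observation. All variable names are hypothetical.

```r
set.seed(1)
S <- 4000
# Hypothetical posterior draws for one observation of a Gaussian model
mu_draws    <- rnorm(S, mean = 10, sd = 0.5)      # uncertainty about the regression line
sigma_draws <- abs(rnorm(S, mean = 2, sd = 0.1))  # residual standard deviation

# Analogue of 'fitted': draws of the expected value only
fitted_draws  <- mu_draws
# Analogue of 'predict': each draw additionally includes measurement error
predict_draws <- rnorm(S, mean = mu_draws, sd = sigma_draws)

c(fitted = mean(fitted_draws), predict = mean(predict_draws))  # similar means
c(fitted = sd(fitted_draws), predict = sd(predict_draws))      # predict is wider
```

The predictive spread combines the uncertainty in `mu` with the residual noise, which is why its standard deviation exceeds that of the fitted draws while the means agree closely.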

Usage

## S3 method for class 'brmsfit'
predict(object, newdata = NULL, re_formula = NULL, transform = NULL,
  allow_new_levels = FALSE, incl_autocor = TRUE, subset = NULL,
  nsamples = NULL, sort = FALSE, ntrys = 5, summary = TRUE,
  robust = FALSE, probs = c(0.025, 0.975), ...)

Arguments

object

An object of class brmsfit

newdata

An optional data.frame for which to evaluate predictions. If NULL (default), the original data of the model is used.

re_formula

A formula containing the random effects to be considered in the prediction. If NULL (default), all random effects are included; if NA, no random effects are included.

transform

A function or a character string naming a function to be applied on the predicted responses before summary statistics are computed.

allow_new_levels

A flag indicating if new levels of random effects are allowed (defaults to FALSE). Only relevant if newdata is provided.

incl_autocor

A flag indicating if autocorrelation parameters should be included in the predictions. Defaults to TRUE.

subset

A numeric vector specifying the posterior samples to be used. If NULL (the default), all samples are used.

nsamples

Positive integer indicating how many posterior samples should be used. If NULL (the default) all samples are used. Ignored if subset is not NULL.

sort

Logical. Only relevant for time series models. Indicates whether to return predicted values in the original order (FALSE; default) or in the order of the time series (TRUE).

ntrys

Parameter used in rejection sampling for truncated discrete models only (defaults to 5). See Details for more information.

summary

Should summary statistics (i.e., means, standard deviations, and 95% intervals with the default probs) be returned instead of the raw values? Default is TRUE.

robust

If FALSE (the default), the mean is used as the measure of central tendency and the standard deviation as the measure of variability. If TRUE, the median and the median absolute deviation (MAD) are used instead. Only used if summary is TRUE.

probs

The probabilities of the quantiles to be computed by the quantile function. Only used if summary is TRUE.

...

Currently ignored.
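How the draw-selection and summary arguments interact can be sketched in base R. The function `postprocess_draws()` below is hypothetical and not part of brms. It assumes an S x N matrix of predictive draws (samples in rows, observations in columns), assumes that `nsamples` picks draws at random, and uses illustrative column names. It applies `subset`/`nsamples`, then `transform`, then the `summary`/`robust`/`probs` options in the way the arguments above describe.

```r
# Hypothetical sketch of post-processing an S x N matrix of predictive draws.
# Not brms code; argument names mirror the documented ones for illustration.
postprocess_draws <- function(draws, subset = NULL, nsamples = NULL,
                              transform = NULL, summary = TRUE,
                              robust = FALSE, probs = c(0.025, 0.975)) {
  # 'subset' takes precedence; otherwise keep 'nsamples' draws
  # (assumption: chosen at random without replacement)
  if (is.null(subset) && !is.null(nsamples)) {
    subset <- sample(nrow(draws), min(nsamples, nrow(draws)))
  }
  if (!is.null(subset)) {
    draws <- draws[subset, , drop = FALSE]
  }
  # 'transform' may be a function or the name of one; applied before summarising
  if (!is.null(transform)) {
    draws <- match.fun(transform)(draws)
  }
  if (!summary) {
    return(draws)  # raw S x N matrix
  }
  centre <- if (robust) apply(draws, 2, median) else colMeans(draws)
  spread <- if (robust) apply(draws, 2, mad) else apply(draws, 2, sd)
  # quantile() gives length(probs) values per column; arrange them as N x length(probs)
  qs <- matrix(apply(draws, 2, quantile, probs = probs),
               ncol = length(probs), byrow = TRUE)
  out <- cbind(centre, spread, qs)
  colnames(out) <- c("Estimate", "Est.Error", paste0("Q", probs * 100))
  out
}

set.seed(5)
draws <- matrix(rnorm(1000 * 3, mean = 1), nrow = 1000, ncol = 3)
dim(postprocess_draws(draws, summary = FALSE))                 # 1000 x 3
dim(postprocess_draws(draws, nsamples = 100, summary = FALSE)) # 100 x 3
dim(postprocess_draws(draws, subset = 1:10, nsamples = 100,
                      summary = FALSE))                        # 10 x 3: subset wins
postprocess_draws(draws)                                       # 3 x 4
postprocess_draws(draws, transform = "exp", robust = TRUE,
                  probs = c(0.1, 0.5, 0.9))                    # 3 x 5
```

With two central-tendency and variability columns plus one column per element of `probs`, the summary has `length(probs) + 2` columns, matching the shape described under Value.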

Value

Predicted values of the response variable. If summary = TRUE, the output depends on the family: for categorical and ordinal families, it is an N x C matrix, where N is the number of observations and C is the number of categories. For all other families, it is an N x E matrix, where E is equal to length(probs) + 2 (the measure of central tendency, the measure of variability, and one column per element of probs). If summary = FALSE, the output is an S x N matrix, where S is the number of samples.
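For categorical and ordinal families, the N x C summary can be read as the share of posterior predictive draws that fall into each category. The base-R sketch below assumes that interpretation and assumes the raw draws are category labels coded 1..C; the draws are simulated and `cat_summary` is a hypothetical name.

```r
# Simulated S x N matrix of predicted category labels (assumed coded 1..C)
set.seed(2)
S <- 1000; N <- 4; C <- 3
draws <- matrix(sample(1:C, S * N, replace = TRUE), nrow = S, ncol = N)

# For each observation (column), the proportion of draws in each category;
# apply() returns a C x N matrix, so transpose to get N x C
cat_summary <- t(apply(draws, 2, function(x) tabulate(x, nbins = C) / length(x)))
colnames(cat_summary) <- paste0("P(Y = ", 1:C, ")")
cat_summary
rowSums(cat_summary)  # each row sums to 1
```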

Details

NA values within factors in newdata are interpreted as if all dummy variables of this factor are zero. This allows, for instance, making predictions of the grand mean when using sum coding.
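The sum-coding case can be checked with a few lines of base R. This sketch uses `model.matrix()` with `contr.sum` and made-up coefficients (not a fitted brms model) to show why setting all dummy variables of a sum-coded factor to zero yields the grand mean: under sum coding, the intercept equals the average of the level-specific predictions.

```r
# A three-level factor with sum (effect) coding
d <- data.frame(f = factor(c("a", "b", "c")))
X <- model.matrix(~ f, data = d, contrasts.arg = list(f = "contr.sum"))

# Hypothetical coefficients: intercept, f1, f2
beta <- c(10, 2, -1)

# Predictions for each level: a = 12, b = 9, c = 9
level_preds <- drop(X %*% beta)

# A level treated as 'all dummy variables zero' keeps only the intercept
x_na <- c(1, 0, 0)
na_pred <- sum(x_na * beta)

c(na_pred = na_pred, grand_mean = mean(level_preds))  # both equal 10
```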

For truncated discrete models only: In the absence of any general algorithm to sample from truncated discrete distributions, rejection sampling is applied in this special case. This means that values are sampled until a value lies within the defined truncation boundaries. In practice, this procedure may be rather slow (especially in R). Thus, we try to do approximate rejection sampling by sampling each value ntrys times and then selecting a valid value. If all values are invalid, the closest boundary is used instead. If there are more than a few of these pathological cases, a warning is issued suggesting to increase the argument ntrys.

For models fitted with brms <= 0.5.0 only: Be careful when using newdata with factors in fixed or random effects. The predicted results are only valid if all factor levels present in the initial data are also defined and ordered correctly for the factors in newdata. Grouping factors may contain fewer levels than in the initial data without causing problems. When using higher versions of brms, all factors are automatically checked for correctness and amended if necessary.
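The approximate rejection sampling for truncated discrete models can be sketched as follows. `rtrunc_pois_approx()` is a hypothetical helper for a Poisson distribution truncated to `[lb, ub]`. It mirrors the idea described for ntrys (draw `ntrys` candidates per value, keep a valid one, otherwise fall back to a boundary and warn) but is not the brms implementation, and it simplifies the warning rule.

```r
# Hypothetical helper: approximate rejection sampling for a truncated Poisson
rtrunc_pois_approx <- function(n, lambda, lb = -Inf, ub = Inf, ntrys = 5) {
  out <- numeric(n)
  n_fallback <- 0
  for (i in seq_len(n)) {
    cand <- rpois(ntrys, lambda)            # ntrys candidates for this value
    valid <- cand[cand >= lb & cand <= ub]
    if (length(valid) > 0) {
      out[i] <- valid[1]                    # keep the first valid candidate
    } else {
      # all candidates invalid: use the boundary closest to the first candidate
      out[i] <- if (cand[1] < lb) lb else ub
      n_fallback <- n_fallback + 1
    }
  }
  # simplification: warn on any fallback (brms warns only beyond "a few" cases)
  if (n_fallback > 0) {
    warning(n_fallback, " value(s) set to a boundary; consider increasing 'ntrys'")
  }
  out
}

set.seed(3)
x <- rtrunc_pois_approx(200, lambda = 3, lb = 1, ub = 6)
range(x)
```

Sampling a fixed number of candidates per value bounds the run time, at the cost of occasionally returning a boundary value when every candidate misses the truncation range.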

Examples


## Not run:
library(brms)

## fit a model
fit <- brm(time | cens(censored) ~ age + sex + (1 + age || patient),
           data = kidney, family = "exponential", inits = "0")

## predicted responses
pp <- predict(fit)
head(pp)

## predicted responses excluding the random effect of age
pp2 <- predict(fit, re_formula = ~ (1 | patient))
head(pp2)

## predicted responses of patient 1 for new data
newdata <- data.frame(sex = factor(c("male", "female")),
                      age = c(20, 50),
                      patient = c(1, 1))
predict(fit, newdata = newdata)
## End(Not run)
