marginal_effects.brmsfit: Display marginal effects of predictors

Description

Display marginal effects of one or more numeric and/or categorical predictors including two-way interaction effects.

Usage

# S3 method for brmsfit
marginal_effects(x, effects = NULL, conditions = NULL,
  int_conditions = NULL, re_formula = NA, robust = TRUE,
  probs = c(0.025, 0.975), method = c("fitted", "predict"),
  spaghetti = FALSE, surface = FALSE, transform = NULL,
  resolution = 100, select_points = 0, too_far = 0, ...)
marginal_effects(x, ...)
# S3 method for brmsMarginalEffects
plot(x, ncol = NULL, points = FALSE,
  rug = FALSE, mean = TRUE, jitter_width = 0, stype = c("contour",
  "raster"), line_args = list(), cat_args = list(),
  errorbar_args = list(), surface_args = list(), spaghetti_args = list(),
  point_args = list(), rug_args = list(), theme = NULL, ask = TRUE,
  plot = TRUE, ...)

Arguments

An R object usually of class brmsfit.

effects

An optional character vector naming effects (main effects or interactions) for which to compute marginal plots. Interactions are specified by a : between variable names. If NULL (the default), plots are generated for all main effects and two-way interactions estimated in the model. When specifying effects manually, all two-way interactions may be plotted even if not orginally modeled.

conditions

An optional data.frame containing variable values to condition on. Each effect defined in effects will be plotted separately for each row of data. The row names of data will be treated as titles of the subplots. It is recommended to only define a few rows in order to keep the plots clear. If NULL (the default), numeric variables will be marginalized by using their means and factors will get their reference level assigned.

int_conditions

An optional named list whose elements are numeric vectors of values of the second variables in two-way interactions. At these values, predictions are evaluated. The names of int_conditions have to match the variable names exactly. Additionally, the elements of the numeric vectors may be named themselves, in which case their names appear as labels for the conditions in the plots. Instead of vectors, functions returning vectors may be passed and are applied on the original values of the corresponding variable. If NULL (the default), predictions are evaluated at the \(mean\) and at \(mean +/- sd\).

re_formula

A formula containing random effects to be considered in the marginal predictions. If NULL, include all random effects; if NA (default), include no random effects.

robust

If TRUE (the default) the median is used as the measure of central tendency. If FALSE the mean is used instead.

probs

The quantiles to be used in the computation of credible intervals (defaults to 2.5 and 97.5 percent quantiles)

method

Either "fitted" or "predict". If "fitted", plot marginal predictions of the regression curve. If "predict", plot marginal predictions of the responses.

spaghetti

Logical; Indicates whether predictions should be visualized via spagetti plots. Only applied for numeric predictors. If TRUE, it is recommended to set argument nsamples to a relatively small value (e.g. 100) in order to reduce computation time.

surface

Logical; Indicates whether interactions or two-dimensional smooths should be visualized as a surface. Defaults to FALSE. The surface type can be controlled via argument stype of the related plotting method.

transform

A function or a character string naming a function to be applied on the predicted responses before summary statistics are computed. Only allowed if method = "predict".

resolution

Number of support points used to generate the plots. Higher resolution leads to smoother plots. Defaults to 100. If surface is TRUE, this implies 10000 support points for interaction terms, so it might be necessary to reduce resolution when only few RAM is available.

select_points

Positive number. Only relevant if points or rug are set to TRUE: Actual data points of numeric variables that are too far away from the values specified in conditions can be excluded from the plot. Values are scaled into the unit interval and then points more than select_points from the values in conditions are excluded. By default, all points are used.

too_far

Positive number. For surface plots only: Grid points that are too far away from the actual data points can be excluded from the plot. too_far determines what is too far. The grid is scaled into the unit square and then grid points more than too_far from the predictor variables are excluded. By default, all grid points are used. Ignored for non-surface plots.

...

Further arguments such as subset or nsamples passed to predict or fitted.

ncol

Number of plots to display per column for each effect. If NULL (default), ncol is computed internally based on the number of rows of conditions.

points

Logical; indicating whether the original data points should be added via geom_jitter. Default is FALSE. Note that only those data points will be added that match the specified conditions defined in conditions. For categorical predictors, the conditions have to match exactly. For numeric predictors, argument select_points is used to determine, which points do match a condition.

rug

Logical; indicating whether a rug representation of predictor values should be added via geom_rug. Default is FALSE. Depends on select_points in the same way as points does.

mean

Logical; only relevant for spaghetti plots. If TRUE (the default), display the mean regression line on top of the regression lines for each sample.

jitter_width

Only used if points = TRUE: Amount of horizontal jittering of the data points. Mainly useful for ordinal models. Defaults to 0 that is no jittering.

stype

Indicates how surface plots should be displayed. Either "contour" or "raster".

line_args

Only used in plots of continuous predictors: A named list of arguments passed to geom_smooth.

cat_args

Only used in plots of categorical predictors: A named list of arguments passed to geom_point.

errorbar_args

Only used in plots of categorical predictors: A named list of arguments passed to geom_errorbar.

surface_args

Only used in surface plots: A named list of arguments passed to geom_contour or geom_raster (depending on argument stype).

spaghetti_args

Only used in spaghetti plots: A named list of arguments passed to geom_smooth.

point_args

Only used if points = TRUE: A named list of arguments passed to geom_jitter.

rug_args

Only used if rug = TRUE: A named list of arguments passed to geom_rug.

theme

A theme object modifying the appearance of the plots. For some basic themes see ggtheme and theme_default.

ask

logical; indicates if the user is prompted before a new page is plotted. Only used if plot is TRUE.

plot

logical; indicates if plots should be plotted directly in the active graphic device. Defaults to TRUE.

Value

An object of class brmsMarginalEffects, which is a named list with one data.frame per effect containing all information required to generate marginal effects plots. Among others, these data.frames contain some special variables, namely estimate__ (predicted values of the response), se__ (standard error of the predicted response), lower__ and upper__ (lower and upper bounds of the uncertainty interval of the response), as well as cond__ (used in faceting when conditions contains multiple rows).

The corresponding plot method returns a named list of ggplot objects, which can be further customized using the ggplot2 package.

Details

When creating marginal_effects for a particular predictor (or interaction of two predictors), one has to choose the values of all other predictors to condition on. By default, the mean is used for continuous variables and the reference category is used for factors, but you may change these values via argument conditions. This also has an implication for the points argument: In the created plots, only those points will be shown that correspond to the factor levels actually used in the conditioning, in order not to create the false impressivion of bad model fit, where it is just due to conditioning on certain factor levels. Since we condition on rather than actually marginalizing variables, the name marginal_effects is possibly not ideally chosen in retrospect.

NA values within factors in conditions, are interpreted as if all dummy variables of this factor are zero. This allows, for instance, to make predictions of the grand mean when using sum coding.

To fully change colours of the created plots, one has to amend both scale_colour and scale_fill. See scale_colour_grey or scale_colour_gradient for more details.

Examples

Run this code

# NOT RUN {
fit <- brm(count ~ log_Age_c + log_Base4_c * Trt + (1 | patient),
           data = epilepsy, family = poisson()) 
           
## plot all marginal effects
plot(marginal_effects(fit), ask = FALSE)

## change colours to grey scale
me <- marginal_effects(fit, "log_Base4_c:Trt")
plot(me, plot = FALSE)[[1]] + 
  scale_color_grey() +
  scale_fill_grey()

## only plot the marginal interaction effect of 'log_Base4_c:Trt'
## for different values for 'log_Age_c'
conditions <- data.frame(log_Age_c = c(-0.3, 0, 0.3))
plot(marginal_effects(fit, effects = "log_Base4_c:Trt", 
                      conditions = conditions))
                      
## also incorporate random effects variance over patients
## also add data points and a rug representation of predictor values
plot(marginal_effects(fit, effects = "log_Base4_c:Trt", 
                      conditions = conditions, re_formula = NULL), 
     points = TRUE, rug = TRUE)
 
## change handling of two-way interactions
int_conditions <- list(
  log_Base4_c = setNames(c(-2, 1, 0), c("b", "c", "a"))
)
marginal_effects(fit, effects = "Trt:log_Base4_c",
                 int_conditions = int_conditions)
marginal_effects(fit, effects = "Trt:log_Base4_c",
                 int_conditions = list(log_Base4_c = quantile))        
     
## fit a model to illustrate how to plot 3-way interactions
fit3way <- brm(count ~ log_Age_c * log_Base4_c * Trt, data = epilepsy)
conditions <- data.frame(log_Age_c = c(-0.3, 0, 0.3))
rownames(conditions) <- paste("log_Age_c =", conditions$log_Age_c)
marginal_effects(
  fit3way, "log_Base4_c:Trt", conditions = conditions
)
## only include points close to the specified values of log_Age_c
me <- marginal_effects(
 fit3way, "log_Base4_c:Trt", conditions = conditions, 
 select_points = 0.1
)
plot(me, points = TRUE)
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab