check_model: Visual check of model assumptions

Description

Visual check of various model assumptions (normality of residuals, normality of random effects, linear relationship, homogeneity of variance, multicollinearity).

Usage

check_model(x, ...)
# S3 method for default
check_model(
  x,
  dot_size = 2,
  line_size = 0.8,
  panel = TRUE,
  check = "all",
  alpha = 0.2,
  dot_alpha = 0.8,
  colors = c("#3aaf85", "#1b6ca8", "#cd201f"),
  theme = "see::theme_lucid",
  detrend = FALSE,
  show_dots = NULL,
  verbose = TRUE,
  ...
)

Value

The data frame that is used for plotting.

Arguments

x: A model object.
...: Currently not used.
dot_size, line_size: Size of dot- and line-geoms.
panel: Logical, if TRUE, plots are arranged as panels; else, single plots for each diagnostic are returned.
check: Character vector, indicating which checks should be performed and plotted. May be one or more of "all", "vif", "qq", "normality", "linearity", "ncv", "homogeneity", "outliers", "reqq", "pp_check", "binned_residuals" or "overdispersion". Note that not all checks apply to all types of models (see 'Details'). "reqq" is a QQ-plot for random effects and only available for mixed models. "ncv" is an alias for "linearity", and checks for non-constant variance, i.e. for heteroscedasticity, as well as the linear relationship. By default, all possible checks are performed and plotted.
alpha, dot_alpha: The alpha level of the confidence bands and dot-geoms. Scalar from 0 to 1.
colors: Character vector with color codes (hex-format). Must be of length 3. First color is usually used for reference lines, second color for dots, and third color for outliers or extreme values.
theme: String, indicating the name of the plot-theme. Must be in the format "package::theme_name" (e.g. "ggplot2::theme_minimal").
detrend: Should QQ/PP plots be detrended?
show_dots: Logical, if TRUE, will show data points in the plot. Set to FALSE for models with many observations, if generating the plot is too time-consuming. By default, show_dots = NULL. In this case check_model() tries to guess whether performance will be poor due to a very large model and thus automatically shows or hides dots.
verbose: Logical, toggles warnings. Set to FALSE to turn warnings off.
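
The arguments above can be combined to restrict the diagnostics and adjust their appearance. The following is a minimal sketch, assuming the performance and see packages are installed (the call is skipped otherwise); the particular selection of checks is only an illustration, and the colors are simply the defaults from 'Usage'.

```r
# Fit a linear model on a built-in data set
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

# Assumption: check_model() is provided by the performance package and
# needs the see package for plotting; skip the call if either is missing
if (requireNamespace("performance", quietly = TRUE) &&
    requireNamespace("see", quietly = TRUE)) {
  diagnostics <- performance::check_model(
    m,
    check = c("qq", "linearity", "vif"),          # only these three checks
    panel = FALSE,                                # one plot per diagnostic
    colors = c("#3aaf85", "#1b6ca8", "#cd201f"),  # reference lines, dots, outliers
    theme = "ggplot2::theme_minimal"              # "package::theme_name" format
  )
  diagnostics
}
```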

Linearity Assumption

The "Linearity" plot checks the assumption of a linear relationship. However, the spread of dots also indicates possible heteroscedasticity (i.e. non-constant variance); hence the alias "ncv" for this plot. Some caution is needed when interpreting these plots. Although they are helpful for checking model assumptions, they do not necessarily indicate so-called "lack of fit", e.g. missed non-linear relationships or interactions. Thus, it is always recommended to also look at effect plots, including partial residuals.
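
As a rough illustration of the idea, the sketch below uses only base R, not check_model() itself. It draws a residuals-versus-fitted plot and then partial residual plots via stats::termplot(); the model and the choice of smoother are assumptions made for the example.

```r
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)

# Residuals vs. fitted values: a curved trend hints at non-linearity,
# a funnel-shaped spread hints at non-constant variance
plot(fitted(m), residuals(m), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(lowess(fitted(m), residuals(m)), col = "red")

# Partial residuals, one panel per predictor, with a smoother added
op <- par(mfrow = c(2, 2))
termplot(m, partial.resid = TRUE, smooth = panel.smooth)
par(op)
```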

Residuals for (Generalized) Linear Models

Plots that check the normality of residuals (QQ-plot) or the homogeneity of variance use standardized Pearson's residuals for generalized linear models, and standardized residuals for linear models. The plots for the normality of residuals (with overlaid normal curve) and for the linearity assumption use the default residuals for lm and glm (which are deviance residuals for glm).
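
The residual types named above can be extracted with base R functions. This is a sketch for orientation only: it assumes these extractors correspond to the residuals described here and does not claim to reproduce the calls check_model() makes internally. The models are arbitrary examples.

```r
# Linear model: standardized residuals
m_lm <- lm(mpg ~ wt + cyl, data = mtcars)
r_lm_std <- rstandard(m_lm)

# Generalized linear model: standardized Pearson residuals
m_glm <- glm(am ~ wt, data = mtcars, family = binomial())
r_glm_pearson <- rstandard(m_glm, type = "pearson")

# Default residuals: for glm objects these are deviance residuals
r_glm_default <- residuals(m_glm)
all.equal(r_glm_default, residuals(m_glm, type = "deviance"))
```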

Troubleshooting

For models with many observations, or for more complex models in general, generating the plot might become very slow. One reason might be that the underlying graphic engine becomes slow for plotting many data points. In such cases, setting the argument show_dots = FALSE might help. Furthermore, look at the check argument and see if some of the model checks could be skipped, which also increases performance.
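
A minimal sketch of both suggestions follows, assuming the performance and see packages are installed (the call is skipped otherwise). The simulated data set and its size are made up for illustration.

```r
# Simulated data (hypothetical example, not from the package)
set.seed(123)
n <- 2000
big <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
big$y <- 1 + 2 * big$x1 - big$x2 + rnorm(n)
m_big <- lm(y ~ x1 + x2, data = big)

if (requireNamespace("performance", quietly = TRUE) &&
    requireNamespace("see", quietly = TRUE)) {
  # Hide the data points and request only two of the checks
  performance::check_model(
    m_big,
    check = c("linearity", "qq"),
    show_dots = FALSE
  )
}
```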

Details

For Bayesian models from packages rstanarm or brms, models will be "converted" to their frequentist counterpart, using bayestestR::bayesian_as_frequentist. A more advanced model-check for Bayesian models will be implemented at a later stage.

Examples

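if (FALSE) {
library(performance)

# Linear model: all applicable checks, arranged as panels
m <- lm(mpg ~ wt + cyl + gear + disp, data = mtcars)
check_model(m)

# Mixed model: single plots for each diagnostic instead of panels
if (require("lme4")) {
  m <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
  check_model(m, panel = FALSE)
}

# Bayesian model: converted to its frequentist counterpart (see 'Details')
if (require("rstanarm")) {
  m <- stan_glm(mpg ~ wt + gear, data = mtcars, chains = 2, iter = 200)
  check_model(m)
}
}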
