Check model quality of logistic regression models.
binned_residuals(model, term = NULL, n_nins = NULL)
A glm-object with binomial-family.
Name of independent variable from model. If not NULL, average residuals for the categories of term are plotted; else, average residuals for the estimated probabilities of the response are plotted.
Numeric, the number of bins to divide the data into. If n_nins = NULL, the square root of the number of observations is taken.
A data frame representing the data that is mapped to the plot, which is automatically plotted. If all residuals are inside the error bounds, points are black. If some of the residuals are outside the error bounds (indicated by the grey-shaded area), blue points indicate residuals that are OK, while red points indicate model under- or overfitting for the related range of estimated probabilities.
Binned residual plots are achieved by “dividing the data into categories (bins) based on their fitted values, and then plotting the average residual versus the average fitted value for each bin.” (Gelman, Hill 2007: 97). If the model were true, one would expect about 95% of the residuals to fall inside the error bounds.
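The binning described above can be sketched by hand in base R. This is only an illustration under stated assumptions, not the function's internal code: the bin count is assumed to be the rounded square root of the number of observations, bins are assumed to hold roughly equal numbers of observations, and the error bounds are assumed to be two standard errors of the bin mean. The names fitted_prob, raw_resid, bin_id and binned are hypothetical.

```r
# Minimal base-R sketch of binned residuals (illustrative only).
model <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial")

fitted_prob <- fitted(model)                        # estimated probabilities
raw_resid   <- residuals(model, type = "response")  # observed minus fitted

# Assumption: number of bins is the rounded square root of n.
n_bins <- round(sqrt(length(fitted_prob)))

# Assign observations to bins of roughly equal size, ordered by fitted value.
bin_id <- cut(rank(fitted_prob, ties.method = "first"),
              breaks = n_bins, labels = FALSE)

binned <- data.frame(
  xbar = as.vector(tapply(fitted_prob, bin_id, mean)),  # average fitted value per bin
  ybar = as.vector(tapply(raw_resid, bin_id, mean)),    # average residual per bin
  n    = as.vector(table(bin_id))                       # observations per bin
)

# Assumed error bounds: +/- 2 standard errors of the bin mean.
binned$se     <- 2 * as.vector(tapply(raw_resid, bin_id, sd)) / sqrt(binned$n)
binned$inside <- abs(binned$ybar) <= binned$se

binned
```

Bins where inside is FALSE correspond to the points that would fall outside the error bounds.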
If term is not NULL, one can compare the residuals in relation to a specific model predictor. This may be helpful to check if a term would fit better when transformed, e.g. a rising and falling pattern of residuals along the x-axis (the pattern is indicated by a green line) is a signal to consider taking the logarithm of the predictor (cf. Gelman and Hill 2007, pp. 97ff).
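To see what binning against a predictor looks like, the same idea can be sketched in base R with the residuals grouped by a model term instead of by fitted values. This is a rough illustration, not the function's own implementation: the choice of wt as the term, the equal-size bins, and the use of lowess() as a stand-in for the green trend line are all assumptions, and the names predictor, by_term and trend are hypothetical.

```r
# Minimal base-R sketch: average residuals binned by a predictor (illustrative only).
model <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial")

raw_resid <- residuals(model, type = "response")  # observed minus fitted
predictor <- mtcars$wt                            # assumed term of interest

# Assumption: number of bins is the rounded square root of n.
n_bins <- round(sqrt(length(predictor)))
bin_id <- cut(rank(predictor, ties.method = "first"),
              breaks = n_bins, labels = FALSE)

by_term <- data.frame(
  xbar = as.vector(tapply(predictor, bin_id, mean)),  # average predictor value per bin
  ybar = as.vector(tapply(raw_resid, bin_id, mean))   # average residual per bin
)

# lowess() is used here only as a stand-in smoother to expose a trend.
trend <- lowess(by_term$xbar, by_term$ybar)

plot(by_term$xbar, by_term$ybar,
     xlab = "Average wt per bin", ylab = "Average residual")
abline(h = 0, lty = 2)
lines(trend, col = "green")
```

A trend that rises and then falls (or the reverse) across the bins would be the kind of pattern that suggests trying a transformed predictor, such as log(wt), in the model formula.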
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge; New York: Cambridge University Press.
# NOT RUN {
model <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial")
binned_residuals(model)
# }