Compute various measures or tests to assess the quality of fitted linear (mixed effects) models, like the root mean squared error, residual standard error or mean square error. For logistic regression models, or mixed models with binary outcome, the error rate, binned residuals, a Chi-squared goodness-of-fit test or the Hosmer-Lemeshow goodness-of-fit test can be computed.
cv(x, ...)
chisq_gof(x, prob = NULL, weights = NULL)
hoslem_gof(x, n.bins = 10)
rmse(x, normalized = FALSE)
rse(x)
mse(x)
error_rate(x)
binned_resid(x, term = NULL, n.bins = NULL)
x: Fitted linear model of class lm, merMod (lme4) or lme (nlme). For error_rate(), hoslem_gof() and binned_resid(), a glm-object with binomial family. For chisq_gof(), a numeric vector or a glm-object.

...: More fitted model objects, to compute multiple coefficients of variation at once.

prob: Vector of probabilities (indicating the population probabilities), of the same length as the number of categories / factor levels of x. Use nrow(table(x)) to determine the number of values needed for prob. Only used when x is a vector, not a glm-object.

weights: Vector of weights, used to weight x.

n.bins: Numeric, the number of bins to divide the data into. For hoslem_gof(), the default is 10. For binned_resid(), if n.bins = NULL, the square root of the number of observations is used.

normalized: Logical; use TRUE if the normalized RMSE should be returned.

term: Name of an independent variable from x. If not NULL, average residuals for the categories of term are plotted; otherwise, average residuals for the estimated probabilities of the response are plotted.
rmse(), rse(), mse(), cv(): These functions return a number, the requested statistic.
error_rate(): A list with four values: the error rates of the full and the null model, as well as the chi-squared statistic and p-value from the Likelihood-Ratio-Test between the full and the null model.
binned_resid(): A data frame representing the data mapped to the plot, which is plotted automatically. If all residuals lie inside the error bounds, points are black. If some residuals fall outside the error bounds (indicated by the grey-shaded area), blue points mark residuals that are OK, while red points indicate model under- or overfitting for the related range of estimated probabilities.
chisq_gof(): For vectors, the object returned by the computed chisq.test(). For glm-objects, an object of class chisq_gof with the following values: p.value, the p-value for the goodness-of-fit test; z.score, the standardized z-score for the goodness-of-fit test; rss, the residual sums of squares term; and chisq, the Pearson chi-squared statistic.
hoslem_gof(): An object of class hoslem_test with the following values: chisq, the Hosmer-Lemeshow chi-squared statistic; df, the degrees of freedom; and p.value, the p-value for the goodness-of-fit test.
The RMSE is the square root of the variance of the residuals and indicates the absolute fit of the model to the data (the difference between the observed data and the model's predicted values). “RMSE can be interpreted as the standard deviation of the unexplained variance, and has the useful property of being in the same units as the response variable. Lower values of RMSE indicate better fit. RMSE is a good measure of how accurately the model predicts the response, and is the most important criterion for fit if the main purpose of the model is prediction.” (Grace-Martin K: Assessing the Fit of Regression Models)
The normalized RMSE is the RMSE divided by the range of the response variable; hence, lower values indicate less residual variance.
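As a minimal base-R sketch of these two definitions (assuming a fitted model fit; rmse_manual and nrmse_manual are illustrative names, not part of the package):

# RMSE: square root of the mean squared residuals
rmse_manual <- sqrt(mean(residuals(fit)^2, na.rm = TRUE))
# normalized RMSE: RMSE divided by the range of the response
y <- model.response(model.frame(fit))
nrmse_manual <- rmse_manual / diff(range(y, na.rm = TRUE))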
The residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom.
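A base-R sketch of this definition (again assuming a fitted model fit; rse_manual is an illustrative name):

# residual standard error: sqrt(RSS / residual degrees of freedom)
rse_manual <- sqrt(sum(residuals(fit)^2) / df.residual(fit))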
The mean square error is the mean of the sum of squared residuals, i.e. it measures the average of the squares of the errors. Lower values (closer to zero) indicate better fit.
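The corresponding sketch (illustrative name mse_manual):

# mean square error: mean of the squared residuals
mse_manual <- mean(residuals(fit)^2)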
The advantage of the CV is that it is unitless. This allows coefficients of variation to be compared to each other in ways that other measures, like standard deviations or root mean squared residuals, cannot be.
“It is interesting to note the differences between a model's CV and R-squared values. Both are unitless measures that are indicative of model fit, but they define model fit in two different ways: CV evaluates the relative closeness of the predictions to the actual values while R-squared evaluates how much of the variability in the actual values is explained by the model.” (source: UCLA-FAQ)
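A minimal sketch of the common definitions (this is an assumption about the exact computation used here, not a quote of the package's code): for a single variable, the standard deviation divided by the mean; for a fitted model, the RMSE divided by the mean of the response.

# CV for a single variable (x is an illustrative numeric vector)
cv_vec <- sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
# CV for a fitted model (assumption: RMSE / mean of the response)
y <- model.response(model.frame(fit))
cv_mod <- sqrt(mean(residuals(fit)^2)) / mean(y, na.rm = TRUE)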
The error rate is a crude measure of model fit for logistic regression models. It is defined as the proportion of cases for which the deterministic prediction is wrong, i.e. the proportion where the predicted probability is above 0.5 although y = 0 (and vice versa). In general, the error rate should be below 0.5 (i.e. 50%); the closer to zero, the better. Furthermore, the error rate of the full model should be considerably below the null model's error rate (cf. Gelman and Hill 2007, p. 99). The print()-method also prints the results from the Likelihood-Ratio-Test, comparing the full to the null model.
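A minimal sketch of the error-rate computation (assuming a fitted binomial glm m with a numeric 0/1 response; variable names are illustrative):

y <- model.response(model.frame(m))        # observed 0/1 outcome
p <- fitted(m)                             # predicted probabilities
err_full <- mean((p > 0.5) != y)           # full model error rate
err_null <- mean(round(mean(y)) != y)      # null model: always predict the majority class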
Binned residual plots are achieved by “dividing the data into categories (bins) based on their fitted values, and then plotting the average residual versus the average fitted value for each bin.” (Gelman, Hill 2007: 97). If the model were true, one would expect about 95% of the residuals to fall inside the error bounds.
If term is not NULL, one can compare the residuals in relation to a specific model predictor. This may be helpful to check whether a term would fit better when transformed: e.g., a rising and falling pattern of residuals along the x-axis (indicated by a green line) is a signal to consider taking the logarithm of the predictor (cf. Gelman and Hill 2007, pp. 97ff).
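A rough base-R sketch of the binning logic, as an approximation of the method described by Gelman and Hill (not the package's exact implementation; m is a fitted binomial glm):

p <- fitted(m)
r <- residuals(m, type = "response")
n.bins <- round(sqrt(length(p)))                   # default bin count
bins <- cut(p, breaks = unique(quantile(p, probs = seq(0, 1, length.out = n.bins + 1))),
  include.lowest = TRUE)
avg_fit <- tapply(p, bins, mean)                   # average fitted value per bin
avg_res <- tapply(r, bins, mean)                   # average residual per bin
se <- tapply(r, bins, function(z) 2 * sd(z) / sqrt(length(z)))  # ~95% error bounds
plot(avg_fit, avg_res, ylim = range(c(-se, se, avg_res), na.rm = TRUE))
lines(avg_fit, se, lty = 2)
lines(avg_fit, -se, lty = 2)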
For vectors, this function is a convenient wrapper for chisq.test(), performing a goodness-of-fit test. For glm-objects, this function performs a goodness-of-fit test. A well-fitting model shows no significant difference between the model and the observed data, i.e. the reported p-values should be greater than 0.05.
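For the vector case, the wrapper plausibly reduces to a plain chisq.test() call along these lines (a sketch, not the package's exact code; rescale.p is an assumption so that prob need not sum exactly to 1):

chisq.test(table(x), p = prob, rescale.p = TRUE)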
The same applies to the Hosmer-Lemeshow test: a well-fitting model shows no significant difference between the model and the observed data, i.e. the reported p-value should be greater than 0.05.
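A minimal sketch of the Hosmer-Lemeshow statistic itself (assuming a fitted binomial glm m and the default 10 bins; variable names are illustrative):

p <- fitted(m)
y <- model.response(model.frame(m))
bins10 <- cut(p, breaks = quantile(p, probs = seq(0, 1, length.out = 11)), include.lowest = TRUE)
obs <- tapply(y, bins10, sum)            # observed events per decile of risk
expct <- tapply(p, bins10, sum)          # expected events per decile
n.g <- tapply(p, bins10, length)
chisq <- sum((obs - expct)^2 / (expct * (1 - expct / n.g)))
p.value <- pchisq(chisq, df = 10 - 2, lower.tail = FALSE)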
Gelman A, Hill J (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge, New York: Cambridge University Press.
Everitt B (1998). The Cambridge Dictionary of Statistics. Cambridge, UK; New York: Cambridge University Press.
Hosmer DW, Lemeshow S (2000). Applied Logistic Regression. Hoboken, NJ: John Wiley & Sons. doi: 10.1002/0471722146
r2() for R-squared or pseudo-R-squared values.
data(efc)
fit <- lm(barthtot ~ c160age + c12hour, data = efc)
rmse(fit)
rse(fit)
cv(fit)
library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
rmse(fit)
mse(fit)
cv(fit)
# normalized RMSE
library(nlme)
fit <- lme(distance ~ age, data = Orthodont)
rmse(fit, normalized = TRUE)
# coefficient of variation for a single variable
cv(efc$e17age)
# Error Rate
efc$neg_c_7d <- ifelse(efc$neg_c_7 < median(efc$neg_c_7, na.rm = TRUE), 0, 1)
m <- glm(
neg_c_7d ~ c161sex + barthtot + c172code,
data = efc,
family = binomial(link = "logit")
)
error_rate(m)
# Binned residuals
binned_resid(m)
binned_resid(m, "barthtot")
# Chi-squared goodness-of-fit test for logistic regression
chisq_gof(m)
# Hosmer-Lemeshow goodness-of-fit test for logistic regression
hoslem_gof(m)
# goodness-of-fit test for vectors against probabilities
# differing from population
chisq_gof(efc$e42dep, c(0.3, 0.2, 0.22, 0.28))
# equal to population
chisq_gof(efc$e42dep, prop.table(table(efc$e42dep)))