Compute various measures or tests to assess the quality of fitted linear (mixed effects) models, like the root mean squared error, residual standard error or mean square error. For logistic regression models, or mixed models with binary outcome, the error rate, binned residuals, a Chi-squared goodness-of-fit test or the Hosmer-Lemeshow goodness-of-fit test can be computed.
cv(x, ...)
chisq_gof(x, prob = NULL, weights = NULL)
hoslem_gof(x, n.bins = 10)
rmse(x, normalized = FALSE)
rse(x)
mse(x)
error_rate(x)
binned_resid(x, term = NULL, n.bins = NULL)
x: Fitted linear model of class lm, merMod (lme4) or lme (nlme). For error_rate(), hoslem_gof() and binned_resid(), a glm-object with binomial family. For chisq_gof(), a numeric vector or a glm-object.

...: More fitted model objects, to compute multiple coefficients of variation at once.

prob: Vector of probabilities (indicating the population probabilities), of the same length as the number of categories / factor levels of x. Use nrow(table(x)) to determine the number of values needed for prob. Only used when x is a vector, not a glm-object.

weights: Vector of weights, used to weight x.

n.bins: Numeric, the number of bins to divide the data into. For hoslem_gof(), the default is 10. For binned_resid(), if n.bins = NULL, the square root of the number of observations is used.

normalized: Logical; use TRUE if the normalized RMSE should be returned.

term: Name of an independent variable from x. If not NULL, average residuals for the categories of term are plotted; otherwise, average residuals for the estimated probabilities of the response are plotted.
rmse(), rse(), mse(), cv(): These functions return a number, the requested statistic.
error_rate(): A list with four values: the error rates of the full and the null model, as well as the chi-squared statistic and p-value from the Likelihood-Ratio-Test between the full and the null model.
binned_resid(): A data frame representing the data mapped to the plot, which is plotted automatically. If all residuals lie inside the error bounds, points are black. If some residuals fall outside the error bounds (indicated by the grey-shaded area), blue points mark residuals that are OK, while red points indicate model under- or overfitting for the related range of estimated probabilities.
chisq_gof(): For vectors, the object returned by the computed chisq.test(). For glm-objects, an object of class chisq_gof with the following values: p.value, the p-value for the goodness-of-fit test; z.score, the standardized z-score for the goodness-of-fit test; rss, the residual sums of squares term; and chisq, the Pearson chi-squared statistic.
hoslem_gof(): An object of class hoslem_test with the following values: chisq, the Hosmer-Lemeshow chi-squared statistic; df, the degrees of freedom; and p.value, the p-value for the goodness-of-fit test.
The RMSE is the square root of the variance of the residuals and indicates the absolute fit of the model to the data (the difference between the observed data and the model's predicted values). “RMSE can be interpreted as the standard deviation of the unexplained variance, and has the useful property of being in the same units as the response variable. Lower values of RMSE indicate better fit. RMSE is a good measure of how accurately the model predicts the response, and is the most important criterion for fit if the main purpose of the model is prediction.” (Grace-Martin K: Assessing the Fit of Regression Models)
The normalized RMSE is the RMSE divided by the range of the response variable; hence, lower values indicate less residual variance.
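As a minimal base-R sketch of these two definitions (assuming a fitted model fit; rmse_manual and nrmse_manual are illustrative names, not part of the package):

# RMSE: square root of the mean squared residuals
rmse_manual <- sqrt(mean(residuals(fit)^2, na.rm = TRUE))
# normalized RMSE: RMSE divided by the range of the response
y <- model.response(model.frame(fit))
nrmse_manual <- rmse_manual / diff(range(y, na.rm = TRUE))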
The residual standard error is the square root of the residual sum of squares divided by the residual degrees of freedom.
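A base-R sketch of this definition (again assuming a fitted model fit; rse_manual is an illustrative name):

# residual standard error: sqrt(RSS / residual degrees of freedom)
rse_manual <- sqrt(sum(residuals(fit)^2) / df.residual(fit))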
The mean square error is the mean of the sum of squared residuals, i.e. it measures the average of the squares of the errors. Lower values (closer to zero) indicate better fit.
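The corresponding sketch (illustrative name mse_manual):

# mean square error: mean of the squared residuals
mse_manual <- mean(residuals(fit)^2)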
The advantage of the CV is that it is unitless. This allows coefficients of variation to be compared to each other in ways that other measures, like standard deviations or root mean squared residuals, cannot be.
“It is interesting to note the differences between a model's CV and R-squared values. Both are unitless measures that are indicative of model fit, but they define model fit in two different ways: CV evaluates the relative closeness of the predictions to the actual values while R-squared evaluates how much of the variability in the actual values is explained by the model.” (source: UCLA-FAQ)
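A minimal sketch of the common definitions (this is an assumption about the exact computation used here, not a quote of the package's code): for a single variable, the standard deviation divided by the mean; for a fitted model, the RMSE divided by the mean of the response.

# CV for a single variable (x is an illustrative numeric vector)
cv_vec <- sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
# CV for a fitted model (assumption: RMSE / mean of the response)
y <- model.response(model.frame(fit))
cv_mod <- sqrt(mean(residuals(fit)^2)) / mean(y, na.rm = TRUE)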
The error rate is a crude measure of model fit for logistic regression models. It is defined as the proportion of cases for which the deterministic prediction is wrong, i.e. the proportion where the predicted probability is above 0.5 although y = 0 (and vice versa). In general, the error rate should be below 0.5 (i.e. 50%); the closer to zero, the better. Furthermore, the error rate of the full model should be considerably below the null model's error rate (cf. Gelman and Hill 2007, p. 99). The print()-method also prints the results from the Likelihood-Ratio-Test, comparing the full to the null model.
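A minimal sketch of the error-rate computation (assuming a fitted binomial glm m with a numeric 0/1 response; variable names are illustrative):

y <- model.response(model.frame(m))        # observed 0/1 outcome
p <- fitted(m)                             # predicted probabilities
err_full <- mean((p > 0.5) != y)           # full model error rate
err_null <- mean(round(mean(y)) != y)      # null model: always predict the majority class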
Binned residual plots are achieved by “dividing the data into categories (bins) based on their fitted values, and then plotting the average residual versus the average fitted value for each bin.” (Gelman, Hill 2007: 97). If the model were true, one would expect about 95% of the residuals to fall inside the error bounds.
If term is not NULL, one can compare the residuals in relation to a specific model predictor. This may be helpful to check whether a term would fit better when transformed: e.g., a rising and falling pattern of residuals along the x-axis (indicated by a green line) is a signal to consider taking the logarithm of the predictor (cf. Gelman and Hill 2007, pp. 97ff).
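A rough base-R sketch of the binning logic, as an approximation of the method described by Gelman and Hill (not the package's exact implementation; m is a fitted binomial glm):

p <- fitted(m)
r <- residuals(m, type = "response")
n.bins <- round(sqrt(length(p)))                   # default bin count
bins <- cut(p, breaks = unique(quantile(p, probs = seq(0, 1, length.out = n.bins + 1))),
  include.lowest = TRUE)
avg_fit <- tapply(p, bins, mean)                   # average fitted value per bin
avg_res <- tapply(r, bins, mean)                   # average residual per bin
se <- tapply(r, bins, function(z) 2 * sd(z) / sqrt(length(z)))  # ~95% error bounds
plot(avg_fit, avg_res, ylim = range(c(-se, se, avg_res), na.rm = TRUE))
lines(avg_fit, se, lty = 2)
lines(avg_fit, -se, lty = 2)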
For vectors, this function is a convenient wrapper for chisq.test(), performing a goodness-of-fit test. For glm-objects, this function performs a goodness-of-fit test. A well-fitting model shows no significant difference between the model and the observed data, i.e. the reported p-values should be greater than 0.05.
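For the vector case, the wrapper plausibly reduces to a plain chisq.test() call along these lines (a sketch, not the package's exact code; rescale.p is an assumption so that prob need not sum exactly to 1):

chisq.test(table(x), p = prob, rescale.p = TRUE)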
The same applies to the Hosmer-Lemeshow test: a well-fitting model shows no significant difference between the model and the observed data, i.e. the reported p-value should be greater than 0.05.
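A minimal sketch of the Hosmer-Lemeshow statistic itself (assuming a fitted binomial glm m and the default 10 bins; variable names are illustrative):

p <- fitted(m)
y <- model.response(model.frame(m))
bins10 <- cut(p, breaks = quantile(p, probs = seq(0, 1, length.out = 11)), include.lowest = TRUE)
obs <- tapply(y, bins10, sum)            # observed events per decile of risk
expct <- tapply(p, bins10, sum)          # expected events per decile
n.g <- tapply(p, bins10, length)
chisq <- sum((obs - expct)^2 / (expct * (1 - expct / n.g)))
p.value <- pchisq(chisq, df = 10 - 2, lower.tail = FALSE)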
Gelman A, Hill J (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge, New York: Cambridge University Press.
Everitt B (1998). The Cambridge Dictionary of Statistics. Cambridge, UK; New York: Cambridge University Press.
Hosmer DW, Lemeshow S (2000). Applied Logistic Regression. Hoboken, NJ: John Wiley & Sons. doi: 10.1002/0471722146
r2() for R-squared or pseudo-R-squared values.
data(efc)
fit <- lm(barthtot ~ c160age + c12hour, data = efc)
rmse(fit)
rse(fit)
cv(fit)
library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
rmse(fit)
mse(fit)
cv(fit)
# normalized RMSE
library(nlme)
fit <- lme(distance ~ age, data = Orthodont)
rmse(fit, normalized = TRUE)
# coefficient of variation for a single variable
cv(efc$e17age)
# Error Rate
efc$neg_c_7d <- ifelse(efc$neg_c_7 < median(efc$neg_c_7, na.rm = TRUE), 0, 1)
m <- glm(
neg_c_7d ~ c161sex + barthtot + c172code,
data = efc,
family = binomial(link = "logit")
)
error_rate(m)
# Binned residuals
binned_resid(m)
binned_resid(m, "barthtot")
# Chi-squared goodness-of-fit test for logistic regression
chisq_gof(m)
# Hosmer-Lemeshow goodness-of-fit test for logistic regression
hoslem_gof(m)
# goodness-of-fit test for vectors against probabilities
# differing from population
chisq_gof(efc$e42dep, c(0.3, 0.2, 0.22, 0.28))
# equal to population
chisq_gof(efc$e42dep, prop.table(table(efc$e42dep)))