Various approaches to the calculation of a goodness of fit measure
for GLMs analogous to R-squared in the ordinary linear model have been
proposed. Generally termed 'pseudo R-squared' measures, they include
variance-based, likelihood-based, and distribution-specific approaches.
Here, however, a more straightforward definition is used, which can be
applied to any model for which fitted values of the response variable are
generated: R-squared is calculated as the squared (weighted) correlation
between the observed and fitted values of the response (in the original
units). This is simply the squared version of the correlation measure
advocated by Zheng & Agresti (2000), itself an intuitive measure of
goodness of fit describing the predictive power of a model. As the measure
does not depend on any specific error distribution or model estimating
procedure, it is also generally comparable across many different types of
model (Kvalseth, 1985). In the case of the ordinary linear model, the
measure is exactly equal to the traditional R-squared based on sums of
squares.
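As a minimal sketch of this definition (using base R, an unweighted correlation, and an illustrative Poisson GLM):

  # R-squared as the squared correlation between observed and fitted
  # values of the response (response scale, unweighted)
  m <- glm(count ~ spray, family = poisson, data = InsectSprays)
  y <- InsectSprays$count
  r2 <- cor(y, fitted(m))^2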
If adj = TRUE (default), the 'adjusted' R-squared value is also returned, which provides an estimate of the population (as opposed to sample)
R-squared. This is achieved via an analytical formula which adjusts
R-squared using the 'degrees of freedom' of the model (i.e. the ratio of
observations to parameters), helping to counter multiple R-squared's
positive bias and guard against overfitting of the model to noise in the
original sample. By default, this is calculated via the exact 'Olkin-Pratt'
estimator, shown in recent simulations to be the optimal unbiased
population R-squared estimator across a range of estimators and
specification scenarios (Karch, 2020), and thus a good general first
choice, even for smaller sample sizes. Setting adj.type = "ezekiel", however, will use the simpler and more common 'Ezekiel' formula, which can
be more appropriate where minimising the mean squared error (MSE) of the
estimate is more important than strict unbiasedness (Hittner, 2019; Karch,
2020).
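For illustration, the Ezekiel formula can be sketched as below, with n observations and p estimated parameters (including the intercept); the exact Olkin-Pratt estimator is more involved, additionally requiring the Gaussian hypergeometric function (Karch, 2020):

  # Ezekiel's adjusted R-squared -- an illustrative sketch, not
  # necessarily the exact implementation used here
  adj_r2_ezekiel <- function(r2, n, p) {
    1 - (1 - r2) * (n - 1) / (n - p)
  }
  adj_r2_ezekiel(r2, n = nrow(InsectSprays), p = length(coef(m)))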
If pred = TRUE (default), a 'predicted' R-squared is also returned, which
is calculated via the same formula as for R-squared but using
cross-validated, rather than original, fitted values. These are obtained by
dividing the model residuals (on the response scale) by the complement of
the observation leverages (diagonals of the hat matrix), then subtracting
these inflated 'predicted' residuals from the response variable. This is
essentially a shortcut to obtaining 'out-of-sample' predictions, normally
arising via a 'leave-one-out' cross-validation procedure (they are not
exactly equal for GLMs). The resulting R-squared is an estimate of the
R-squared that would result were the model fit to new data, and will be
lower than the original (and likely also the adjusted) R-squared,
highlighting the loss of explanatory power due to sample noise. Predicted
R-squared may be a more powerful and general indicator of overfitting than adjusted R-squared,
as it provides a true out-of-sample test. This measure is a variant of an existing one, calculated by substituting the 'PRESS' statistic, i.e. the sum of squares of the predicted residuals (Allen, 1974), for the residual sum of squares in the classic R-squared formula. It is not calculated here for GLMMs, as the interpretation of the hat matrix is not reliable (see lme4:::hatvalues.merMod()).
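Continuing the GLM sketch above, the leverage-based shortcut might look like this (for GLMs, an approximation to exact leave-one-out refitting):

  h <- hatvalues(m)                                  # observation leverages
  pres <- residuals(m, type = "response") / (1 - h)  # 'predicted' residuals
  r2_pred <- cor(y, y - pres)^2                      # cross-validated fit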
For models fitted with one or more offsets, these will be removed by
default from the response variable and fitted values prior to calculations.
Thus R-squared will measure goodness of fit only for the predictors with
estimated, rather than fixed, coefficients. This is likely to be the
intended behaviour in most circumstances, though if users wish to include
variation due to the offset(s), they can set offset = TRUE.
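As a sketch of this behaviour for an identity-link model, where the offset can simply be subtracted (for other links, removal would occur on the appropriate scale):

  # Simulated data with a fixed-coefficient offset term 'off'
  set.seed(1)
  x <- rnorm(50); off <- rnorm(50)
  y2 <- 1 + 2 * x + off + rnorm(50)
  m2 <- lm(y2 ~ x + offset(off))
  r2_no_off <- cor(y2 - off, fitted(m2) - off)^2  # estimated terms only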
For mixed models, the function will, by default, calculate all R-squared
measures using fitted values incorporating both the fixed and random
effects, thus encompassing all variation captured by the model. This is
equivalent to the 'conditional' R-squared of Nakagawa et al. (2017) (though
see that reference for a more advanced approach to R-squared for mixed
models). To include only some or no random effects, simply set the appropriate formula using the argument re.form, which is passed directly to lme4:::predict.merMod(). If re.form = NA, R-squared is calculated for the fixed effects only, i.e. the 'marginal' R-squared of Nakagawa et al. (2017).
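A minimal sketch of both measures, using lme4's built-in sleepstudy data:

  library(lme4)
  fm <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
  obs <- sleepstudy$Reaction
  r2_cond <- cor(obs, fitted(fm))^2                 # fixed + random effects
  r2_marg <- cor(obs, predict(fm, re.form = NA))^2  # fixed effects only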
As R-squared is calculated here as a squared correlation, the type of correlation coefficient can also be specified. Setting this to "spearman"
may be desirable in some cases, such as where the relationship between
response and fitted values is not necessarily bivariate normal or linear,
and a correlation of the ranks may be more informative and/or general. This
purely monotonic R-squared can also be considered a useful goodness of fit measure,
and may be more appropriate for comparisons between GLMs and ordinary
linear models in some scenarios.
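Continuing the mixed model sketch above, the rank-based variant is simply:

  r2_rank <- cor(obs, fitted(fm), method = "spearman")^2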
R-squared values produced by this function will by default be in the range
0-1, meaning that any negative values arising from calculations will be
converted to zero. Negative values essentially mean that the fit is 'worse' than the null expectation of no relationship between the variables. This can be difficult to interpret in practice, and in any case arises only in rare situations, such as where the intercept is suppressed or where a low value of R-squared is adjusted downwards via an analytic estimator. Such values are also 'impossible' in practice, given that R-squared is generally understood to be a non-negative measure. Hence, for
simplicity and ease of interpretation, values less than zero are presented
as a complete lack of model fit. This is also recommended by Shieh (2008),
who shows for adjusted R-squared that such 'positive-part' estimators have
lower MSE in estimating the population R-squared (though higher bias). To allow negative values to be returned, however, set positive.only = FALSE. This
may be desirable for simulation purposes, and/or where strict unbiasedness
is prioritised.
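In sketch form, the default behaviour amounts to a simple positive-part truncation of each estimate:

  # Truncate negative estimates at zero (skipped if positive.only = FALSE)
  r2 <- max(0, r2)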