For models fit using MCMC, compute approximate leave-one-out cross-validation (LOO, LOOIC) or, less preferably, the Widely Applicable Information Criterion (WAIC) using the loo package. Exact \(K\)-fold cross-validation is also available. Compare two or more models using the compare_models function. Note: these functions are not guaranteed to work properly unless the data argument was specified when the model was fit.
# S3 method for stanreg
loo(x, ..., k_threshold = NULL)

# S3 method for stanreg
waic(x, ...)

kfold(x, K = 10, save_fits = FALSE)

compare_models(..., loos = list())
x: A fitted model object returned by one of the rstanarm modeling functions. See stanreg-objects.

...: For the loo method, ... can be used to pass optional arguments (e.g. cores) to psislw. For compare_models, ... should contain two or more objects returned by the loo, kfold, or waic method (see the Examples section, below).
k_threshold: Threshold for flagging estimates of the Pareto shape parameters \(k\) estimated by loo. See the How to proceed when loo gives warnings section, below, for details.
K: For kfold, the number of subsets of equal (if possible) size into which the data will be randomly partitioned for performing \(K\)-fold cross-validation. The model is refit \(K\) times, each time leaving out one of the \(K\) subsets. If \(K\) is equal to the total number of observations in the data then \(K\)-fold cross-validation is equivalent to exact leave-one-out cross-validation.
save_fits: If TRUE, a component 'fits' is added to the returned object to store the cross-validated stanreg objects and the indices of the omitted observations for each fold. Defaults to FALSE.
loos: For compare_models, a list of two or more objects returned by the loo, kfold, or waic method. This argument can be used as an alternative to passing these objects via ....
The loo and waic methods return an object of class 'loo'. See the Value section in loo and waic (from the loo package) for details on the structure of these objects.
kfold returns an object with classes 'kfold' and 'loo' that has a structure similar to that of the objects returned by the loo and waic methods.
compare_models returns a vector or matrix with class 'compare.loo'. See the Comparing models section below for more details.
The loo method for stanreg objects provides an interface to the loo package for approximate leave-one-out cross-validation (LOO). The LOO Information Criterion (LOOIC) has the same purpose as the Akaike Information Criterion (AIC) that is used by frequentists. Both are intended to estimate the expected log predictive density (ELPD) for a new dataset. However, the AIC ignores priors and assumes that the posterior distribution is multivariate normal, whereas the functions from the loo package do not make this distributional assumption and integrate over uncertainty in the parameters. The only assumption is that any one observation can be omitted without having a major effect on the posterior distribution, which can be judged using the diagnostic plot provided by the plot.loo method and the warnings provided by the print.loo method (see the How to Use the rstanarm Package vignette for an example of this process).
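As an illustrative sketch (fit is a hypothetical stanreg object, fit with the data argument specified), the diagnostics described above might be examined like this:

# fit <- stan_glm(mpg ~ wt, data = mtcars)  # hypothetical model fit
loo_fit <- loo(fit)   # PSIS-LOO approximation to leave-one-out cross-validation
print(loo_fit)        # print.loo warns about problematic Pareto k estimates
plot(loo_fit)         # plot.loo shows the Pareto k estimate for each observation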
How to proceed when loo gives warnings (k_threshold)

The k_threshold argument to the loo method for rstanarm models is provided as a possible remedy when the diagnostics reveal problems stemming from the posterior's sensitivity to particular observations. Warnings about Pareto \(k\) estimates indicate observations for which the approximation to LOO is problematic (this is described in detail in Vehtari, Gelman, and Gabry (2017) and the loo package documentation). The k_threshold argument can be used to set the \(k\) value above which an observation is flagged. If k_threshold is not NULL and there are \(J\) observations with \(k\) estimates above k_threshold, then when loo is called it will refit the original model \(J\) times, each time leaving out one of the \(J\) problematic observations. The pointwise contributions of these observations to the total ELPD are then computed directly and substituted for the previous estimates from these \(J\) observations that are stored in the object created by loo.
Note: in the warning messages issued by loo about large Pareto \(k\) estimates we recommend setting k_threshold to at least \(0.7\). There is a theoretical reason, explained in Vehtari, Gelman, and Gabry (2017), for setting the threshold to the stricter value of \(0.5\), but in practice they find that errors in the LOO approximation start to increase non-negligibly when \(k > 0.7\).
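For example, continuing the hypothetical fit object from the sketch above, flagged observations can be handled by refitting:

# refit once per observation with Pareto k above 0.7 and substitute exact
# ELPD contributions for those observations
loo_fit <- loo(fit, k_threshold = 0.7)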
The kfold function performs exact \(K\)-fold cross-validation. First the data are randomly partitioned into \(K\) subsets of equal (or as close to equal as possible) size. Then the model is refit \(K\) times, each time leaving out one of the \(K\) subsets. If \(K\) is equal to the total number of observations in the data then \(K\)-fold cross-validation is equivalent to exact leave-one-out cross-validation (to which loo is an efficient approximation). The compare_models function is also compatible with the objects returned by kfold.
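A brief sketch of exact 10-fold cross-validation, again using the hypothetical fit object:

kf <- kfold(fit, K = 10)                           # refits the model 10 times
kf_saved <- kfold(fit, K = 10, save_fits = TRUE)   # also stores the refits and omitted indices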
compare_models is a wrapper around the compare function in the loo package. Before calling compare, compare_models performs some extra checks to make sure the rstanarm models are suitable for comparison. These extra checks include verifying that all models to be compared were fit using the same outcome variable and likelihood family.
If exactly two models are being compared then compare_models returns a vector containing the difference in expected log predictive density (ELPD) between the models and the standard error of that difference (the documentation for compare has additional details about the calculation of the standard error of the difference). The difference in ELPD will be negative if the expected out-of-sample predictive accuracy of the first model is higher. If the difference is positive then the second model is preferred.
If more than two models are being compared then compare_models returns a matrix with one row per model. This matrix summarizes the objects and arranges them in descending order according to expected out-of-sample predictive accuracy. That is, the first row of the matrix will be for the model with the largest ELPD (smallest LOOIC).
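For instance, given hypothetical objects loo1, loo2, and loo3 returned by the loo method:

compare_models(loo1, loo2)                      # vector: elpd_diff and its standard error
compare_models(loo1, loo2, loo3)                # matrix ordered by ELPD, largest first
compare_models(loos = list(loo1, loo2, loo3))   # equivalent, via the loos argument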
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413-1432. doi:10.1007/s11222-016-9696-4. arXiv preprint: https://arxiv.org/abs/1507.04544.
The various rstanarm vignettes for more examples of using loo and compare_models.

loo-package (in particular the PSIS-LOO section) for details on the computations implemented by the loo package and the interpretation of the Pareto \(k\) estimates displayed when using the plot.loo method.

log_lik.stanreg to directly access the pointwise log-likelihood matrix.
# Not run:
fit1 <- stan_glm(mpg ~ wt, data = mtcars)
fit2 <- stan_glm(mpg ~ wt + cyl, data = mtcars)
# compare on LOOIC
(loo1 <- loo(fit1, cores = 2))
loo2 <- loo(fit2, cores = 2)
plot(loo2)
# when comparing exactly two models, the reported 'elpd_diff' will be
# positive if the expected predictive accuracy for the second model is higher
compare_models(loo1, loo2) # or compare_models(loos = list(loo1, loo2))
# when comparing three or more models they are ordered by expected
# predictive accuracy
fit3 <- stan_glm(mpg ~ ., data = mtcars)
loo3 <- loo(fit3, k_threshold = 0.7, cores = 2)
compare_models(loo1, loo2, loo3)
# 10-fold cross-validation
(kfold1 <- kfold(fit1, K = 10))
kfold2 <- kfold(fit2, K = 10)
compare_models(kfold1, kfold2)