Print a diagnostic table summarizing the estimated Pareto shape parameters
and PSIS effective sample sizes, find the indexes of observations for which
the estimated Pareto shape parameter \(k\) is larger than some threshold
value, or plot observation indexes vs. diagnostic estimates.
The Details section below provides a brief overview of the
diagnostics, but we recommend consulting Vehtari, Gelman, and Gabry (2017)
and Vehtari, Simpson, Gelman, Yao, and Gabry (2019) for full details.
pareto_k_table(x)
pareto_k_ids(x, threshold = 0.5)
pareto_k_values(x)
pareto_k_influence_values(x)
psis_n_eff_values(x)
mcse_loo(x, threshold = 0.7)
# S3 method for psis_loo
plot(
  x,
  diagnostic = c("k", "n_eff"),
  ...,
  label_points = FALSE,
  main = "PSIS diagnostic plot"
)
# S3 method for psis
plot(
  x,
  diagnostic = c("k", "n_eff"),
  ...,
  label_points = FALSE,
  main = "PSIS diagnostic plot"
)
pareto_k_table() returns an object of class "pareto_k_table", which is a
matrix with columns "Count", "Proportion", and "Min. n_eff", and has its own
print method.
pareto_k_ids() returns an integer vector indicating which observations have
Pareto \(k\) estimates above threshold.
pareto_k_values() returns a vector of the estimated Pareto \(k\) parameters.
These indicate how reliable the importance sampling estimate is for each
observation.
pareto_k_influence_values() returns a vector of the estimated Pareto \(k\)
parameters. These represent the influence of the observations on the model
posterior distribution.
psis_n_eff_values() returns a vector of the estimated PSIS effective sample
sizes.
mcse_loo() returns the Monte Carlo standard error (MCSE) estimate for
PSIS-LOO. MCSE will be NA if any Pareto \(k\) values are above threshold.
The plot() method is called for its side effect and does not return
anything. If x is the result of a call to loo() or psis() then
plot(x, diagnostic) produces a plot of the estimates of the Pareto shape
parameters (diagnostic = "k") or estimates of the PSIS effective sample
sizes (diagnostic = "n_eff").
An object created by loo() or psis().
For pareto_k_ids(), threshold is the minimum \(k\) value to flag (default is
0.5). For mcse_loo(), if any \(k\) estimates are greater than threshold the
MCSE estimate is returned as NA (default is 0.7). See Details for the
motivation behind these defaults.
For the plot() method, which diagnostic should be plotted? The options are
"k" for Pareto \(k\) estimates (the default) or "n_eff" for PSIS effective
sample size estimates.
For the plot() method, if label_points is TRUE the observation numbers
corresponding to any values of \(k\) greater than 0.5 will be displayed in
the plot. Any arguments specified in ... will be passed to graphics::text()
and can be used to control the appearance of the labels.
For the plot() method, a title for the plot.
The reliability and approximate convergence rate of the PSIS-based estimates can be assessed using the estimates for the shape parameter \(k\) of the generalized Pareto distribution:
If \(k < 0.5\) then the distribution of raw importance ratios has finite variance and the central limit theorem holds. However, as \(k\) approaches \(0.5\) the RMSE of plain importance sampling (IS) increases significantly while PSIS has lower RMSE.
If \(0.5 \leq k < 1\) then the variance of the raw importance ratios is infinite, but the mean exists. Truncated importance sampling (TIS) and PSIS estimates have finite variance by accepting some bias. The convergence of the estimate is slower with increasing \(k\). If \(k\) is between 0.5 and approximately 0.7 then we observe practically useful convergence rates and Monte Carlo error estimates with PSIS (the bias of TIS increases faster than the bias of PSIS). If \(k > 0.7\) we observe impractical convergence rates and unreliable Monte Carlo error estimates.
If \(k \geq 1\) then neither the variance nor the mean of the raw importance ratios exists. The convergence rate is close to zero and bias can be large with practical sample sizes.
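The ranges above can be turned into a small summary in base R. The following is a minimal sketch, not the package's implementation of pareto_k_table(): the vector k holds made-up estimates, the names k_range and summary_tab are hypothetical, and assigning a value of exactly 0.7 to the upper bin is an assumption of this sketch.

```r
# Hypothetical Pareto k estimates, one per observation (illustrative values only).
k <- c(0.12, 0.48, 0.55, 0.69, 0.83, 1.20, -0.05, 0.31)

# Bin the estimates using the ranges described above, with the
# 0.5 to 1 range split at 0.7. right = FALSE makes each interval
# closed on the left, e.g. [0.5, 0.7).
k_range <- cut(
  k,
  breaks = c(-Inf, 0.5, 0.7, 1, Inf),
  labels = c("k < 0.5", "0.5 <= k < 0.7", "0.7 <= k < 1", "k >= 1"),
  right = FALSE
)

# Count and proportion of observations in each range.
counts <- table(k_range)
summary_tab <- cbind(
  Count = as.vector(counts),
  Proportion = as.vector(counts) / length(k)
)
rownames(summary_tab) <- names(counts)
print(summary_tab)
```

For a fitted model, pareto_k_table(x) reports this kind of breakdown directly, together with the minimum effective sample size per range.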
Importance sampling is likely to work less well if the marginal posterior \(p(\theta^s | y)\) and LOO posterior \(p(\theta^s | y_{-i})\) are very different, which is more likely to happen with a non-robust model and highly influential observations. If the estimated tail shape parameter \(k\) exceeds \(0.5\), the user should be warned. (Note: If \(k\) is greater than \(0.5\) then WAIC is also likely to fail, but WAIC lacks its own diagnostic.) In practice, we have observed good performance for values of \(k\) up to 0.7. When using PSIS in the context of approximate LOO-CV, we recommend one of the following actions when \(k > 0.7\):
With some additional computations, it is possible to transform the MCMC
draws from the posterior distribution to obtain more reliable importance
sampling estimates. This results in a smaller shape parameter \(k\).
See loo_moment_match() for an example of this.
Sampling directly from \(p(\theta^s | y_{-i})\) for the problematic observations \(i\), or using \(k\)-fold cross-validation will generally be more stable.
Using a model that is more robust to anomalous observations will generally make approximate LOO-CV more stable.
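Whichever of these actions is taken, the first step is to identify the observations with \(k > 0.7\). With a fitted object this is what pareto_k_ids(x, threshold = 0.7) returns. The base R sketch below mimics that selection on a made-up vector of estimates. The names k, problem_ids, and ok_ids are hypothetical and only serve the illustration.

```r
# Hypothetical Pareto k estimates (illustrative values, not from a real model).
k <- c(0.12, 0.48, 0.55, 0.69, 0.83, 1.20, -0.05, 0.31)

# Indexes of observations whose k estimate exceeds the threshold.
threshold <- 0.7
problem_ids <- which(k > threshold)
print(problem_ids)

# Remaining observations, for which the PSIS approximation can be kept.
ok_ids <- setdiff(seq_along(k), problem_ids)
print(ok_ids)
```

Only the flagged observations would then need moment matching or an exact refit with \(y_i\) left out. The rest can keep their PSIS-LOO estimates.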
The estimated shape parameter \(k\) for each observation can be used as a
measure of the observation's influence on the posterior distribution of the
model. These can be obtained with pareto_k_influence_values().
When the samples from the proposal distribution are obtained via MCMC, the loo package also computes estimates for the Monte Carlo error and the effective sample size for importance sampling, which are more accurate for PSIS than for IS and TIS (see Vehtari et al. (2019) for details). However, the PSIS effective sample size estimate will be over-optimistic when the estimate of \(k\) is greater than 0.7.
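To give some intuition for what an importance sampling effective sample size measures, the sketch below applies the standard formula for self-normalized weights, \(1 / \sum_s w_s^2\), to two made-up sets of raw importance ratios. This is only an illustration: the helper is_n_eff() is hypothetical, no Pareto smoothing is applied, and no adjustment for autocorrelation in the MCMC draws is included, so it should not be read as the package's estimator.

```r
# Hypothetical raw importance ratios for a single observation.
ratios_even <- rep(1, 1000)          # every draw carries the same weight
ratios_skew <- c(rep(1, 999), 1000)  # a single draw dominates

# Effective sample size for self-normalized importance sampling:
# normalize the weights to sum to 1, then take 1 / sum(w^2).
is_n_eff <- function(ratios) {
  w <- ratios / sum(ratios)
  1 / sum(w^2)
}

n_eff_even <- is_n_eff(ratios_even)
n_eff_skew <- is_n_eff(ratios_skew)
print(c(even = n_eff_even, skew = n_eff_skew))
```

With equal ratios the effective sample size equals the number of draws, while a single dominant ratio reduces it to only a few draws. Heavy-tailed ratios, which is what a large \(k\) indicates, produce the second kind of behavior.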
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4 (journal version, preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2019). Pareto smoothed importance sampling. Preprint arXiv:1507.02646.
psis() for the implementation of the PSIS algorithm.
The FAQ page on the loo website for answers to frequently asked questions.