pareto_smooth: Pareto smoothing

Description

Smooth the tail draws of x by replacing them with order statistics of a generalized Pareto distribution fit to the tail(s). For further details see Vehtari et al. (2024).

Usage

pareto_smooth(x, ...)
# S3 method for rvar
pareto_smooth(x, return_k = FALSE, extra_diags = FALSE, ...)
# S3 method for default
pareto_smooth(
  x,
  tail = c("both", "right", "left"),
  r_eff = NULL,
  ndraws_tail = NULL,
  return_k = FALSE,
  extra_diags = FALSE,
  verbose = TRUE,
  are_log_weights = FALSE,
  ...
)

Value

Either a vector x of smoothed values or a named list containing the vector x and a named list diagnostics containing numeric values:

khat: estimated Pareto k shape parameter, and optionally
min_ss: minimum sample size for reliable Pareto smoothed estimate
khat_threshold: sample-size-specific khat threshold for reliable Pareto smoothed estimates
convergence_rate: relative convergence rate for Pareto smoothed estimates

If any of the draws are non-finite, that is, NA, NaN, Inf, or -Inf, Pareto smoothing will not be performed; the original draws will be returned and the diagnostics will be NA (numeric).
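
As a minimal sketch of the list form of the return value, the following requests the diagnostics and reads them back by the names listed above. It assumes the posterior package is installed and uses its bundled example_draws(); verbose = FALSE is set only to keep the output quiet.

```r
library(posterior)

# Matrix of draws (iterations x chains) for a single variable
mu <- extract_variable_matrix(example_draws(), "mu")

# return_k = TRUE switches the output to a list with elements x and diagnostics;
# extra_diags = TRUE adds the optional diagnostics to that list
res <- pareto_smooth(mu, return_k = TRUE, extra_diags = TRUE, verbose = FALSE)

str(res$x)               # smoothed draws
res$diagnostics$khat     # estimated Pareto k shape parameter
names(res$diagnostics)   # expected to include min_ss, khat_threshold, convergence_rate
```

With the defaults (return_k = FALSE, extra_diags = FALSE) only the smoothed draws are returned, so there is no diagnostics element to index.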

Arguments

x

(multiple options) One of:

A matrix of draws for a single variable (iterations x chains). See extract_variable_matrix().
An rvar.

...

Arguments passed to individual methods (if applicable).

return_k

(logical) Should the Pareto khat be included in output? If TRUE, output will be a list containing smoothed draws and diagnostics, otherwise it will be a numeric of the smoothed draws. Default is FALSE.

extra_diags

(logical) Should extra Pareto khat diagnostics be included in output? If TRUE, min_ss, khat_threshold and convergence_rate for the estimated k value will be returned. Default is FALSE.

tail

(string) The tail to diagnose/smooth:

"right": diagnose/smooth only the right (upper) tail
"left": diagnose/smooth only the left (lower) tail
"both": diagnose/smooth both tails and return the maximum k-hat value

The default is "both".
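
The three tail options can be compared side by side. This is a sketch only, assuming the posterior package and its example_draws() data; the variable names k_right, k_left and k_both are illustrative. Per the description above, the "both" value is expected to be the larger of the two single-tail estimates, although this example does not verify that.

```r
library(posterior)

mu <- extract_variable_matrix(example_draws(), "mu")

# Helper (hypothetical, for this example only): return just khat for one tail setting
khat_for_tail <- function(x, tail) {
  pareto_smooth(x, tail = tail, return_k = TRUE, verbose = FALSE)$diagnostics$khat
}

k_right <- khat_for_tail(mu, "right")  # upper tail only
k_left  <- khat_for_tail(mu, "left")   # lower tail only
k_both  <- khat_for_tail(mu, "both")   # both tails, reported as a single k-hat

c(right = k_right, left = k_left, both = k_both)
```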

r_eff

(numeric) relative effective sample size estimate. If r_eff is NULL, it will be calculated assuming the draws are from MCMC. Default is NULL.

ndraws_tail

(numeric) number of draws for the tail. If ndraws_tail is not specified, it will be calculated as ceiling(3 * sqrt(length(x) / r_eff)) if length(x) > 225 and length(x) / 5 otherwise (see Appendix H in Vehtari et al. (2024)).
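
The default rule can be worked through by hand. The helper below, default_ndraws_tail(), is hypothetical and not part of posterior; it only transcribes the formula stated above in base R, with n standing in for length(x), so the package's internal handling may differ in details.

```r
# Hypothetical helper mirroring the documented default for ndraws_tail
default_ndraws_tail <- function(n, r_eff = 1) {
  if (n > 225) {
    ceiling(3 * sqrt(n / r_eff))
  } else {
    n / 5
  }
}

default_ndraws_tail(400)               # 3 * sqrt(400) = 60
default_ndraws_tail(4000, r_eff = 0.5) # ceiling(3 * sqrt(8000)) = 269
default_ndraws_tail(100)               # 100 / 5 = 20
```

A smaller r_eff (more autocorrelated draws) increases the number of tail draws used for the fit, since it enters the formula as a divisor.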

verbose

(logical) Should diagnostic messages be printed? If TRUE, messages related to Pareto diagnostics will be printed. Default is TRUE for the default method (see Usage).

are_log_weights

(logical) Are the draws log weights? Default is FALSE. If TRUE, the computation will take into account that the draws are log weights, and only the right tail will be smoothed.

References

Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao and Jonah Gabry (2024). Pareto Smoothed Importance Sampling. Journal of Machine Learning Research, 25(72):1-58.

Examples

mu <- extract_variable_matrix(example_draws(), "mu")
pareto_smooth(mu)

d <- as_draws_rvars(example_draws("multi_normal"))
pareto_smooth(d$Sigma)