hdi: Compute statistics for MCMC samples

Description

hdi() computes the highest density interval for values from MCMC samples. rope() calculates the proportion of a posterior distribution that lies within a region of practical equivalence. n_eff() calculates the number of effective samples (effective sample size). mcse() returns the Monte Carlo standard error.

Usage

hdi(x, prob = 0.9, trans = NULL, type = c("fixed", "random", "all"))
mcse(x, type = c("fixed", "random", "all"))
n_eff(x, type = c("fixed", "random", "all"))
rope(x, rope, trans = NULL, type = c("fixed", "random", "all"))

Arguments

A stanreg, stanfit, or brmsfit object. For hdi() and rope(), may also be a data frame or a vector of values from a probability distribution (e.g., posterior probabilities from MCMC sampling).

prob

Vector of scalars between 0 and 1, indicating the mass within the credible interval that is to be estimated. See hdi.

trans

Name of a function or character vector naming a function, used to apply transformations on the returned HDI-values resp. (for rope()) on the values of the posterior distribution, before calculating the rope based on the boundaries given in rope. Note that the values in rope are not transformed.

type

For mixed effects models, specify the type of effects that should be returned. type = "fixed" returns fixed effects only, type = "random" the random effects and type = "all" returns both fixed and random effects.

rope

Vector of length two, indicating the lower and upper limit of a range around zero, which indicates the region of practical equivalence. Values of the posterior distribution within this range are considered as being "practically equivalent to zero".

Value

For hdi(), if x is a vector, returns a vector of length two with the lower and upper limit of the HDI; if x is a stanreg, stanfit or brmsfit object, returns a tibble with lower and upper HDI-limits for each predictor. To distinguish multiple HDI values, column names for the HDI get a suffix when prob has more than one element. For rope(), returns a tibble with two columns: the proportion of values from x that are within and outside the boundaries of rope. mcse() and n_eff() return a tibble with two columns: one with the term names and one with the related statistic.

Details

Computation for HDI is based on the code from Kruschke 2015, pp. 727f. For default sampling in Stan (4000 samples), the 90% intervals for HDI are more stable than, for instance, 95% intervals. An effective sample size (see n_eff()) of at least 10.000 is recommended if 95% intervals should be computed (see Kruschke 2015, p. 183ff). The formula for mcse() is from Kruschke 2015, p. 187.

For n_eff(), the ratio of effective number of samples ranges from 0 to 1, and should be close to 1. The closer this ratio comes to zero means that the chains may be inefficient, but possibly still okay.

References

Kruschke JK. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd edition. Academic Press, 2015

Examples

Run this code

# NOT RUN {
if (require("rstanarm")) {
  fit <- stan_glm(mpg ~ wt + am, data = mtcars, chains = 1)
  hdi(fit)

  # return multiple intervals
  hdi(fit, prob = c(.5, .7, .9))

  # fit logistic regression model
  fit <- stan_glm(
    vs ~ wt + am,
    data = mtcars,
    family = binomial("logit"),
    chains = 1
  )
  # compute hdi, transform on "odds ratio scale"
  hdi(fit, trans = exp)

  # compute rope, on scale of linear predictor. finds proportion
  # of posterior distribution values between -1 and 1.
  rope(fit, rope = c(-1, 1))

  # compute rope, boundaries as "odds ratios". finds proportion of
  # posterior distribution values, which - after being exponentiated -
  # are between .8 and 1.25 (about -.22 and .22 on linear scale)
  rope(fit, rope = c(.8, 1.25), trans = exp)
}
# }
# NOT RUN {
# }

Run the code above in your browser using DataLab