hdi: Highest Density Interval (HDI)

Description

Compute the Highest Density Interval (HDI) of a posterior distribution, i.e., the interval such that all points within it have a higher probability density than points outside it. In the context of Bayesian posterior characterisation, the HDI can be used as a Credible Interval (CI).
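
For intuition, the HDI of a set of posterior draws can be approximated by sorting the draws and picking the narrowest window that contains a proportion ci of them. The sketch below assumes a plain numeric vector of draws from a unimodal posterior. hdi_sketch() is a hypothetical helper for illustration only, not necessarily how bayestestR's hdi() is implemented internally.

```r
# Hypothetical helper (not part of bayestestR): approximate HDI from draws
hdi_sketch <- function(x, ci = 0.89) {
  x <- sort(x)
  n <- length(x)
  # Number of sorted draws the interval must span to cover a proportion `ci`
  window <- max(1, ceiling(ci * n))
  # Candidate intervals: [x[i], x[i + window - 1]] for every start index i
  starts <- seq_len(n - window + 1)
  widths <- x[starts + window - 1] - x[starts]
  # The HDI estimate is the narrowest candidate interval
  best <- which.min(widths)
  c(CI_low = x[best], CI_high = x[best + window - 1])
}

set.seed(1)
hdi_sketch(rnorm(10000), ci = 0.89)
```

For a standard normal posterior the limits should land close to -1.6 and 1.6, because a symmetric distribution has an HDI that coincides with its central interval.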

Usage

hdi(x, ...)
# S3 method for numeric
hdi(x, ci = 0.89, verbose = TRUE, ...)
# S3 method for data.frame
hdi(x, ci = 0.89, verbose = TRUE, ...)
# S3 method for stanreg
hdi(x, ci = 0.89, effects = c("fixed", "random",
  "all"), parameters = NULL, verbose = TRUE, ...)
# S3 method for brmsfit
hdi(x, ci = 0.89, effects = c("fixed", "random",
  "all"), component = c("conditional", "zi", "zero_inflated", "all"),
  parameters = NULL, verbose = TRUE, ...)
# S3 method for BFBayesFactor
hdi(x, ci = 0.89, verbose = TRUE, ...)

Arguments

x

Vector representing a posterior distribution. Can also be a data frame, or a stanreg, brmsfit or BFBayesFactor object.

...

Currently not used.

ci

Value or vector of probabilities (between 0 and 1) of the interval(s) to be estimated. Named Credible Interval (CI) for consistency.

verbose

Toggle warnings on or off; set to FALSE to suppress them.

effects

Should results for fixed effects, random effects or both be returned? Only applies to mixed models. May be abbreviated.

parameters

Regular expression pattern that describes the parameters that should be returned. Meta-parameters (like lp__ or prior_) are filtered by default, so only parameters that typically appear in the summary() are returned. Use parameters to select specific parameters for the output.

component

Should results for all parameters, parameters for the conditional model or the zero-inflated part of the model be returned? May be abbreviated. Only applies to brms models.

Value

A data frame with the following columns:

Parameter The model parameter(s), if x is a model object. If x is a vector, this column is missing.
CI The probability of the HDI.
CI_low, CI_high The lower and upper HDI limits for the parameters.
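
The shape of this output can be mimicked in base R. The sketch below uses a hypothetical hdi_table() function (not part of bayestestR) that returns one row per value of ci and adds a Parameter column only when x is a data frame, treating each column as the posterior of one parameter.

```r
# Hypothetical sketch, not bayestestR's implementation: one row per `ci`
hdi_table <- function(x, ci = 0.89) {
  if (is.data.frame(x)) {
    # Treat each column as one parameter; its name becomes `Parameter`
    rows <- lapply(names(x), function(p) {
      data.frame(Parameter = p, hdi_table(x[[p]], ci))
    })
    return(do.call(rbind, rows))
  }
  x <- sort(x)
  n <- length(x)
  # For each requested probability, find the narrowest window of draws
  limits <- vapply(ci, function(p) {
    window <- max(1, ceiling(p * n))
    starts <- seq_len(n - window + 1)
    best <- which.min(x[starts + window - 1] - x[starts])
    c(x[best], x[best + window - 1])
  }, numeric(2))
  # `limits` is a 2 x length(ci) matrix: row 1 = lower, row 2 = upper
  data.frame(CI = ci, CI_low = limits[1, ], CI_high = limits[2, ])
}

set.seed(42)
hdi_table(rnorm(1000), ci = c(0.80, 0.90))
hdi_table(data.frame(a = rnorm(1000), b = rnorm(1000, mean = 5)))
```

A vector input yields only the CI, CI_low and CI_high columns, mirroring the note above that Parameter is missing when x is a vector.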

Details

Unlike equal-tailed intervals (see ci), which exclude the same probability mass from each tail of the distribution (e.g., 2.5% from each tail for a 95% interval), the HDI is not necessarily equal-tailed and therefore always includes the mode(s) of the posterior distribution.
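
The difference matters for skewed posteriors. The sketch below assumes exponential draws (mode at 0) as a stand-in posterior and compares an 89% equal-tailed interval, computed with quantile(), to an 89% HDI computed with a hypothetical helper hdi_limits() that picks the narrowest window of sorted draws. It illustrates the idea and is not bayestestR's own code.

```r
set.seed(123)
# Right-skewed stand-in for a posterior: exponential draws, mode at 0
draws <- rexp(10000, rate = 1)

# Equal-tailed 89% interval: cut 5.5% of the probability from each tail
eti <- quantile(draws, probs = c(0.055, 0.945), names = FALSE)

# Hypothetical helper: narrowest window of sorted draws covering `ci`
hdi_limits <- function(x, ci) {
  x <- sort(x)
  window <- ceiling(ci * length(x))
  starts <- seq_len(length(x) - window + 1)
  best <- which.min(x[starts + window - 1] - x[starts])
  c(x[best], x[best + window - 1])
}
hdi_int <- hdi_limits(draws, ci = 0.89)

rbind(ETI = eti, HDI = hdi_int)  # lower and upper limits of both intervals
c(ETI_width = diff(eti), HDI_width = diff(hdi_int))
```

With these draws the HDI starts at roughly the smallest draw, right next to the mode, whereas the equal-tailed interval starts near the theoretical 5.5% quantile (about 0.057) and is wider overall.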

By default, hdi() returns 89% intervals (ci = 0.89), deemed to be more stable than, for instance, 95% intervals (Kruschke, 2014). An effective sample size of at least 10,000 is recommended if 95% intervals should be computed (Kruschke, 2014, p. 183ff). Moreover, 89 is the highest prime number that does not exceed the already unstable 95% threshold. What does it have to do with anything? Nothing, but it reminds us of the total arbitrariness of any of these conventions (McElreath, 2015).
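
The prime-number remark is easy to check by trial division. This small base-R sketch (is_prime() is a hypothetical helper) lists the primes up to 95 and takes the largest.

```r
# Hypothetical helper: trial division, adequate for small integers
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  all(n %% 2:floor(sqrt(n)) != 0)
}

candidates <- 2:95
primes <- candidates[vapply(candidates, is_prime, logical(1))]
max(primes)  # 89
```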

References

Kruschke, J. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press.
McElreath, R. (2015). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC.

Examples


# NOT RUN {
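library(bayestestR)

# Simulated posterior: 1000 draws from a standard normal distribution
posterior <- rnorm(1000)
hdi(posterior, ci = .89)
# Several probabilities at once
hdi(posterior, ci = c(.80, .90, .95))

# Data frame input: each column is treated as a separate posterior
df <- data.frame(replicate(4, rnorm(100)))
hdi(df)
hdi(df, ci = c(.80, .90, .95))

# Fitted rstanarm model (short chains, for illustration only)
library(rstanarm)
model <- stan_glm(mpg ~ wt + gear, data = mtcars, chains = 2, iter = 200)
hdi(model)
hdi(model, ci = c(.80, .90, .95))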

# }
# NOT RUN {
library(brms)
model <- brms::brm(mpg ~ wt + cyl, data = mtcars)
hdi(model)
hdi(model, ci = c(.80, .90, .95))

library(BayesFactor)
bf <- ttestBF(x = rnorm(100, 1, 1))
hdi(bf)
hdi(bf, ci = c(.80, .90, .95))
# }
