This function calculates the Kullback-Leibler divergence (KLD) between two probability distributions, and has many uses, such as in lowest posterior loss probability intervals, posterior predictive checks, prior elicitation, reference priors, and Variational Bayes.
KLD(px, py, base)
px: This is a required vector of probability densities, considered as \(p(\textbf{x})\). Log-densities are also accepted, in which case both px and py must be log-densities.
py: This is a required vector of probability densities, considered as \(p(\textbf{y})\). Log-densities are also accepted, in which case both px and py must be log-densities.
base: This is an optional argument that specifies the logarithmic base. It defaults to base=exp(1) (or \(e\)), which measures information in natural units (nats); base=2 measures information in binary units (bits). Calls with each base, and with log-densities, are sketched after these argument descriptions.
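For illustration only (not part of the package examples), the following sketch assumes two normal densities evaluated on an arbitrary grid, and shows the default natural-units call, a base=2 call, and a call in which both vectors are supplied as log-densities, as the argument descriptions above allow:

library(LaplacesDemon)

## Arbitrary evaluation points and two normal densities, assumed only for illustration
x  <- seq(-3, 3, length.out = 200)
px <- dnorm(x, 0, 1)       # densities treated as p(x)
py <- dnorm(x, 0.5, 1.2)   # densities treated as p(y)

KLD(px, py)             # default base = exp(1): results in nats
KLD(px, py, base = 2)   # base = 2: results in bits
KLD(log(px), log(py))   # both px and py supplied as log-densities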
KLD returns a list with the following components; a minimal sketch for inspecting them follows the list:
This is \(\mathrm{KLD}_i[p(\textbf{x}_i) || p(\textbf{y}_i)]\).
This is \(\mathrm{KLD}_i[p(\textbf{y}_i) || p(\textbf{x}_i)]\).
This is the mean of the two components above. This is the expected posterior loss in LPL.interval.
This is \(\mathrm{KLD}[p(\textbf{x}) || p(\textbf{y})]\). This is a directed divergence.
This is \(\mathrm{KLD}[p(\textbf{y}) || p(\textbf{x})]\). This is a directed divergence.
This is the mean of the two components above.
This is the minimum of the two directed divergences.
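A minimal sketch for inspecting these components (the evaluation points and densities below are arbitrary, assumed only for illustration):

library(LaplacesDemon)
x  <- seq(-3, 3, length.out = 100)
px <- dnorm(x, 0, 1)
py <- dnorm(x, 0.5, 1.2)
fit <- KLD(px, py)
str(fit)   # displays each component of the returned list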
The Kullback-Leibler divergence (KLD) is known by many names, some of which are Kullback-Leibler distance, K-L, and logarithmic divergence. KLD is an asymmetric measure of the difference, distance, or directed divergence between two probability distributions \(p(\textbf{y})\) and \(p(\textbf{x})\) (Kullback and Leibler, 1951). Mathematically, however, KLD is not a distance, because of its asymmetry.
Here, \(p(\textbf{y})\) represents the "true" distribution of data, observations, or a theoretical distribution, and \(p(\textbf{x})\) represents a theory, model, or approximation of \(p(\textbf{y})\).
For probability distributions \(p(\textbf{y})\) and \(p(\textbf{x})\) that are discrete (whether the underlying distribution is continuous or discrete, the observations themselves are always discrete, such as from \(i=1,\dots,N\)),
$$\mathrm{KLD}[p(\textbf{y}) || p(\textbf{x})] = \sum^N_{i=1} p(\textbf{y}_i) \log\frac{p(\textbf{y}_i)}{p(\textbf{x}_i)}$$
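As a sketch of this formula (the probability vectors are arbitrary choices), the summation can be computed directly, and reversing the roles of \(p(\textbf{y})\) and \(p(\textbf{x})\) shows the asymmetry noted above:

## Two arbitrary discrete distributions on the same support
py <- c(0.1, 0.4, 0.5)   # the "true" distribution p(y)
px <- c(0.2, 0.2, 0.6)   # the approximating distribution p(x)
sum(py * log(py / px))   # directed divergence KLD[p(y) || p(x)], as in the formula above
sum(px * log(px / py))   # reversing the roles gives a different value (asymmetry)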
In Bayesian inference, KLD can be used as a measure of the information
gain in moving from a prior distribution, \(p(\theta)\),
to a posterior distribution, \(p(\theta | \textbf{y})\). As such, KLD is the basis of reference priors and lowest posterior loss intervals (LPL.interval), such as in Berger, Bernardo, and Sun (2009) and Bernardo (2005). The intrinsic discrepancy was introduced by Bernardo and Rueda (2002). For more information on the intrinsic discrepancy, see LPL.interval.
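As a hedged sketch of this prior-to-posterior information gain (the Beta prior, the data, and the grid of evaluation points are assumed values, not from the package documentation):

library(LaplacesDemon)

## Assumed Beta-Binomial example: Beta(1,1) prior, 7 successes in 10 trials
theta     <- seq(0.001, 0.999, length.out = 500)   # grid over the parameter
prior     <- dbeta(theta, 1, 1)                    # p(theta)
posterior <- dbeta(theta, 1 + 7, 1 + 3)            # p(theta | y)

KLD(prior, posterior)   # the returned list includes both directed divergences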
Berger, J.O., Bernardo, J.M., and Sun, D. (2009). "The Formal Definition of Reference Priors". The Annals of Statistics, 37(2), p. 905--938.
Bernardo, J.M. and Rueda, R. (2002). "Bayesian Hypothesis Testing: A Reference Approach". International Statistical Review, 70, p. 351--372.
Bernardo, J.M. (2005). "Intrinsic Credible Regions: An Objective Bayesian Approach to Interval Estimation". Sociedad de Estadistica e Investigacion Operativa, 14(2), p. 317--384.
Kullback, S. and Leibler, R.A. (1951). "On Information and Sufficiency". The Annals of Mathematical Statistics, 22(1), p. 79--86.
library(LaplacesDemon)
## Two vectors of normal densities evaluated at uniform random draws
px <- dnorm(runif(100), 0, 1)
py <- dnorm(runif(100), 0.1, 0.9)
## Kullback-Leibler divergence between them (in nats, by default)
KLD(px, py)