This function calculates the Kullback-Leibler divergence (KLD) between two probability distributions, and has many uses, such as in lowest posterior loss probability intervals, posterior predictive checks, prior elicitation, reference priors, and Variational Bayes.
KLD(px, py, base)
px: This is a required vector of probability densities, considered as \(p(\textbf{x})\). Log-densities are also accepted, in which case both px and py must be log-densities.
py: This is a required vector of probability densities, considered as \(p(\textbf{y})\). Log-densities are also accepted, in which case both px and py must be log-densities.
base: This is an optional argument that specifies the logarithmic base. It defaults to base=exp(1) (or \(e\)), which measures information in natural units (nats); base=2 measures information in binary units (bits). Calls with each base, and with log-densities, are sketched after these argument descriptions.
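For illustration only (not part of the package examples), the following sketch assumes two normal densities evaluated on an arbitrary grid, and shows the default natural-units call, a base=2 call, and a call in which both vectors are supplied as log-densities, as the argument descriptions above allow:

library(LaplacesDemon)

## Arbitrary evaluation points and two normal densities, assumed only for illustration
x  <- seq(-3, 3, length.out = 200)
px <- dnorm(x, 0, 1)       # densities treated as p(x)
py <- dnorm(x, 0.5, 1.2)   # densities treated as p(y)

KLD(px, py)             # default base = exp(1): results in nats
KLD(px, py, base = 2)   # base = 2: results in bits
KLD(log(px), log(py))   # both px and py supplied as log-densities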
KLD returns a list with the following components; a minimal sketch for inspecting them follows the list:
This is \(\mathrm{KLD}_i[p(\textbf{x}_i) || p(\textbf{y}_i)]\).
This is \(\mathrm{KLD}_i[p(\textbf{y}_i) || p(\textbf{x}_i)]\).
This is the mean of the two components above. This is the expected posterior loss in LPL.interval.
This is \(\mathrm{KLD}[p(\textbf{x}) || p(\textbf{y})]\). This is a directed divergence.
This is \(\mathrm{KLD}[p(\textbf{y}) || p(\textbf{x})]\). This is a directed divergence.
This is the mean of the two components above.
This is the minimum of the two directed divergences.
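A minimal sketch for inspecting these components (the evaluation points and densities below are arbitrary, assumed only for illustration):

library(LaplacesDemon)
x  <- seq(-3, 3, length.out = 100)
px <- dnorm(x, 0, 1)
py <- dnorm(x, 0.5, 1.2)
fit <- KLD(px, py)
str(fit)   # displays each component of the returned list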
The Kullback-Leibler divergence (KLD) is known by many names, some of which are Kullback-Leibler distance, K-L, and logarithmic divergence. KLD is an asymmetric measure of the difference, distance, or directed divergence between two probability distributions \(p(\textbf{y})\) and \(p(\textbf{x})\) (Kullback and Leibler, 1951). Mathematically, however, KLD is not a distance, because of its asymmetry.
Here, \(p(\textbf{y})\) represents the "true" distribution of data, observations, or a theoretical distribution, and \(p(\textbf{x})\) represents a theory, model, or approximation of \(p(\textbf{y})\).
For probability distributions \(p(\textbf{y})\) and \(p(\textbf{x})\) that are discrete (whether the underlying distribution is continuous or discrete, the observations themselves are always discrete, such as from \(i=1,\dots,N\)),
$$\mathrm{KLD}[p(\textbf{y}) || p(\textbf{x})] = \sum^N_{i=1} p(\textbf{y}_i) \log\frac{p(\textbf{y}_i)}{p(\textbf{x}_i)}$$
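As a sketch of this formula (the probability vectors are arbitrary choices), the summation can be computed directly, and reversing the roles of \(p(\textbf{y})\) and \(p(\textbf{x})\) shows the asymmetry noted above:

## Two arbitrary discrete distributions on the same support
py <- c(0.1, 0.4, 0.5)   # the "true" distribution p(y)
px <- c(0.2, 0.2, 0.6)   # the approximating distribution p(x)
sum(py * log(py / px))   # directed divergence KLD[p(y) || p(x)], as in the formula above
sum(px * log(px / py))   # reversing the roles gives a different value (asymmetry)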
In Bayesian inference, KLD can be used as a measure of the information
gain in moving from a prior distribution, \(p(\theta)\),
to a posterior distribution, \(p(\theta | \textbf{y})\). As such, KLD is the basis of reference priors and lowest posterior loss intervals (LPL.interval), such as in Berger, Bernardo, and Sun (2009) and Bernardo (2005). The intrinsic discrepancy was introduced by Bernardo and Rueda (2002). For more information on the intrinsic discrepancy, see LPL.interval.
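As a hedged sketch of this prior-to-posterior information gain (the Beta prior, the data, and the grid of evaluation points are assumed values, not from the package documentation):

library(LaplacesDemon)

## Assumed Beta-Binomial example: Beta(1,1) prior, 7 successes in 10 trials
theta     <- seq(0.001, 0.999, length.out = 500)   # grid over the parameter
prior     <- dbeta(theta, 1, 1)                    # p(theta)
posterior <- dbeta(theta, 1 + 7, 1 + 3)            # p(theta | y)

KLD(prior, posterior)   # the returned list includes both directed divergences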
Berger, J.O., Bernardo, J.M., and Sun, D. (2009). "The Formal Definition of Reference Priors". The Annals of Statistics, 37(2), p. 905--938.
Bernardo, J.M. and Rueda, R. (2002). "Bayesian Hypothesis Testing: A Reference Approach". International Statistical Review, 70, p. 351--372.
Bernardo, J.M. (2005). "Intrinsic Credible Regions: An Objective Bayesian Approach to Interval Estimation". Sociedad de Estadistica e Investigacion Operativa, 14(2), p. 317--384.
Kullback, S. and Leibler, R.A. (1951). "On Information and Sufficiency". The Annals of Mathematical Statistics, 22(1), p. 79--86.
library(LaplacesDemon)
## Two vectors of normal densities evaluated at uniform random draws
px <- dnorm(runif(100), 0, 1)
py <- dnorm(runif(100), 0.1, 0.9)
## Kullback-Leibler divergence between them (in nats, by default)
KLD(px, py)