posterior (version 1.6.0)

entropy: Normalized entropy

Description

Normalized entropy, for measuring dispersion in draws from categorical distributions.

Usage

entropy(x)

# S3 method for default
entropy(x)

# S3 method for rvar
entropy(x)

Value

If x is a factor or numeric, returns a length-1 numeric vector with a value between 0 and 1 (inclusive) giving the normalized Shannon entropy of x.

If x is an rvar, returns an array of the same shape as x, where each cell is the normalized Shannon entropy of the draws in the corresponding cell of x.

Arguments

x

(multiple options) A vector to be interpreted as draws from a categorical distribution, such as a factor, a numeric vector, or an rvar.

Details

Calculates the normalized Shannon entropy of the draws in x. This value is the entropy of x divided by the maximum entropy of a distribution with n categories, where n is length(unique(x)) for numeric vectors and length(levels(x)) for factors:

$$-\frac{\sum_{i = 1}^{n} p_i \log(p_i)}{\log(n)}$$

Here \(p_i\) is the proportion of draws in x that fall in category \(i\). Dividing by \(\log(n)\) scales the output to lie between 0 (all probability in one category) and 1 (uniform). This form of normalized entropy is referred to as \(H_\mathrm{REL}\) in Wilcox (1967).
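The formula above can be sketched in base R as follows. This is an illustrative re-implementation, not the package's source: the helper name normalized_entropy is hypothetical, it assumes x has no missing values, and returning 0 when there is only one category is an assumption made here to avoid dividing by log(1).

```r
# Hypothetical helper (not part of posterior): normalized Shannon entropy
# computed directly from the formula in Details, using base R only.
# Assumes x contains no missing values.
normalized_entropy <- function(x) {
  # n is the number of levels for factors and the number of
  # distinct values for other vectors
  n <- if (is.factor(x)) nlevels(x) else length(unique(x))

  # Assumption: with a single category, define the result as 0
  if (n <= 1) return(0)

  # table() on a factor keeps zero counts for unused levels
  counts <- table(x)
  p <- as.numeric(counts) / sum(counts)

  # Drop empty categories, treating 0 * log(0) as 0
  p <- p[p > 0]

  -sum(p * log(p)) / log(n)
}

# Two equally likely categories observed out of four declared levels:
# log(2) / log(4) = 0.5
f <- factor(c("a", "b", "a", "b"), levels = c("a", "b", "c", "d"))
normalized_entropy(f)

# The same pattern as a numeric vector has only two known categories:
# log(2) / log(2) = 1
normalized_entropy(c(1, 2, 1, 2))
```

The two calls show why the choice of n matters: a factor with unused levels is normalized against all declared levels, so the same draws score lower than they do as a numeric vector.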

References

Allen R. Wilcox (1967). Indices of Qualitative Variation (No. ORNL-TM-1919). Oak Ridge National Lab., Tenn.

Examples

library(posterior)

set.seed(1234)

levels <- c("a", "b", "c", "d", "e")

# a uniform distribution: high normalized entropy
x <- factor(
  sample(levels, 4000, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2)),
  levels = levels
)
entropy(x)

# a unimodal distribution: low normalized entropy
y <- factor(
  sample(levels, 4000, replace = TRUE, prob = c(0.95, 0.02, 0.015, 0.01, 0.005)),
  levels = levels
)
entropy(y)

# both together, as an rvar
xy <- c(rvar(x), rvar(y))
xy
entropy(xy)
