A general function to compute several different information theory metrics
information(
  data,
  base = 2.718282,
  bins = floor(sqrt(nrow(data) / 5)),
  statistic = c("entropy", "joint.entropy", "conditional.entropy", "total.correlation",
    "dual.total.correlation", "o.information")
)
Returns a list containing only the requested statistics
data: Matrix or data frame. Should consist only of the variables to be used in the analysis
base: Numeric (length = 1). Base of the logarithm to use for entropy. Common options include:
2 --- bits
2.718282 --- nats
10 --- bans
Defaults to exp(1) (approximately 2.718282)
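As a rough illustration of how the logarithm base only rescales entropy, the sketch below computes the entropy of a hypothetical three-category distribution in bits, nats, and bans with base R's log(). The vector p and the variable names are assumptions made for this example and are not part of the package.

```r
# Hypothetical probability distribution over three categories
p <- c(0.5, 0.25, 0.25)

# Shannon entropy under the three common logarithm bases
H_bits <- -sum(p * log(p, base = 2))   # base 2
H_nats <- -sum(p * log(p))             # base exp(1), R's default for log()
H_bans <- -sum(p * log(p, base = 10))  # base 10

# Changing the base rescales the value by a constant factor
same_after_rescaling <- isTRUE(all.equal(H_bits, H_nats / log(2)))
```

Because the units differ only by a constant factor, results obtained with one base can be converted to another by dividing by the logarithm of the new base.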
bins: Numeric (length = 1). Number of bins to use if data are not discrete. Defaults to floor(sqrt(nrow(data) / 5))
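To make the default concrete, the sketch below evaluates the bin-count formula on simulated continuous data and then discretizes each column. The simulated matrix and the use of equal-width binning via cut() are assumptions for illustration only; the package may discretize the data differently.

```r
# Hypothetical data: 200 observations of 3 continuous variables
set.seed(1)
data <- matrix(rnorm(600), nrow = 200, ncol = 3)

# Default number of bins, taken from the usage signature
bins <- floor(sqrt(nrow(data) / 5))

# Assumption: equal-width binning with cut(); not necessarily the
# scheme used internally by information()
binned <- apply(data, 2, function(x) as.integer(cut(x, breaks = bins)))
```

With 200 rows the formula gives 6 bins, so each column of binned holds integer codes from 1 to 6.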
statistic: Character. Information theory statistics to compute. Available options:
"entropy" --- Shannon's entropy (Shannon, 1948) for each variable in data. Values range from 0 to log(k), where k is the number of categories for the variable
"joint.entropy" --- shared uncertainty over all variables in data. Values range from the maximum of the individual entropies to the sum of the individual entropies
"conditional.entropy" --- uncertainty remaining after considering all other variables in data. Values range from 0 to the individual entropy of the conditioned variable
"total.correlation" --- generalization of mutual information to more than two variables (Watanabe, 1960). Quantifies the redundancy of information in data. Values range from 0 to the sum of the individual entropies minus the maximum of the individual entropies
"dual.total.correlation" --- "shared randomness" or total uncertainty remaining in the data (Han, 1978). Values range from 0 to the joint entropy
"o.information" --- quantifies the extent to which the data are represented by lower-order (> 0; redundancy) or higher-order (< 0; synergy) constraints (Crutchfield, 1994)
By default, all statistics are computed
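The relationships among these statistics can be sketched in base R using their standard textbook definitions. The helper joint_entropy() and the toy data set below are hypothetical and written only for this example; this is not the package's implementation, and the output structure of information() may differ.

```r
# Hypothetical helper: joint Shannon entropy (in nats) of the columns of a
# discrete data frame; a single column gives that variable's entropy
joint_entropy <- function(df) {
  key <- do.call(paste, c(unname(as.list(df)), sep = "\r"))
  p <- as.vector(table(key)) / length(key)
  -sum(p * log(p))
}

# Toy data: x1 and x2 are independent fair bits, x3 is their parity (XOR)
toy <- data.frame(x1 = c(0, 0, 1, 1), x2 = c(0, 1, 0, 1))
toy$x3 <- (toy$x1 + toy$x2) %% 2

# Entropy of each variable and of all variables jointly
h_each  <- sapply(seq_along(toy), function(i) joint_entropy(toy[, i, drop = FALSE]))
h_joint <- joint_entropy(toy)

# Conditional entropy of each variable given all of the others
h_cond <- sapply(seq_along(toy), function(i) {
  h_joint - joint_entropy(toy[, -i, drop = FALSE])
})

# Total correlation, dual total correlation, and O-information
tc     <- sum(h_each) - h_joint
dtc    <- h_joint - sum(h_cond)
o_info <- tc - dtc
```

In this parity example every variable is fully determined by the other two, so each conditional entropy is 0, the total correlation is log(2), the dual total correlation equals the joint entropy of 2 * log(2), and the O-information is negative, which matches the synergy interpretation given above.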
Hudson F. Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen@gmail.com>
Shannon's entropy
Shannon, C. E. (1948). A mathematical theory of communication.
The Bell System Technical Journal, 27(3), 379-423.
Formalization of total correlation
Watanabe, S. (1960).
Information theoretical analysis of multivariate correlation.
IBM Journal of Research and Development, 4, 66-82.
Applied implementation of total correlation
Felix, L. M., Mansur-Alves, M., Teles, M., Jamison, L., & Golino, H. (2021).
Longitudinal impact and effects of booster sessions in a cognitive training program for healthy older adults.
Archives of Gerontology and Geriatrics, 94, 104337.
Formalization of dual total correlation
Han, T. S. (1978).
Nonnegative entropy measures of multivariate symmetric correlations.
Information and Control, 36, 133-156.
Formalization of O-information
Crutchfield, J. P. (1994). The calculi of emergence: Computation, dynamics and induction.
Physica D: Nonlinear Phenomena, 75(1-3), 11-54.
Applied implementation of O-information
Marinazzo, D., Van Roozendaal, J., Rosas, F. E., Stella, M., Comolatti, R., Colenbier, N., Stramaglia, S., & Rosseel, Y. (2024).
An information-theoretic approach to build hypergraphs in psychometrics.
Behavior Research Methods, 1-23.
# Load the package that provides information() and the wmt2 data
library(EGAnet)
# All measures
information(wmt2[,7:24])
# One measure
information(wmt2[,7:24], statistic = "joint.entropy")