A simple index of the "information" content associated with individuals in a SoilProfileCollection object. Information content is quantified as the number of bytes remaining after gzip compression via memCompress().
profileInformationIndex(
x,
vars,
method = c("median", "mean", "sum"),
baseline = TRUE,
useDepths = TRUE,
numericDigits = 4
)
A numeric vector of length length(x), in the same order as the profiles in x, suitable for direct assignment to a new site-level attribute.
x: SoilProfileCollection object
vars: character vector of site or horizon level attributes to consider
method: character, aggregation method applied to the information content evaluated over vars: 'median', 'mean', or 'sum'
baseline: logical, compute ratio to "baseline" information content, see details
useDepths: logical, include horizon depths in vars
numericDigits: integer, number of significant digits to retain in numeric -> character conversion
D.E. Beaudette
The central assumption behind this function is that information content can be measured via (gzip) compression: the values associated with a simple soil profile having few horizons and little variation between horizons (isotropic depth-functions) will compress to a much smaller size than those of a complex profile (many horizons, strong anisotropy). Information content is evaluated one profile at a time, over each site or horizon level attribute specified in vars. Values are aggregated to the profile level by method: median, mean, or sum. The baseline argument invokes a comparison to the simplest possible representation of each depth-function:
numeric: replication of the mean value to match the number of horizons with non-NA values
character or factor: replication of the most frequent value to match the number of horizons with non-NA values
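The two baseline rules can be sketched in base R. This is only an illustration of the idea, not the package's internal code: the helper names gzBytes(), numericBaseline(), and characterBaseline() are hypothetical, and the way values are collapsed into a single string before compression is an assumption.

```r
# Hypothetical helper: number of bytes after gzip compression of a vector,
# after conversion to character (the collapse step is an assumption)
gzBytes <- function(v) {
  txt <- paste(as.character(v), collapse = "\n")
  length(memCompress(charToRaw(txt), type = "gzip"))
}

# Baseline for numeric data: the mean, repeated once per non-NA value
numericBaseline <- function(v) {
  v <- v[!is.na(v)]
  rep(mean(v), times = length(v))
}

# Baseline for character / factor data: the most frequent value,
# repeated once per non-NA value (ties resolved by the first in sort order)
characterBaseline <- function(v) {
  v <- as.character(v[!is.na(v)])
  tab <- table(v)
  rep(names(tab)[which.max(tab)], times = length(v))
}

# Made-up horizon data for a single profile
clay <- c(12.5, 18.25, 25.75, NA, 31.5)
hz   <- c("A", "Bt", "Bt", "C", NA)

clayBase <- numericBaseline(clay)   # mean of the 4 non-NA values, 4 times
hzBase   <- characterBaseline(hz)   # "Bt", 4 times

# Compressed sizes of the observed values and of their baselines
sizes <- c(
  clay         = gzBytes(clay[!is.na(clay)]),
  clayBaseline = gzBytes(clayBase),
  hz           = gzBytes(hz[!is.na(hz)]),
  hzBaseline   = gzBytes(hzBase)
)
print(sizes)
```

How these paired sizes are combined into a ratio is left to the function itself; the sketch only shows what is being compared.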
The ratios computed against a "simple" baseline represent something like "information gain", ranging from 0 to 1. Larger baseline ratios suggest more complexity (more information) associated with a soil profile's depth-functions. Alternatively, the total quantity of information (in bytes) can be determined by setting baseline = FALSE and method = 'sum'.
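As a rough illustration of the "total bytes" case, the following base R sketch compresses each attribute of a single made-up profile and aggregates the byte counts. The names infoBytes() and aggregateInfo() are hypothetical, and the numeric -> character conversion via signif() is an assumption about what numericDigits controls, not a copy of the package's implementation.

```r
# Made-up stand-in for one profile: horizon-level values of two attributes
profile <- list(
  clay = c(12.5, 18.25, 25.75, 31.5),
  name = c("A", "Bt1", "Bt2", "C")
)

# Bytes after gzip compression; numeric values are first reduced to
# `digits` significant digits, then converted to character (assumed)
infoBytes <- function(v, digits = 4) {
  if (is.numeric(v)) v <- signif(v, digits)
  txt <- paste(as.character(v), collapse = "\n")
  length(memCompress(charToRaw(txt), type = "gzip"))
}

# One byte count per attribute
bytesPerVar <- vapply(profile, infoBytes, integer(1))

# Aggregate to the profile level: median (default), mean, or sum
aggregateInfo <- function(b, method = c("median", "mean", "sum")) {
  method <- match.arg(method)
  switch(method, median = median(b), mean = mean(b), sum = sum(b))
}

# Analogue of baseline = FALSE, method = 'sum': total bytes over all attributes
totalBytes <- aggregateInfo(bytesPerVar, method = "sum")
print(bytesPerVar)
print(totalBytes)
```

With the actual function, the equivalent request would pass baseline = FALSE and method = 'sum' to profileInformationIndex(), and the result could be assigned to a new site-level attribute of x.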