Constructs a hypervolume by building a Gaussian kernel density estimate on an adaptive grid of random points wrapping around the original data points. The bandwidth vector reflects the axis-aligned standard deviations of a hyperelliptical kernel.
Because Gaussian kernel density estimates do not decay to zero within a finite distance, the algorithm evaluates the kernel density in hyperelliptical regions out to a distance set by sd.count.
After delineating the probability density, the function calls hypervolume_threshold to determine a boundary. The default behavior ensures that 95 percent of the estimated probability density is enclosed by the chosen boundary. Note, however, that the accuracy of the total probability density depends on having set a sufficiently large value of sd.count.
Most use cases should not require modification of any parameters except kde.bandwidth.
Optionally, weighting of the data (e.g. for abundance weighting) is possible. By default, the function estimates the probability density of the observations via Gaussian kernel functions, assuming each data point contributes equally. By setting a weight parameter, the algorithm instead takes a weighted average of the kernel functions centered on each observation. Code for weighting data written by Yuanzhi Li (Yuanzhi.Li@usherbrooke.ca).
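As a minimal sketch of abundance weighting (the toy data and abundance vector below are invented for illustration and are not part of the package examples):

# Hypothetical abundance-weighted call on toy two-dimensional data
library(hypervolume)
set.seed(1)
dat = data.frame(x = rnorm(50), y = rnorm(50))   # toy data, 50 observations
abund = sample(1:10, 50, replace = TRUE)         # assumed per-observation abundances
hv_w = hypervolume_gaussian(dat, name = 'weighted toy',
                            weight = abund / sum(abund),  # normalized so weights sum to one
                            samples.per.point = 100)      # kept small for speed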
hypervolume_gaussian(data, name = NULL,
weight = NULL,
samples.per.point = ceiling((10^(3 + sqrt(ncol(data))))/nrow(data)),
kde.bandwidth = estimate_bandwidth(data),
sd.count = 3,
quantile.requested = 0.95,
quantile.requested.type = "probability",
chunk.size = 1000,
verbose = TRUE,
...)
A Hypervolume-class object corresponding to the inferred hypervolume.
data: An m x n matrix or data frame, where m is the number of observations and n is the dimensionality.
name: A string to assign to the hypervolume for later output and plotting. Defaults to the name of the variable if NULL.
weight: An optional vector of weights for the kernel density estimation. Defaults to even weighting (rep(1/nrow(data), nrow(data))) if NULL.
samples.per.point: Number of random points to be evaluated per data point in data.
kde.bandwidth: A bandwidth vector obtained by running estimate_bandwidth. Note that previous package versions (<3.0.0) allowed inputting a scalar/vector value here; this is now handled through the estimate_bandwidth interface (see the sketch after the argument descriptions).
sd.count: The number of standard deviations (converted to actual units by multiplying by kde.bandwidth) at which the 'edge' of the hypervolume should be evaluated. Larger values of sd.count come closer to a true estimate of the Gaussian density over a larger region of hyperspace, but require rapidly increasing computational resources (see Details section). It is generally better to use a large/default value for this parameter. Warnings are generated if the chosen value is less than 3.
quantile.requested: The quantile value used to delineate the boundary of the kernel density estimate. See hypervolume_threshold.
quantile.requested.type: The type of quantile (volume or probability) used for the boundary delineation. See hypervolume_threshold.
chunk.size: Number of random points to process per internal step. Larger values may have better performance on machines with large amounts of free memory. Changing this parameter does not change the output of the function, only how the output is internally assembled.
verbose: Logical value; print diagnostic output if TRUE.
...: Other arguments to pass to hypervolume_threshold.
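As a hedged sketch of the non-default arguments discussed above (reusing the toy dat object from the weighting sketch; the specific values are illustrative only):

# Pass an explicitly computed bandwidth, widen the evaluated region, and
# request a 50 percent probability quantile instead of the 95 percent default.
bw = estimate_bandwidth(dat)
hv_custom = hypervolume_gaussian(dat,
                                 kde.bandwidth = bw,
                                 sd.count = 4,                            # evaluate further into the tails
                                 quantile.requested = 0.50,               # enclose 50 percent of the estimated density
                                 quantile.requested.type = 'probability',
                                 samples.per.point = 100)                 # kept small for speed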
See also: hypervolume_threshold
library(hypervolume)
data(penguins,package='palmerpenguins')
penguins_no_na = as.data.frame(na.omit(penguins))
penguins_adelie = penguins_no_na[penguins_no_na$species=="Adelie",
c("bill_length_mm","bill_depth_mm","flipper_length_mm")]
# low samples per point for CRAN demo
hv = hypervolume_gaussian(penguins_adelie,name='Adelie',samples.per.point=100)
summary(hv)
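The returned object can be inspected further; for example (get_volume and the plot method are provided by the hypervolume package, though the calls below are illustrative additions rather than part of the original example):

get_volume(hv)   # total volume enclosed by the chosen boundary
plot(hv)         # pairwise plots of the random points delineating the hypervolume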