# For example, the following will run KDE using the data in "ref_data" for
# training and the data in "qu_data" as query data. It will apply an
# Epanechnikov kernel with a 0.2 bandwidth to each reference point and use a
# KD-Tree for the dual-tree optimization. The returned predictions will be
# within 5% of the real KDE value for each query point.
if (FALSE) {
  output <- kde(reference=ref_data, query=qu_data, bandwidth=0.2,
                kernel="epanechnikov", tree="kd-tree", rel_error=0.05)
  out_data <- output$predictions
}
# The predicted density estimates will be stored in "out_data".
# If no "query" is provided, then KDE will be computed on the "reference"
# dataset.
# It is possible to select either a reference dataset or an input model but
# not both at the same time. If an input model is selected and parameter
# values are not set (e.g. "bandwidth") then default parameter values will be
# used.
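#
# To make the "within 5% of the real KDE value" guarantee from the first
# example more concrete, here is a minimal base-R sketch of a brute-force 1-D
# Epanechnikov KDE and a relative-error check. This is NOT how the package
# computes its result: "naive_kde_1d" and "within_rel_error" are hypothetical
# helper names, and the textbook normalization 0.75 * (1 - u^2) / (n * h) is
# an assumption that may differ from the one used by kde().
naive_kde_1d <- function(reference, query, bandwidth) {
  sapply(query, function(q) {
    # Scaled distance from the query point to every reference point.
    u <- (q - reference) / bandwidth
    # The Epanechnikov kernel is zero outside |u| <= 1.
    k <- ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
    sum(k) / (length(reference) * bandwidth)
  })
}
# TRUE when every approximate value is within rel_error of the exact value.
within_rel_error <- function(approx, exact, rel_error) {
  all(abs(approx - exact) <= rel_error * abs(exact))
}
exact <- naive_kde_1d(reference=c(0, 0.1, 0.5), query=c(0, 1), bandwidth=0.2)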
#
# Beyond the previous call, it is also possible to activate Monte Carlo
# estimations if a Gaussian kernel is used. This can provide faster results,
# but the KDE will only have a probabilistic guarantee of meeting the desired
# error bound (instead of an absolute guarantee). The following example will
# run KDE using a Monte Carlo estimation when possible. The results will be
# within 5% of the real KDE value with a 95% probability. The initial sample
# size for the Monte Carlo estimation will be 200 points, and a node will be a
# candidate for the estimation only when it contains at least 700 (i.e.
# 3.5*200) points. If a node contains 700 points and 420 (i.e. 0.6*700) have
# already been sampled, then the algorithm will recurse instead of continuing
# to sample.
if (FALSE) {
  # monte_carlo is a logical flag; it must be TRUE to enable Monte Carlo
  # estimations.
  output <- kde(reference=ref_data, query=qu_data, bandwidth=0.2,
                kernel="gaussian", tree="kd-tree", rel_error=0.05,
                monte_carlo=TRUE, mc_probability=0.95,
                initial_sample_size=200, mc_entry_coef=3.5,
                mc_break_coef=0.6)
  out_data <- output$predictions
}
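# The thresholds quoted in the Monte Carlo description can be reproduced with
# plain arithmetic. This is only an illustrative sketch of the numbers given
# in the text: "entry_threshold", "node_size" and "break_threshold" are
# hypothetical names, not values exposed by kde().
initial_sample_size <- 200
mc_entry_coef <- 3.5
mc_break_coef <- 0.6
# Node size at which a node becomes a Monte Carlo candidate (3.5 * 200).
entry_threshold <- mc_entry_coef * initial_sample_size
# For a node of exactly that size, the number of sampled points after which
# the algorithm recurses instead of sampling further (0.6 * 700).
node_size <- entry_threshold
break_threshold <- mc_break_coef * node_size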