Low-level function for computing the Neyman chi-squared distance (neyman_chi_sq).
Usage
neyman_chi_sq(P, Q, testNA, epsilon)
Arguments
P
a numeric vector storing the first distribution.
Q
a numeric vector storing the second distribution.
testNA
a logical value indicating whether or not distributions shall be checked for NA values.
epsilon
a small value to address cases in the distance computation where division by zero occurs. In
these cases, x / 0 or 0 / 0 will be replaced by epsilon. The default is epsilon = 0.00001.
However, we recommend choosing a custom epsilon value depending on the size of the input vectors,
the expected similarity between the compared probability density functions, and
whether or not many 0 values are present within the compared vectors.
As a rough rule of thumb, we suggest that when dealing with very large
input vectors which are very similar and contain many 0 values,
the epsilon value should be set even smaller (e.g. epsilon = 0.000000001),
whereas when vector sizes are small or the distributions are very divergent,
higher epsilon values may also be appropriate (e.g. epsilon = 0.01).
Addressing this epsilon issue is important to avoid cases where distance metrics
return negative values, which are not defined and occur only because of the
technical issue of computing x / 0 or 0 / 0.
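
To make the role of epsilon concrete, the following is a minimal base R sketch, not the package's own implementation. It assumes the usual definition of the Neyman chi-squared distance, sum((P - Q)^2 / P), and assumes that a zero denominator is simply replaced by epsilon; the exact replacement rule used internally may differ. The name neyman_chi_sq_sketch and the test vectors are hypothetical and chosen only for illustration.

```r
# Hypothetical helper for illustration only; NOT the package's implementation.
# Assumed definition: d(P, Q) = sum((P_i - Q_i)^2 / P_i)
neyman_chi_sq_sketch <- function(P, Q, testNA = TRUE, epsilon = 0.00001) {
  stopifnot(is.numeric(P), is.numeric(Q), length(P) == length(Q))

  # Optional NA check, mirroring the documented testNA argument
  if (testNA && (anyNA(P) || anyNA(Q))) {
    stop("Input vectors contain NA values.")
  }

  numerator <- (P - Q)^2
  # Assumption: wherever the denominator P_i is 0, use epsilon instead
  denominator <- ifelse(P == 0, epsilon, P)

  sum(numerator / denominator)
}

# Two small distributions; P contains a 0 where Q does not
P <- c(0.5, 0.5, 0)
Q <- c(0.25, 0.25, 0.5)

# The zero in P makes the result depend strongly on the chosen epsilon
d_small_eps <- neyman_chi_sq_sketch(P, Q, epsilon = 0.00001)
d_large_eps <- neyman_chi_sq_sketch(P, Q, epsilon = 0.01)
print(c(d_small_eps, d_large_eps))
```

Under these assumptions, the term belonging to the zero entry of P scales with 1 / epsilon, so the two calls return very different distances. This is why epsilon should be chosen with the vector size and the number of 0 values in mind.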