kumar_johnson: Kumar-Johnson distance (lowlevel function)

Description

The low-level function for computing the Kumar-Johnson distance.
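For orientation, the Kumar-Johnson distance between two distributions P and Q of length n is commonly written as shown below. Whenever P_i or Q_i is zero, the denominator vanishes. That is the situation the epsilon argument is meant to handle.

```latex
d_{KJ}(P, Q) = \sum_{i=1}^{n} \frac{\left(P_i^2 - Q_i^2\right)^2}{2\,\left(P_i Q_i\right)^{3/2}}
```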

Usage

kumar_johnson(P, Q, testNA, epsilon)

Arguments

P: a numeric vector storing the first distribution.
Q: a numeric vector storing the second distribution.
testNA: a logical value indicating whether or not the distributions should be checked for NA values.
epsilon: a small value used to handle cases in the distance computation where a division by zero occurs. In these cases, x / 0 or 0 / 0 is replaced by epsilon. The default is epsilon = 0.00001. However, we recommend choosing a custom epsilon value depending on the size of the input vectors, the expected similarity between the compared probability density functions, and whether many 0 values are present in the compared vectors. As a rough rule of thumb, we suggest that when dealing with very large input vectors that are very similar and contain many 0 values, epsilon should be set even smaller (e.g. epsilon = 0.000000001), whereas when the vectors are small or the distributions are very divergent, higher epsilon values may also be appropriate (e.g. epsilon = 0.01). Addressing this epsilon issue is important to avoid cases where distance metrics return negative values, which are not defined and arise only from the technical problem of computing x / 0 or 0 / 0.
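The roles of testNA and epsilon can be illustrated with a small pure-R sketch. The helper kj_sketch() is hypothetical and is not the package's own (compiled) implementation. It assumes that a zero denominator term (P_i * Q_i)^(3/2) is replaced by epsilon, which may differ in detail from how kumar_johnson() substitutes epsilon.

```r
# Hypothetical helper kj_sketch(): an illustrative pure-R version of the
# Kumar-Johnson distance, not the package's own implementation.
kj_sketch <- function(P, Q, testNA = TRUE, epsilon = 0.00001) {
  if (length(P) != length(Q)) stop("P and Q must have the same length.")
  # Optional NA check, playing the role of the testNA argument
  if (testNA && (anyNA(P) || anyNA(Q))) stop("P or Q contains NA values.")
  # (P_i * Q_i)^(3/2) is zero whenever P_i or Q_i is zero
  pq <- (P * Q)^1.5
  # Assumption: zero denominators are replaced by epsilon
  pq[pq == 0] <- epsilon
  sum((P^2 - Q^2)^2 / (2 * pq))
}

P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
kj_sketch(P, Q, testNA = FALSE)

# Identical distributions have distance 0
kj_sketch(P, P)

# With a zero in P0, the third term becomes Q0_3^4 / (2 * epsilon),
# so shrinking epsilon inflates the distance
P0 <- c(0.5, 0.5, 0)
Q0 <- c(0.25, 0.5, 0.25)
kj_sketch(P0, Q0, epsilon = 0.00001)
kj_sketch(P0, Q0, epsilon = 0.000000001)
```

In the zero-containing case, the single epsilon-affected term dominates the result. This is why the choice of epsilon should depend on the data rather than always using the default.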

Author

Hajk-Georg Drost

Examples

library(philentropy)
# Kumar-Johnson distance between two normalized probability vectors
kumar_johnson(P = 1:10/sum(1:10), Q = 20:29/sum(20:29),
              testNA = FALSE, epsilon = 0.00001)
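Neither vector in this example contains zeros, so epsilon should have no effect here. The value can then be cross-checked by evaluating the formula directly in base R without loading the package. This is a hedged sketch that assumes kumar_johnson() computes the formula as commonly stated. No expected numeric value is claimed.

```r
# Inputs from the example
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)

# No product P_i * Q_i is zero, so no epsilon replacement is needed
stopifnot(all(P * Q > 0))

# Per-element terms (P_i^2 - Q_i^2)^2 / (2 * (P_i * Q_i)^(3/2))
terms <- (P^2 - Q^2)^2 / (2 * (P * Q)^1.5)
d <- sum(terms)
print(d)
```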
