
pesticides (version 0.1)

cdfDist: Distance measure for cumulative distribution functions

Description

This distance measure assesses the dissimilarity between two cumulative distribution functions (CDFs). It is most useful when differences in the right tail are of particular interest.

Usage

cdfDist(x1, F1, x2, F2)

Arguments

x1
A numeric vector of values at which the first CDF is given.
F1
A numeric vector, where the i-th element of F1 is the value of the first CDF at x1[i].
x2
A numeric vector of values at which the second CDF is given.
F2
A numeric vector, where the i-th element of F2 is the value of the second CDF at x2[i].
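Each CDF is passed as a pair of vectors: the x values and the CDF evaluated at them. One way to build such a pair from a raw sample, following the pattern of the Examples below, is to take sample quantiles at a grid of probabilities strictly between 0 and 1. This is a sketch only. The helper name `cdfPairs` and its default grid are hypothetical and not part of pesticides. Keeping every probability below 1 matters because the pointwise distance under Details divides by 1 - min(F1(x), F2(x)), which is zero wherever both CDFs equal 1.

```r
# Hypothetical helper (NOT part of pesticides): turn a numeric sample into
# the (x, F) pair that cdfDist() takes as (x1, F1) or (x2, F2).
cdfPairs <- function(y, probs = seq(0.001, 0.999, by = 0.001)) {
  # Probabilities must lie strictly inside (0, 1); see the note above
  stopifnot(is.numeric(y), all(probs > 0 & probs < 1))
  # Sample quantiles at each probability; names = FALSE drops "0.1%"-style labels
  x <- quantile(y, probs = probs, names = FALSE)
  list(x = x, F = probs)
}

y <- rnorm(5000)
p <- cdfPairs(y)
# p$x and p$F can then be supplied as x1 and F1 (or x2 and F2)
```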

Value

An object of class "cdfDist", which is a list with the following components:
  • x: The values at which the pointwise distance was computed and then integrated.
  • F1: The first CDF evaluated at each value of x.
  • F2: The second CDF evaluated at each value of x.
  • meas: A vector giving the integral of the pointwise distance from x[1] up to each value of x. Plotting meas against x shows where the distance between the CDFs grew fastest.
  • cdfDist: The distance between the two CDFs.

Details

This function first computes a pointwise distance at each value x:
$$D(x) = \frac{\left(F_1(x) - F_2(x)\right)^2}{1 - \min\left(F_1(x), F_2(x)\right)}$$
It then integrates D(x) over the range of x values covered by both CDFs (the intersection of the supplied quantile ranges), an interval $(m_1, m_2)$, and standardizes the result by the length of that interval:
$$\mu(F_1, F_2) = \frac{1}{m_2 - m_1} \int_{m_1}^{m_2} D(x)\, dx$$
The measure is designed to penalize large dissimilarities in the right tails heavily. The denominator $1 - \min(F_1(x), F_2(x))$ approaches 0 only where both CDFs approach 1, so a given difference between the CDFs is magnified in the right tail. A poor match in the lower tail, where the denominator is close to 1, increases the measure only slightly.
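To make these formulas concrete, here is a minimal R sketch of the computation. It is not the pesticides implementation. The function name `cdfDistSketch`, the linear interpolation of both CDFs onto a shared grid with `approx()`, the grid size `n`, and the trapezoidal rule for the integral are all assumptions made for illustration. It also assumes every supplied CDF value is strictly below 1, since D(x) is undefined where both CDFs equal 1.

```r
# Hypothetical sketch of the Details formulas; NOT the pesticides source.
# Assumptions: both CDFs are linearly interpolated onto a common grid over
# the overlap (m1, m2) with approx(), and the integral uses the
# trapezoidal rule. All supplied CDF values must be strictly below 1.
cdfDistSketch <- function(x1, F1, x2, F2, n = 1000) {
  # Overlap of the x ranges covered by the two CDFs
  m1 <- max(min(x1), min(x2))
  m2 <- min(max(x1), max(x2))
  stopifnot(m1 < m2)
  x <- seq(m1, m2, length.out = n)
  # Evaluate both CDFs on the shared grid by linear interpolation;
  # rule = 2 guards against rounding at the grid ends
  G1 <- approx(x1, F1, xout = x, rule = 2)$y
  G2 <- approx(x2, F2, xout = x, rule = 2)$y
  # Pointwise distance D(x)
  D <- (G1 - G2)^2 / (1 - pmin(G1, G2))
  # Running trapezoidal integral of D from m1 up to each grid point
  meas <- c(0, cumsum(diff(x) * (head(D, -1) + tail(D, -1)) / 2))
  # Standardize the full integral by the length of (m1, m2)
  list(x = x, F1 = G1, F2 = G2, meas = meas, cdfDist = meas[n] / (m2 - m1))
}

# Tail weighting: the same absolute gap of 0.009 between the two CDFs
pointwiseD <- function(a, b) (a - b)^2 / (1 - pmin(a, b))
pointwiseD(0.99, 0.999)   # approximately 0.0081 (right tail)
pointwiseD(0.001, 0.01)   # approximately 8.1e-05 (left tail)

# Usage: standard normal vs. a normal shifted by 0.2
F   <- seq(0.01, 0.99, by = 0.01)
res <- cdfDistSketch(qnorm(F), F, qnorm(F, mean = 0.2), F)
res$cdfDist
```

The two `pointwiseD()` calls show the design choice behind the denominator: the same gap between the CDFs costs roughly 100 times more near the top of the distributions than near the bottom.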

The functions print, plot, and summary may be applied to the output of cdfDist.

See Also

cor2icc, apple, peach, pear, pepper

Examples

library(pesticides)    # provides cdfDist()
par(mfrow = c(2, 2))   # 2 x 2 layout for the four plots below

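#=====> Example 1 <=====#
# t sample (15 df) vs. standard normal quantiles. Each probability grid
# drops 300 of its 999 points at random, so the CDFs use different x values.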
F1   <- seq(0.001, 0.999, 0.001)[-sample(999, 300)]
x1   <- quantile(rt(10000, 15), F1)
F2   <- seq(0.001, 0.999, 0.001)[-sample(999, 300)]
x2   <- qnorm(F2)
hold <- cdfDist(x1, F1, x2, F2)
plot(hold)
summary(hold)

#=====> Example 2 <=====#
# Lognormal sample (meanlog = 1, sdlog = 1) vs. chi-square with df = mean(x1)
F1   <- seq(0.001, 0.999, 0.001)[-sample(999, 300)]
x1   <- exp(quantile(rnorm(10000, 1, sd=1), F1))
F2   <- seq(0.001, 0.999, 0.001)[-sample(999, 300)]
x2   <- qchisq(F2, mean(x1))
hold <- cdfDist(x1, F1, x2, F2)
plot(hold)
summary(hold)

#=====> Example 3 <=====#
# Lognormal sample (meanlog = 0.5, sdlog = 0.5) vs. chi-square with df = mean(x1)
F1   <- seq(0.001, 0.999, 0.001)[-sample(999, 300)]
x1   <- exp(quantile(rnorm(10000, 0.5, sd=0.5), F1))
F2   <- seq(0.001, 0.999, 0.001)[-sample(999, 300)]
x2   <- qchisq(F2, mean(x1))
hold <- cdfDist(x1, F1, x2, F2)
plot(hold)
summary(hold)

#=====> Example 4 <=====#
# As in Example 3, but the chi-square df is mean(x1) + 1
F1   <- seq(0.001, 0.999, 0.001)[-sample(999, 300)]
x1   <- exp(quantile(rnorm(10000, 0.5, sd=0.5), F1))
F2   <- seq(0.001, 0.999, 0.001)[-sample(999, 300)]
x2   <- qchisq(F2, mean(x1)+1)
hold <- cdfDist(x1, F1, x2, F2)
plot(hold)
summary(hold)
