overlap: Overlapping estimation

Description

Estimates the overlapping area of two or more kernel density estimates computed from empirical data.

Usage

overlap( x, nbins = 1024, plot = FALSE, 
    partial.plot = FALSE, boundaries = NULL, 
    return.complete.data = FALSE, ... )

Value

It returns a list containing the following components:

DD: Data frame with the information used to compute the overlap, returned only if return.complete.data = TRUE. It contains the following variables: x, coordinates of the points where the density is estimated; y1 and y2, densities; ovy, density used to estimate the overlapping area (i.e. min(y1,y2)); ally, density used to estimate the whole area (i.e. max(y1,y2)); dominance, indicates which distribution has the highest density; k, label indicating which distributions are compared.
OV: Estimated overlapping area for each pair of distributions.
xpoints: List of abscissas of intersection points among the density curves.
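
The quantities described above can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code: it assumes both densities are evaluated on a shared, equally spaced grid and approximates the areas with a simple rectangle sum. The names a, b, ov_area and crossings, and the grid padding of one unit, are illustrative choices.

```r
# Sketch only: approximate the overlap of two kernel density estimates
# with base R; this is not the implementation used by overlap().
set.seed(1)
a <- rnorm(200)
b <- rnorm(200, mean = 1)

# Evaluate both densities on one common grid (padding of 1 is an assumption).
rng <- range(c(a, b))
n <- 1024
da <- density(a, n = n, from = rng[1] - 1, to = rng[2] + 1)
db <- density(b, n = n, from = rng[1] - 1, to = rng[2] + 1)
dx <- diff(da$x)[1]                 # grid spacing

ovy  <- pmin(da$y, db$y)            # pointwise minimum, as in ovy
ally <- pmax(da$y, db$y)            # pointwise maximum, as in ally
ov_area <- sum(ovy) * dx            # rectangle-sum estimate of the overlap

# Abscissas where the curves cross: sign changes of the difference.
idx <- which(diff(sign(da$y - db$y)) != 0)
crossings <- da$x[idx]

print(ov_area)
print(crossings)
```

The rectangle sum is the crudest possible integration rule; it is used here only to keep the sketch short.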

Arguments

x: list of numerical vectors to be compared; each vector is an element of the list
nbins: number of equally spaced points at which the overlapping density is evaluated; see density for details
plot: logical, if TRUE, final plot of estimated densities and overlapped areas is produced
partial.plot: logical, if TRUE, partial paired distributions are plotted
boundaries: an optional list for bounded distributions, see Details
return.complete.data: logical, if TRUE, return a data frame with information used for computing overlapping (see Value).
...: optional arguments to be passed to function density

Author

Massimiliano Pastore

Details

If the list x contains more than two elements (i.e. more than two distributions), the function computes the overlap for every pair of distributions. Partial plots refer to these paired distributions.
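
As an illustration of what "every pair" means, the sketch below enumerates the pairs for a list of three vectors using combn from base R. The label format built with paste is an assumption made for this example and may differ from the labels the function actually returns.

```r
# Sketch only: enumerate the pairs of distributions that get compared
# when x holds more than two vectors.
set.seed(20150605)
x <- list(X1 = rnorm(100), X2 = rt(50, 8), X3 = rchisq(80, 2))

pairs <- combn(names(x), 2)                              # one column per pair
pair_labels <- apply(pairs, 2, paste, collapse = "-")    # label format is an assumption
n_pairs <- choose(length(x), 2)                          # number of comparisons

print(pair_labels)
print(n_pairs)
```

With k distributions there are k(k-1)/2 comparisons, so the number of partial plots grows quadratically with the length of x.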

If plot=TRUE, all overlapping areas are plotted. Plotting requires the ggplot2 package.

The optional list boundaries must contain two elements, from and to, giving the empirical limits of the input variables. Each element must either have the same length as the input list x or have length one, in which case the same boundary is used for all distributions. See the examples below.
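
The length rule for boundaries can be pictured as a recycling step. The helper below, expand_boundaries, is hypothetical and not part of the package; it only sketches one way a length-one boundary could be expanded to one value per distribution, under the assumption that other lengths are rejected.

```r
# Hypothetical helper (not part of the package): recycle length-one
# boundaries so that from and to hold one value per distribution.
expand_boundaries <- function(boundaries, n_dist) {
  stopifnot(all(c("from", "to") %in% names(boundaries)))
  lapply(boundaries[c("from", "to")], function(b) {
    if (length(b) == 1) {
      rep(b, n_dist)                 # same limit for every distribution
    } else if (length(b) == n_dist) {
      b                              # already one limit per distribution
    } else {
      stop("each boundary must have length 1 or length(x)")
    }
  })
}

x <- list(X1 = runif(100), X2 = runif(50), X3 = runif(30))
b <- expand_boundaries(list(from = 0, to = 1), length(x))
str(b)
```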

References

Pastore, M. (2018). Overlapping: a R package for Estimating Overlapping in Empirical Distributions. The Journal of Open Source Software, 3 (32), 1023. https://doi.org/10.21105/joss.01023

Pastore, M., Calcagnì, A. (2019). Measuring Distribution Similarities Between Samples: A Distribution-Free Overlapping Index. Frontiers in Psychology, 10:1089. https://doi.org/10.3389/fpsyg.2019.01089

Examples

set.seed(20150605)
x <- list(X1=rnorm(100), X2=rt(50,8), X3=rchisq(80,2))
out <- overlap(x, plot=TRUE)
out$OV

# including boundaries
x <- list(X1=runif(100), X2=runif(100,.5,1))
boundaries <- list( from = c(0,.5), to = c(1,1) )
out <- overlap(x, plot=TRUE, boundaries=boundaries)
out$OV

# equal boundaries
x <- list(X1=runif(100), X2=runif(50), X3=runif(30))
boundaries <- list( from = 0, to = 1 )
out <- overlap(x, plot=TRUE, boundaries=boundaries)
out$OV

# changing kernel (the kernel argument is passed on to density)
out <- overlap(x, plot=TRUE, kernel="rectangular")
out$OV
