bv.boxplot: Bivariate boxplots

Description

Creates diagnostic bivariate quelplot ellipses (bivariate boxplots) using the method of Goldberg and Iglewicz (1992). The output can be used to check the assumption of bivariate normality and to identify multivariate outliers. The default robust = TRUE option relies on a biweight correlation estimator function written by Everitt (2006). Quelplots are potentially asymmetric, although the method currently employed here uses a single "fence" definition and therefore creates symmetric ellipses.

Usage

bv.boxplot(X, Y, robust = TRUE, D = 7, xlab = "X", ylab="Y", pch = 21, 
pch.out = NULL, bg = "gray", bg.out = NULL, hinge.col = 1, fence.col = 1, 
hinge.lty = 2, fence.lty = 3, xlim = NULL, ylim = NULL, names = 1:length(X), 
ID.out = FALSE, cex.ID.out = 0.7, uni.CI = FALSE, uni.conf = 0.95, 
uni.CI.col = 1, uni.CI.lty = 1, uni.CI.lwd = 2, show.points = TRUE, ...)

Value

The function draws a diagnostic plot. It also invisibly returns location, scale and correlation estimates for $X$ and $Y$, estimates of $E_m$ and $E_{max}$, and a list of outliers (observations whose $E_i$ exceeds $E_{max}$).

Arguments

X: First of two quantitative variables making up the bivariate distribution.
Y: Second of two quantitative variables making up the bivariate distribution.
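robust: Logical. If TRUE (the default, and recommended), robust biweight estimators of location, scale and correlation are used.
D: Constant that sets the distance of the fence relative to the hinge. The default, D = 7, makes the fence approximately a 99 percent confidence region for an individual observation.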
xlab: Caption for X axis.
ylab: Caption for Y axis.
pch: Plotting character(s) for scatterplot.
pch.out: Plotting character for outliers.
hinge.col: Hinge color.
fence.col: Fence color.
hinge.lty: Hinge line type.
fence.lty: Fence line type.
xlim: A two-element vector defining the X-limits of the plot.
ylim: A two-element vector defining the Y-limits of the plot.
bg: Background (fill) color for points in the scatterplot. It is only used when pch is in 21:25; otherwise points are drawn in black.
bg.out: Background (fill) color for outlying points in the scatterplot. It is only used when pch.out is in 21:25; otherwise points are drawn in black.
names: An optional vector of names for the (X, Y) observations.
ID.out: Logical. Whether or not outlying points should be labeled in the plot (using the argument names).
cex.ID.out: Character expansion for outlying ID labels.
uni.CI: Logical. If TRUE, univariate confidence intervals for the true medians, at confidence level uni.conf, are shown.
uni.conf: Univariate confidence level, only used if uni.CI = TRUE.
uni.CI.col: Univariate confidence bound line color, only used if uni.CI = TRUE.
uni.CI.lty: Univariate confidence bound line type, only used if uni.CI = TRUE.
uni.CI.lwd: Univariate confidence bound line width, only used if uni.CI = TRUE.
show.points: Logical. Whether points should be shown in graph.
...: Additional arguments passed to points.
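A quick check of the D = 7 default, as a sketch that assumes bivariate normal data with known location, scale and correlation. Under that assumption the squared distance $E_i^2$ defined under Details follows a chi-squared distribution with 2 degrees of freedom, so the largest possible fence covers the probability that $E_i^2$ falls below D times the squared median distance. The helper name fence_coverage is hypothetical and not part of the package.

```r
# Sketch: coverage of the largest possible fence under bivariate normality.
# Assumption: E_i^2 ~ chi-squared with 2 df (known parameters, no estimation error).
fence_coverage <- function(D = 7) {
  Em2 <- qchisq(0.5, df = 2)   # squared median distance E_m^2, equal to 2 * log(2)
  pchisq(D * Em2, df = 2)      # P(E_i^2 < D * E_m^2)
}

fence_coverage(7)   # 1 - 2^-7, about 0.992
```

With estimated (and especially robust) parameters the realized coverage will differ somewhat, and the drawn fence uses the observed $E_{max}$, which never exceeds $\sqrt{D}E_m$.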

Author

Ken Aho. The function relies on an Everitt (2006) function for robust M-estimation.

Details

Two ellipses are drawn. The inner ellipse, the "hinge", contains 50 percent of the data. The outer ellipse is the "fence"; observations outside it are potential outliers. If robust = TRUE (the default), the function bivariate from Everitt (2006) is used to calculate robust biweight estimates of correlation, scale, and location. The quelplot is defined as follows:

$$E_i = \sqrt{\frac{X^2_{si} + Y^2_{si} - 2R^*X_{si}Y_{si}}{1-R^{*2}}}.$$

where $X_{si} = (X_i - T^*_X)/S^*_X$ and $Y_{si} = (Y_i - T^*_Y)/S^*_Y$ are the standardized values of $X_i$ and $Y_i$, respectively, $T^*_X$ and $T^*_Y$ are location estimators for X and Y, $S^*_X$ and $S^*_Y$ are scale estimators for X and Y, and $R^*$ is a correlation estimator for X and Y. We have:

$$E_m = \operatorname{median}\{E_i : i = 1, 2, \ldots, n\},$$ and $$E_{max} = \max\{E_i : E_i^2 < D E^2_m\},$$ where $D$ is a constant that sets the distance of the "fence" relative to the "hinge".
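The distance calculations above can be sketched in a few lines of R. This is an illustration, not the package's implementation: it assumes classical estimators (mean, sd and Pearson cor) in place of the robust biweight estimators $T^*$, $S^*$ and $R^*$ that bv.boxplot uses by default, and the helper name quel_distances is hypothetical.

```r
# Sketch of the quelplot distances E_i, E_m and E_max.
# Assumption: classical estimators stand in for the robust biweight T*, S*, R*.
quel_distances <- function(X, Y, D = 7) {
  Tx <- mean(X); Ty <- mean(Y)   # location estimates T*_X, T*_Y
  Sx <- sd(X);   Sy <- sd(Y)     # scale estimates S*_X, S*_Y
  R  <- cor(X, Y)                # correlation estimate R*
  Xs <- (X - Tx) / Sx            # standardized X_si
  Ys <- (Y - Ty) / Sy            # standardized Y_si
  E  <- sqrt((Xs^2 + Ys^2 - 2 * R * Xs * Ys) / (1 - R^2))
  Em <- median(E)
  Emax <- max(E[E^2 < D * Em^2]) # largest E_i inside the bound D * E_m^2
  list(E = E, Em = Em, Emax = Emax, outliers = which(E > Emax))
}

set.seed(1)
X <- rnorm(100)
Y <- 0.5 * X + rnorm(100)
d <- quel_distances(X, Y)
mean(d$E <= d$Em)   # 0.5: with n = 100, exactly 50 distances lie at or below the median
d$outliers          # indices of points beyond the fence, if any
```

Because $E_{max}$ is taken from the observed $E_i$, the point that attains it lies exactly on the fence.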

To draw the "hinge" we have:

$$R_1 = E_m\sqrt{\frac{1 + R^*}{2}},$$ $$R_2 = E_m\sqrt{\frac{1 - R^*}{2}}.$$

To draw the "fence" we have:

$$R_1 = E_{max}\sqrt{\frac{1 + R^*}{2}},$$ $$R_2 = E_{max}\sqrt{\frac{1 - R^*}{2}}.$$

For $\theta$ from 0 to 360 degrees, let:

$$\Theta_1 = R_1\cos(\theta),$$ $$\Theta_2 = R_2\sin(\theta).$$

The Cartesian coordinates of the "hinge" and "fence" are:

$$X = T^*_X + (\Theta_1 + \Theta_2)S^*_X,$$ $$Y = T^*_Y + (\Theta_1 - \Theta_2)S^*_Y.$$
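The hinge and fence construction can be sketched as a single R function, reading the Cartesian coordinates as the location estimate plus the scaled $\Theta$ terms. Pass $E_m$ for the hinge or $E_{max}$ for the fence. The function name quel_ellipse and its argument names are hypothetical, and the example estimates are made up for illustration. The check at the end confirms that every point on the curve has standardized distance equal to the supplied E.

```r
# Sketch: Cartesian coordinates of a quelplot ellipse (hinge or fence).
# E is E_m (hinge) or E_max (fence); Tx, Ty, Sx, Sy, R are the location,
# scale and correlation estimates (argument names are hypothetical).
quel_ellipse <- function(E, Tx, Ty, Sx, Sy, R, n = 361) {
  theta <- seq(0, 2 * pi, length.out = n)  # 0 to 360 degrees, in radians
  R1 <- E * sqrt((1 + R) / 2)
  R2 <- E * sqrt((1 - R) / 2)
  Th1 <- R1 * cos(theta)
  Th2 <- R2 * sin(theta)
  cbind(x = Tx + (Th1 + Th2) * Sx,
        y = Ty + (Th1 - Th2) * Sy)
}

# Check: plug the ellipse back into the E_i formula.
ell <- quel_ellipse(E = 2, Tx = 17, Ty = 13, Sx = 3, Sy = 2, R = 0.4)
xs <- (ell[, "x"] - 17) / 3
ys <- (ell[, "y"] - 13) / 2
Echeck <- sqrt((xs^2 + ys^2 - 2 * 0.4 * xs * ys) / (1 - 0.4^2))
range(Echeck)   # both ends equal 2, up to rounding
```

The returned two-column matrix can be passed directly to lines() to draw the ellipse on an existing scatterplot.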

Quelplots are potentially asymmetric, although the current (and only) method used here defines a single value for $E_{max}$ and hence creates symmetric ellipses. Under this implementation, at least one observation defines $E_{max}$ and therefore lies on the "fence".

References

Everitt, B. (2006) An R and S-plus Companion to Multivariate Analysis. Springer.

Goldberg, K. M., and B. Iglewicz (1992) Bivariate extensions of the boxplot. Technometrics 34: 307-320.

Examples


library(asbio)
# Simulated data: two independent normal variables
Y1 <- rnorm(100, 17, 3)
Y2 <- rnorm(100, 13, 2)
bv.boxplot(Y1, Y2)

X <- c(-0.24, 2.53, -0.3, -0.26, 0.021, 0.81, -0.85, -0.95, 1.0, 0.89, 0.59, 
0.61, -1.79, 0.60, -0.05, 0.39, -0.94, -0.89, -0.37, 0.18)
Y <- c(-0.83, -1.44, 0.33, -0.41, -1.0, 0.53, -0.72, 0.33,  0.27, -0.99, 0.15, 
-1.17, -0.61, 0.37, -0.96, 0.21, -1.29, 1.40, -0.21, 0.39)
# Label outlying points and fill them in red
b <- bv.boxplot(X, Y, ID.out = TRUE, bg.out = "red")
b   # print the invisibly returned estimates and outliers
