This function is typically called by another function to gather the statistics necessary for producing box plots, but may be invoked separately.
boxplot.stats(x, coef = 1.5, do.conf = TRUE, do.out = TRUE)
coef: this determines how far the plot ‘whiskers’ extend out from the box. If coef is positive, the whiskers extend to the most extreme data point which is no more than coef times the length of the box away from the box. A value of zero causes the whiskers to extend to the data extremes (and no outliers are returned).
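The whisker rule can be reproduced by hand. The following is a minimal sketch, assuming the "length of the box" is the distance between the two hinges as returned by fivenum(); the variable names (box_len, lower_fence, upper_fence, whiskers) are illustrative only.

```r
x <- c(1:100, 1000)
coef <- 1.5

h <- stats::fivenum(x)           # min, lower hinge, median, upper hinge, max
box_len <- h[4] - h[2]           # length of the box (assumed: hinge spread)
lower_fence <- h[2] - coef * box_len
upper_fence <- h[4] + coef * box_len

# Whiskers end at the most extreme data points that lie inside the fences
whiskers <- range(x[x >= lower_fence & x <= upper_fence])

b <- boxplot.stats(x, coef = coef)
stopifnot(whiskers == b$stats[c(1, 5)])

# With coef = 0 the whiskers reach the data extremes and nothing is flagged
b0 <- boxplot.stats(x, coef = 0)
stopifnot(b0$stats[c(1, 5)] == range(x), length(b0$out) == 0)
```

Points outside the fences (here the value 1000) are the ones reported as outliers.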
do.conf, do.out: logicals; if FALSE, the conf or out component respectively will be empty in the result.
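A short check of what "empty" means in practice. This sketch only inspects component lengths and makes no assumption about the exact type of the empty components.

```r
x <- c(1:100, 1000)

b_full  <- boxplot.stats(x)
b_empty <- boxplot.stats(x, do.conf = FALSE, do.out = FALSE)

# The suppressed components have length zero ...
stopifnot(length(b_empty$conf) == 0, length(b_empty$out) == 0)

# ... while the five-number summary and the sample size are unaffected
stopifnot(b_empty$stats == b_full$stats, b_empty$n == b_full$n)
```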
List with named components as follows:
stats: a vector of length 5, containing the extreme of the lower whisker, the lower ‘hinge’, the median, the upper ‘hinge’ and the extreme of the upper whisker.
n: the number of non-NA observations in the sample.
conf: the lower and upper extremes of the ‘notch’ (if(do.conf)). See the details.
out: the values of any data points which lie beyond the extremes of the whiskers (if(do.out)).
Note that $stats and $conf are sorted in increasing order, unlike S, and that $n and $out include any +- Inf values.
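The two ‘hinges’ are versions of the first and third quartile, i.e., close to quantile(x, c(1,3)/4). The hinges equal the quartiles for odd n (where n <- length(x)) and differ for even n. Whereas the quartiles only equal observations for n %% 4 == 1 (n = 1 mod 4), the hinges do so additionally for n %% 4 == 2 (n = 2 mod 4), and are in the middle of two observations otherwise.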
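The relation between hinges and quartiles can be checked numerically. The sketch below assumes the hinges are taken from fivenum() and the quartiles from quantile() with its default type; hinges() and quartiles() are hypothetical helper names introduced only for this illustration.

```r
# Hypothetical helpers, not part of any package
hinges    <- function(x) unname(stats::fivenum(x)[c(2, 4)])
quartiles <- function(x) unname(stats::quantile(x, c(1, 3) / 4))

# Print both for one sample size from each residue class of n modulo 4
for (n in 5:8) {
  x <- seq_len(n)
  cat(sprintf("n = %d: hinges = (%s), quartiles = (%s)\n", n,
              paste(hinges(x), collapse = ", "),
              paste(quartiles(x), collapse = ", ")))
}

# Odd n: hinges and quartiles agree; even n: they differ
stopifnot(isTRUE(all.equal(hinges(1:5), quartiles(1:5))),
          isTRUE(all.equal(hinges(1:7), quartiles(1:7))),
          !isTRUE(all.equal(hinges(1:6), quartiles(1:6))),
          !isTRUE(all.equal(hinges(1:8), quartiles(1:8))))
```

For n = 6 the hinges fall on observations while the quartiles do not, which matches the n %% 4 == 2 case.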
The notches (if requested) extend to +/-1.58 IQR/sqrt(n).
This seems to be based on the same calculations as the formula with 1.57 in
Chambers et al (1983, p.62), given in McGill et al
(1978, p.16). They are based on asymptotic normality of the median
and roughly equal sample sizes for the two medians being compared, and
are said to be rather insensitive to the underlying distributions of
the samples. The idea appears to be to give roughly a 95% confidence
interval for the difference in two medians.
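The notch formula can be recomputed from the other components of the result. This is a sketch under the assumption that "IQR" in the formula means the distance between the two hinges (elements 2 and 4 of $stats) rather than IQR(x); the names hinge_spread and notch are illustrative.

```r
x <- c(1:100, 1000)
b <- boxplot.stats(x)

# Assumed: the spread used for the notch is the distance between the hinges
hinge_spread <- diff(b$stats[c(2, 4)])

# median +/- 1.58 * spread / sqrt(n)
notch <- b$stats[3] + c(-1.58, 1.58) * hinge_spread / sqrt(b$n)

stopifnot(isTRUE(all.equal(notch, b$conf)))
```

Because the notch is built from the median and the hinges, the single extreme value 1000 has no effect on it.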
Tukey, J. W. (1977). Exploratory Data Analysis. Section 2C.
McGill, R., Tukey, J. W. and Larsen, W. A. (1978). Variations of box plots. The American Statistician, 32, 12-16. doi:10.2307/2683468.
Velleman, P. F. and Hoaglin, D. C. (1981). Applications, Basics and Computing of Exploratory Data Analysis. Duxbury Press.
Emerson, J. D. and Strenio, J. (1983). Boxplots and batch comparison. Chapter 3 of Understanding Robust and Exploratory Data Analysis, eds. D. C. Hoaglin, F. Mosteller and J. W. Tukey. Wiley.
Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983). Graphical Methods for Data Analysis. Wadsworth & Brooks/Cole.
require(stats)
x <- c(1:100, 1000)
(b1 <- boxplot.stats(x))
(b2 <- boxplot.stats(x, do.conf = FALSE, do.out = FALSE))
stopifnot(b1$stats == b2$stats) # do.out = FALSE is still robust
boxplot.stats(x, coef = 3, do.conf = FALSE)
## no outlier treatment:
boxplot.stats(x, coef = 0)
boxplot.stats(c(x, NA)) # the NA is dropped: n is still 101
(r <- boxplot.stats(c(x, -1:1/0))) # -1:1/0 is c(-Inf, NaN, Inf)
stopifnot(r$out == c(1000, -Inf, Inf))