Produces boxplots adjusted for skewed distributions as proposed in Hubert and Vandervieren (2004).
adjbox(x, ...)

# S3 method for formula
adjbox(formula, data = NULL, ..., subset, na.action = NULL)

# S3 method for default
adjbox(x, ..., range = 1.5, doReflect = FALSE,
       width = NULL, varwidth = FALSE,
       notch = FALSE, outline = TRUE, names, plot = TRUE,
       border = par("fg"), col = NULL, log = "",
       pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5),
       horizontal = FALSE, add = FALSE, at = NULL)
formula: a formula, such as y ~ grp, where y is a numeric vector of data values to be split into groups according to the grouping variable grp (usually a factor).
data: a data.frame (or list) from which the variables in formula should be taken.
subset: an optional vector specifying a subset of observations to be used for plotting.
na.action: a function which indicates what should happen when the data contain NAs. The default is to ignore missing values in either the response or the group.
x: the data from which the boxplots are to be produced: either a numeric vector or a single list containing such vectors. Additional unnamed arguments specify further data as separate vectors (each corresponding to a component boxplot). NAs are allowed in the data.
...: for the formula method, named arguments to be passed to the default method. For the default method, unnamed arguments are additional data vectors (unless x is a list, in which case they are ignored), and named arguments are arguments and graphical parameters to be passed to bxp in addition to the ones given by argument pars (and they override those in pars).
range: determines how far the plot whiskers extend out from the box, and is simply passed as argument coef to adjboxStats(). If range is positive, the whiskers extend to the most extreme data point which is no more than range times the interquartile range from the box (for the adjusted boxplot, this distance is additionally scaled by a medcouple-dependent factor; see adjboxStats()). A value of zero causes the whiskers to extend to the data extremes.
doReflect: logical indicating if the MC (medcouple) should also be computed on the reflected sample -x and the two values averaged; see mc.
width: a vector giving the relative widths of the boxes making up the plot.
varwidth: if varwidth is TRUE, the boxes are drawn with widths proportional to the square roots of the number of observations in the groups.
notch: if notch is TRUE, a notch is drawn in each side of the boxes. If the notches of two plots do not overlap, this is 'strong evidence' that the two medians differ (Chambers et al., 1983, p. 62). See boxplot.stats for the calculations used.
outline: if outline is not true, the outliers are not drawn (as points whereas S+ uses lines).
names: group labels which will be printed under each boxplot.
boxwex: a scale factor to be applied to all boxes. When there are only a few groups, the appearance of the plot can be improved by making the boxes narrower.
staplewex: staple line width expansion, proportional to box width.
outwex: outlier line width expansion, proportional to box width.
plot: if TRUE (the default) then a boxplot is produced. If not, the summaries which the boxplots are based on are returned.
border: an optional vector of colors for the outlines of the boxplots. The values in border are recycled if the length of border is less than the number of plots.
col: if col is non-null it is assumed to contain colors to be used to colour the bodies of the box plots. By default they are in the background colour.
log: character indicating if x or y or both coordinates should be plotted in log scale.
pars: a list of (potentially many) more graphical parameters, e.g., boxwex or outpch; these are passed to bxp (if plot is true); for details, see there.
horizontal: logical indicating if the boxplots should be horizontal; default FALSE means vertical boxes.
add: logical; if true, add the boxplot to the current plot.
at: numeric vector giving the locations where the boxplots should be drawn, particularly when add = TRUE; defaults to 1:n where n is the number of boxes.
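As a hedged illustration of how several of these arguments combine, the following sketch calls adjbox() through its formula interface on simulated right-skewed data. The data frame d, its columns y and grp, and the chosen colours and title are made up for illustration; only the argument names are taken from the usage above.

```r
library(robustbase)  # provides adjbox()

set.seed(1)
# Hypothetical data: three right-skewed (log-normal) groups of unequal size
d <- data.frame(
  y   = c(rlnorm(40), rlnorm(80, meanlog = 0.5), rlnorm(20, meanlog = 1)),
  grp = factor(rep(c("A", "B", "C"), times = c(40, 80, 20)))
)

# Horizontal adjusted boxplots; box widths proportional to sqrt(n) via
# 'varwidth', and narrower boxes via 'boxwex' supplied through 'pars'
res <- adjbox(y ~ grp, data = d,
              varwidth = TRUE, horizontal = TRUE,
              col = "lightgray", border = "gray30",
              pars = list(boxwex = 0.5, staplewex = 0.5, outwex = 0.5),
              main = "Adjusted boxplots by group")
```

The summary list is returned invisibly when a plot is drawn, so assigning it to res keeps the numbers available without printing them.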
A list with the following components:
stats: a matrix, each column contains the extreme of the lower whisker, the lower hinge, the median, the upper hinge and the extreme of the upper whisker for one group/plot. If all the inputs have the same class attribute, so will this component.
n: a vector with the number of observations in each group.
conf: a matrix where each column contains the lower and upper extremes of the notch.
out: the values of any data points which lie beyond the extremes of the whiskers.
group: a vector of the same length as out whose elements indicate to which group the outlier belongs.
names: a vector of names for the groups.
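A minimal sketch of inspecting the returned list, assuming a simulated exponential sample x (hypothetical data, chosen only because it is right-skewed). With plot = FALSE nothing is drawn and the summaries are returned directly.

```r
library(robustbase)

set.seed(2)
x <- rexp(200)  # hypothetical right-skewed sample

# plot = FALSE: no plot is drawn; the summary list is returned instead
s <- adjbox(x, plot = FALSE)

s$stats   # 5 x 1 matrix: lower whisker, lower hinge, median,
          # upper hinge, upper whisker
s$n       # number of observations in the (single) group
s$conf    # lower and upper extremes of the notch
s$out     # data points beyond the whiskers (possibly none)
s$group   # group index for each element of s$out
s$names   # group names
```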
The generic function adjbox currently has a default method (adjbox.default) and a formula interface (adjbox.formula).
If multiple groups are supplied either as multiple arguments or via a formula, parallel boxplots will be plotted, in the order of the arguments or the order of the levels of the factor (see factor).
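The two ways of supplying groups can be sketched as follows. The vectors a and b and the data frame d are hypothetical; plot = FALSE is used so that only the summaries are compared.

```r
library(robustbase)

set.seed(3)
a <- rlnorm(50)               # hypothetical group 1
b <- rlnorm(50, sdlog = 0.5)  # hypothetical group 2

# (1) Groups as separate unnamed arguments: ordered as given
r1 <- adjbox(a, b, plot = FALSE)

# (2) The same data in long format via the formula interface:
#     ordered by the levels of the grouping factor
d <- data.frame(y = c(a, b), grp = factor(rep(c("a", "b"), each = 50)))
r2 <- adjbox(y ~ grp, data = d, plot = FALSE)

r1$names  # default names for unnamed arguments
r2$names  # the factor levels

# Reordering the factor levels changes the order of the boxes
d$grp <- factor(d$grp, levels = c("b", "a"))
r3 <- adjbox(y ~ grp, data = d, plot = FALSE)
r3$names
```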
Missing values are ignored when forming boxplots.
Extremes of the upper and lower whiskers of the adjusted boxplots are computed using the medcouple (mc()), a robust measure of skewness. For details, see the reference below.
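The role of the medcouple can be made concrete with a short sketch of the fence rule from Hubert and Vandervieren. It assumes the constants a = -4 and b = 3 and the use of fivenum() hinges, which are believed to match the defaults of adjboxStats() but should be checked against that function's documentation; the sample x is hypothetical.

```r
library(robustbase)

set.seed(4)
x <- rlnorm(100)     # hypothetical right-skewed sample

coef <- 1.5          # corresponds to 'range' in adjbox()
a <- -4; b <- 3      # assumed adjboxStats() defaults

medc <- mc(x)        # medcouple, a value in [-1, 1]
h    <- fivenum(x)   # min, lower hinge, median, upper hinge, max
iqr  <- h[4] - h[2]

# Skewness-adjusted fences: for right skew (medc >= 0) the upper fence
# moves outwards and the lower fence moves inwards, and vice versa
fence <- if (medc >= 0) {
  c(h[2] - coef * exp(a * medc) * iqr, h[4] + coef * exp(b * medc) * iqr)
} else {
  c(h[2] - coef * exp(-b * medc) * iqr, h[4] + coef * exp(-a * medc) * iqr)
}

fence
adjboxStats(x)$fence  # the package's own fences, for comparison

# Whiskers end at the most extreme observations inside the fences
range(x[x >= fence[1] & x <= fence[2]])
```

With medc = 0 both exponential factors equal 1, so the rule reduces to the usual fences at 1.5 times the interquartile range from the hinges.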
Hubert, M. and Vandervieren, E. (2008). An adjusted boxplot for skewed distributions, Computational Statistics and Data Analysis 52, 5186–5201.
library(robustbase)  # provides adjbox() and the 'condroz' data

if(require("boot")) {
    ### Hubert and Vandervieren (2006), p. 10, Fig. 4.
    data(coal, package = "boot")
    coaldiff <- diff(coal$date)   # time between successive dates
    op <- par(mfrow = c(1,2))
    boxplot(coaldiff, main = "Original Boxplot")
    adjbox(coaldiff, main = "Adjusted Boxplot")
    par(op)                       # restore graphical parameters
}

### Hubert and Vandervieren (2006), p. 11, Fig. 6. -- enhanced
op <- par(mfrow = c(2,2), mar = c(1,3,3,1), oma = c(0,0,3,0))
with(condroz, {
    boxplot(Ca, main = "Original Boxplot")
    adjbox (Ca, main = "Adjusted Boxplot")
    boxplot(Ca, main = "Original Boxplot [log]", log = "y")
    adjbox (Ca, main = "Adjusted Boxplot [log]", log = "y")
})
mtext("'Ca' from data(condroz)",
      outer = TRUE, font = par("font.main"), cex = 2)
par(op)