lvplot (version 0.2.1)

LVboxplot: Side-by-side LV boxplots with base graphics

Description

An extension of standard boxplots which draws k letter statistics. Conventional boxplots (Tukey 1977) are useful displays for conveying rough information about the central 50% of the data and the extent of the data.

Usage

LVboxplot(x, ...)

# S3 method for formula
LVboxplot(
  formula,
  alpha = 0.95,
  k = NULL,
  perc = NULL,
  horizontal = TRUE,
  xlab = NULL,
  ylab = NULL,
  col = "grey30",
  bg = "grey90",
  width = 0.9,
  width.method = "linear",
  median.col = "grey10",
  ...
)

# S3 method for numeric
LVboxplot(
  x,
  alpha = 0.95,
  k = NULL,
  perc = NULL,
  horizontal = TRUE,
  xlab = NULL,
  ylab = NULL,
  col = "grey30",
  bg = "grey90",
  width = 0.9,
  width.method = "linear",
  median.col = "grey10",
  ...
)

Arguments

x

numeric vector of data

...

passed on to plot

formula

a plotting formula of the form y ~ x, where x is a string or factor. The values of y will be split into groups according to their values on x and separate letter value box plots of y are drawn side by side in the same display.

alpha

if supplied, depth k is calculated such that (1-alpha)100 intervals of an LV statistic do not extend into neighboring LV statistics.

k

number of letter value statistics used

perc

if supplied, depth k is adjusted such that perc percent outliers are shown

horizontal

display horizontally (TRUE) or vertically (FALSE)

xlab

x axis label

ylab

y axis label

col

vector of colours to use

bg

background colour

width

maximum height/width of box

width.method

one of 'linear', 'height' or 'area'. Methods 'height' and 'area' ensure that these dimensions are proportional to the number of observations within each box.

median.col

colour of the line for the median

Details

For moderate-sized data sets (\(n < 1000\)), detailed estimates of tail behavior beyond the quartiles may not be trustworthy, so the information provided by boxplots is appropriately somewhat vague beyond the quartiles, and the expected number of "outliers" and "far-out" values for a Gaussian sample of size \(n\) is often less than 10 (Hoaglin, Iglewicz, and Tukey 1986). Large data sets (\(n \approx 10,000-100,000\)) afford more precise estimates of quantiles in the tails beyond the quartiles and also can be expected to present a large number of "outliers" (about \(0.4 + 0.007 n\)).
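To get a feel for how the outlier count grows, the approximation \(0.4 + 0.007 n\) quoted above can be evaluated for a few sample sizes. This is only a small arithmetic sketch in base R; the helper name expected_outliers is hypothetical and not part of lvplot.

```r
# Hypothetical helper (not part of lvplot): approximate expected number of
# points a standard boxplot flags as outliers for a Gaussian sample of size n,
# using the 0.4 + 0.007 * n approximation quoted in the text.
expected_outliers <- function(n) 0.4 + 0.007 * n

n <- c(100, 1000, 10000, 100000)
outlier_table <- data.frame(n = n, expected = expected_outliers(n))
print(outlier_table)
```

At n = 1000 the approximation stays below 10, while at n = 100,000 it is in the hundreds, which is the regime where a standard boxplot becomes cluttered with individually drawn points.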

The letter-value box plot addresses both of these shortcomings: it conveys more detailed information in the tails using letter values, only out to the depths where the letter values are reliable estimates of their corresponding quantiles (corresponding to tail areas of roughly \(2^{-i}\)); "outliers" are defined as a function of the most extreme letter value shown. All aspects shown on the letter-value boxplot are actual observations, thus remaining faithful to the principles that governed Tukey's original boxplot.
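The idea of letter values can be sketched in base R as follows. This is an illustration of the conventional Tukey-style definition, not lvplot's internal code: the function name letter_values is hypothetical, k is supplied by hand rather than derived from alpha or perc, and at half-integer depths the sketch averages two adjacent order statistics, which the package may handle differently.

```r
# Hypothetical sketch (not lvplot's implementation): compute the first k
# pairs of letter values (median, fourths, eighths, ...) of a numeric vector.
letter_values <- function(x, k) {
  x <- sort(x)
  n <- length(x)

  # Depths: the median has depth (n + 1) / 2; each further letter value
  # halves the depth of the previous one.
  depth <- numeric(k)
  depth[1] <- (n + 1) / 2
  for (i in seq_len(k)[-1]) {
    depth[i] <- (floor(depth[i - 1]) + 1) / 2
  }

  # Value at a depth counted from the lower end; a half-integer depth
  # averages the two neighbouring order statistics.
  at_depth <- function(d) (x[floor(d)] + x[ceiling(d)]) / 2

  data.frame(
    i = seq_len(k),
    tail_area = 2^-seq_len(k),   # approximate tail area of the i-th letter value
    depth = depth,
    lower = vapply(depth, at_depth, numeric(1)),
    upper = vapply(n + 1 - depth, at_depth, numeric(1))
  )
}

# Small worked case: nine observations, three letter values
lv <- letter_values(1:9, k = 3)
print(lv)
```

Calling letter_values(rexp(1000), k = 5) on a larger sample gives the nested lower and upper box boundaries that a letter-value plot would draw, moving further into the tails as i increases.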

Examples

library(lvplot)

# Set a 4 x 2 layout; par() returns the previous values so they can be restored
oldpar <- par(mfrow = c(4, 2), mar = c(3, 3, 3, 3))
for (i in 1:4) {
  # Exponential samples of size 100, 1000, 10000 and 100000
  x <- rexp(10 ^ (i + 1))
  boxplot(x, col = "grey", horizontal = TRUE)
  title(paste("Exponential, n =", length(x)))
  LVboxplot(x, col = "grey", xlab = "")
}
par(oldpar)

with(ontime, LVboxplot(sqrt(TaxiIn + TaxiOut) ~ UniqueCarrier, horizontal=FALSE))
