
rgr (version 1.0.4)

shape: An EDA Graphical Summary

Description

Plots a simple four-panel graphical summary of a data distribution, comprising a histogram, a horizontal Tukey boxplot or box-and-whisker plot, an empirical cumulative distribution function (ECDF), and a cumulative normal percentage probability (CPP) plot. The plots in all four panels have identical x-axis scaling. Optionally, the EDA graphics may be plotted with logarithmic (x-axis) scaling.
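To make the layout concrete, here is a rough base-R approximation of the four-panel display. It is only a sketch and not rgr's implementation. The function name shape_sketch is hypothetical. The CPP panel is approximated by plotting the sorted data against standard normal quantiles of stats::ppoints() plotting positions, with the vertical axis labelled in percent; rgr's cnpplt may use a different plotting-position formula. Log scaling is approximated by plotting log10 values rather than drawing a logarithmic axis.

```r
# Hypothetical approximation of the shape() layout using base graphics only.
# Not rgr's code: the CPP panel uses ppoints() plotting positions, and log
# scaling is approximated by plotting log10-transformed values.
shape_sketch <- function(x, xlab = "x", log = FALSE) {
  x <- x[!is.na(x)]
  if (log) {
    x <- log10(x)
    xlab <- paste("log10", xlab)
  }
  # Like shape(), take the shared x-axis range from the histogram breaks
  h <- hist(x, breaks = "Scott", plot = FALSE)
  xlim <- range(h$breaks)
  old <- par(mfrow = c(2, 2))            # 2 x 2 grid, filled row by row
  on.exit(par(old))
  # Upper left: histogram with grey infill (colour 8)
  plot(h, xlim = xlim, col = 8, main = "", xlab = xlab)
  # Upper right: horizontal Tukey boxplot; for horizontal = TRUE, ylim is
  # the value (horizontal) axis
  boxplot(x, horizontal = TRUE, ylim = xlim, col = 8)
  # Lower left: ECDF with plus-sign plotting symbols (pch = 3)
  plot(ecdf(x), xlim = xlim, pch = 3, main = "", xlab = xlab, ylab = "Fraction")
  # Lower right: data against normal quantiles, y-axis labelled in percent
  pct <- c(1, 5, 25, 50, 75, 95, 99)
  plot(sort(x), qnorm(ppoints(length(x))), xlim = xlim, pch = 3, yaxt = "n",
       main = "", xlab = xlab, ylab = "Cumulative normal percentage")
  axis(2, at = qnorm(pct / 100), labels = pct)
  invisible(xlim)
}

set.seed(1)
shape_sketch(rlnorm(200, meanlog = 2), xlab = "Synthetic Cu (mg/kg)", log = TRUE)
```

Computing the histogram first and reusing its break range for the other three panels is one simple way to get the identical x-axis scaling described above.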

Usage

shape(xx, xlab = deparse(substitute(xx)), log = FALSE, xlim = NULL, 
	nclass = "scott", ifbw = FALSE, wend = 0.05, colr = 8, 
	ifnright = TRUE, ...)

Arguments

xx
name of the variable to be plotted.
xlab
a title for the x-axis. The default x-axis title is the input variable name; it is often desirable to replace it with a more informative title, e.g., xlab = "Cu (mg/kg) in <2 mm O-horizon soil".
log
if it is required to display the data with logarithmic (x-axis) scaling, set log = TRUE.
xlim
the x-axis limits. By default these are determined by gx.hist and used to ensure that all four panels in this function have the same x-axis scaling. xlim may also be defined by the user; see Note below.
nclass
the default procedure for preparing the histogram is to use the Scott (1979) rule, which usually provides an informative histogram. Other optional rules are nclass = "sturges" or nclass = "fd", the latter standing for Freedman-Diaconis.
ifbw
the default is to plot a horizontal Tukey boxplot; if a box-and-whisker plot is required, set ifbw = TRUE.
wend
if ifbw = TRUE, the locations of the whisker ends have to be defined. By default these are at the 5th and 95th percentiles of the data; setting wend = 0.02 places the whisker ends at the 2nd and 98th percentiles.
colr
by default the histogram and box are infilled in grey, colr = 8. If no infill is required, set colr = 0. See display.lty for the range of available colours.
ifnright
controls where the sample size is plotted in the histogram display; by default this is the upper right corner of the plot. If the data distribution is such that the upper left corner would be preferable, set ifnright = FALSE.
...
further arguments to be passed to methods. For example, by default individual data points in the ECDF and CPP plots are marked by a plus sign, pch = 3; if a cross or an open circle is desired, set pch = 4 or pch = 1, respectively.
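The three histogram rules offered through nclass correspond to helper functions in base R's grDevices package: nclass.scott, nclass.Sturges and nclass.FD. The sketch below compares the bin counts each rule suggests for synthetic log-normal data, not the kola.o data. It assumes that binning log10 values is a fair analogue of log = TRUE and that gx.hist behaves much like base R's hist(); neither assumption is confirmed here.

```r
# Compare the bin counts suggested by the three rules offered through
# 'nclass', using base R's grDevices helpers on synthetic data.
# Assumption: rgr's gx.hist may compute or round the breaks differently.
set.seed(42)
x  <- rlnorm(500, meanlog = 2, sdlog = 1)   # right-skewed, positive values
lx <- log10(x)                               # assumed analogue of log = TRUE

bins <- c(
  scott   = nclass.scott(lx),     # Scott (1979): bin width 3.5 * sd * n^(-1/3)
  sturges = nclass.Sturges(lx),   # Sturges: ceiling(log2(n) + 1) bins
  fd      = nclass.FD(lx)         # Freedman-Diaconis: bin width 2 * IQR * n^(-1/3)
)
print(bins)

# hist() accepts the same rule names and hands the suggested count to
# pretty(), so the number of bars actually drawn may differ slightly.
h <- hist(lx, breaks = "Scott", plot = FALSE)
length(h$counts)
```

With 500 values, the Sturges rule always suggests 10 bins regardless of spread, whereas the Scott and Freedman-Diaconis rules adapt to the standard deviation and interquartile range of the data.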

Details

A histogram is displayed in the upper left panel, with an ECDF below it (lower left). To the right of the histogram, a horizontal Tukey boxplot (default) or box-and-whisker plot (option) is displayed (upper right). A cumulative normal percentage probability (CPP) plot is displayed in the lower right panel. The box-and-whisker plot has two special cases. When wend = 0, the whiskers extend to the observed minimum and maximum, which are therefore not marked separately with plus symbols. When wend = 0.25, neither the whiskers nor the data minimum and maximum are plotted; only the median and the box spanning the middle 50 percent of the data are displayed.
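The wend special cases follow directly if the whisker ends are taken as the wend and 1 - wend quantiles of the data. The sketch below assumes that definition and uses stats::quantile() with its default type 7 interpolation; rgr may compute percentiles differently, and the helper name whisker_ends is hypothetical.

```r
# Hypothetical helper: whisker ends of a box-and-whisker plot taken as the
# wend and (1 - wend) quantiles of the data (an assumption, not rgr's code).
whisker_ends <- function(x, wend = 0.05) {
  # wend above 0.25 would place the whiskers inside the box
  stopifnot(wend >= 0, wend <= 0.25)
  x <- x[!is.na(x)]                        # drop missing values
  stats::quantile(x, probs = c(wend, 1 - wend), names = FALSE, type = 7)
}

x <- 1:101                                 # toy data with easy percentiles
whisker_ends(x, wend = 0.05)               # 5th/95th percentiles: 6 and 96
whisker_ends(x, wend = 0)                  # observed min and max: 1 and 101
whisker_ends(x, wend = 0.25)               # quartiles, i.e. the box ends: 26 and 76
```

At wend = 0 the whisker ends coincide with the extremes, and at wend = 0.25 they coincide with the box, which is why those two settings behave as described above.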

References

Venables, W.N. and Ripley, B.D., 2001. Modern Applied Statistics with S-Plus, 3rd Edition. Springer. (See p. 119 for a description of histogram bin selection computations.)

Garrett, R.G., 1988. IDEAS - An Interactive Computer Graphics Tool to Assist the Exploration Geochemist. In Current Research Part F, Geological Survey of Canada Paper 88-1F, pp. 1-13. (See for a description of box-and-whisker plots.)

See Also

gx.hist, bxplot, gx.ecdf, cnpplt, remove.na, display.lty, display.marks, ltdl.fix.df, inset

Examples

## Make test data available
data(kola.o)
attach(kola.o)

## Generates an initial display to have a first look at the data and 
## decide how best to proceed
shape(Cu)

## Provides a more appropriate initial display and indicates the 
## quartiles
shape(Cu, xlab = "Cu (mg/kg) in <2 mm O-horizon soil", log = TRUE,
	ifqs = TRUE)

## Causes the Freedman-Diaconis rule to be used to select the number of
## histogram bins and changes the ECDF and CPP plotting symbols to a
## cross/x
shape(Cu, xlab = "Cu (mg/kg) in <2 mm O-horizon soil", log = TRUE, 
	nclass = "fd", pch = 4)

## Replaces the Tukey boxplot with a box-and-whisker plot where the 
## whiskers extend to the 10th and 90th percentiles and the minimum
## and maximum observed values are marked with a plus sign.
shape(Cu, xlab = "Cu (mg/kg) in <2 mm O-horizon soil", log = TRUE, 
	ifbw = TRUE, wend = 0.1)

## Detach test data
detach(kola.o)
