xYplot: xyplot and dotplot with Matrix Variables to Plot Error Bars and Bands

Description

A utility function Cbind returns the first argument as a vector and combines all other arguments into a matrix stored as an attribute called "other". The arguments can be named (e.g., Cbind(pressure=y,ylow,yhigh)) or a label attribute may be pre-attached to the first argument. In either case, the name or label of the first argument is stored as an attribute "label" of the object returned by Cbind. Storing other vectors as a matrix attribute facilitates plotting error bars, etc., as trellis really wants the x- and y-variables to be vectors, not matrices. If a single argument is given to Cbind and that argument is a matrix with column dimnames, the first column is taken as the main vector and remaining columns are taken as "other". A subscript method for Cbind objects subscripts the other matrix along with the main y vector.
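The storage scheme described above can be pictured with a few lines of base R. This is only an illustrative sketch, not Hmisc's implementation: the helper name cbind_sketch and its label argument are hypothetical, and the real Cbind also handles argument naming, single-matrix input, and subscripting.

```r
# Hypothetical stand-in for Cbind: keep the first vector as the main
# variable and attach the remaining vectors as a matrix attribute "other".
cbind_sketch <- function(y, ..., label = NULL) {
  other <- cbind(...)                 # remaining arguments become matrix columns
  attr(y, "other") <- other
  if (!is.null(label)) attr(y, "label") <- label
  y
}

y     <- c(1.0, 2.0, 3.0)
lower <- y - 0.5
upper <- y + 0.5
z <- cbind_sketch(y, lower, upper, label = "pressure")

length(z)              # the main variable keeps its original length
dim(attr(z, "other"))  # the limits travel along as a matrix attribute
```

Because the extra columns ride along as an attribute, the object can still be handed to code that expects a plain y vector.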

The xYplot function is a substitute for xyplot that allows for simulated multi-column y. It uses by default the panel.xYplot and prepanel.xYplot functions to do the actual work. The method argument passed to panel.xYplot from xYplot allows you to make error bars, the upper-only or lower-only portions of error bars, alternating lower-only and upper-only bars, bands, or filled bands. panel.xYplot decides how to alternate upper and lower bars according to whether the median y value of the current main data line is above the median y for all groups of lines or not. If the median is above the overall median, only the upper bar is drawn. For bands (but not 'filled bands'), any number of other columns of y will be drawn as lines having the same thickness, color, and type as the main data line. If plotting bars, bands, or filled bands and only one additional column is specified for the response variable, that column is taken as the half width of a precision interval for y, and the lower and upper values are computed automatically as y plus or minus the value of the additional column variable.
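The half-width rule at the end of the paragraph above amounts to the following arithmetic. This is a minimal base R sketch with made-up numbers; the variable name half_width is an assumption used only for illustration.

```r
# Main estimate and a single extra column interpreted as a half-width
y          <- c(10, 12, 15)
half_width <- c(1, 2, 1.5)

# Lower and upper limits as the text describes: y minus/plus the half-width
lower <- y - half_width
upper <- y + half_width
cbind(y, lower, upper)
```

With Hmisc loaded, passing Cbind(y, half_width) as the response should therefore give the same bars as passing Cbind(y, lower, upper), according to the description above.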

When a groups variable is present, panel.xYplot will create a function in frame 0 (.GlobalEnv in R) called Key that when invoked will draw a key describing the groups labels, point symbols, and colors. By default, the key is outside the graph. For S-Plus, if Key(locator(1)) is specified, the key will appear so that its upper left corner is at the coordinates of the mouse click. For R/Lattice the first two arguments of Key (x and y) are fractions of the page, measured from the lower left corner, and the default placement is at x=0.05, y=0.95. For R, an optional argument to Key, other, may contain a list of arguments to pass to draw.key (see xyplot for a list of possible arguments, under the key option).

When method="quantile" is specified, xYplot automatically groups the x variable into intervals containing a target of nx observations each, and within each x group computes three quantiles of y and plots these as three lines. The mean x within each x group is taken as the x-coordinate. This makes a useful empirical display for large datasets in which scatter diagrams are too busy to show patterns of central tendency and variability. You can also specify a general function of a data vector that returns a matrix of statistics for the method argument. Arguments can be passed to that function via a list methodArgs. The statistic in the first column should be the measure of central tendency. Examples of useful method functions are those listed under the help file for summary.formula such as smean.cl.normal.
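The grouping-then-quantiles computation described above can be approximated in base R as follows. This is a rough sketch on simulated data, not the code xYplot runs: it forms equal-count groups from ranks, whereas Hmisc relies on cut2 for the grouping step, so group boundaries may differ.

```r
set.seed(1)
x <- runif(200, 0, 10)
y <- 2 * x + rnorm(200)

nx <- 40                                    # target observations per x group
n_groups <- max(1, floor(length(x) / nx))
# Equal-count groups built from ranks (assumption: stands in for cut2)
g <- ceiling(rank(x, ties.method = "first") / length(x) * n_groups)

probs  <- c(.5, .25, .75)                   # central quantile listed first
x_mean <- tapply(x, g, mean)                # x-coordinate for each group
q <- t(sapply(split(y, g), quantile, probs = probs))
cbind(x = x_mean, q)                        # one row per x group
```

Each row gives the mean x of a group followed by the median and the two outer quantiles of y, which correspond to the three lines drawn per group.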

xYplot can also produce bubble plots. This is done when size is specified to xYplot. When size is used, a function sKey is generated for drawing a key to the character sizes. size can also specify a vector where the first character of each observation is used as the plotting symbol, if rangeCex is set to a single cex value. An optional argument to sKey, other, may contain a list of arguments to pass to draw.key (see xyplot for a list of possible arguments, under the key option). See the bubble plot example.

Dotplot is a substitute for dotplot allowing for a matrix x-variable, automatic superpositioning when groups is present, and creation of a Key function. When the x-variable (created by Cbind to simulate a matrix) contains a total of 3 columns, the first column specifies where the dot is positioned, and the last 2 columns specify starting and ending points for intervals. The intervals are shown using line type, width, and color from the trellis plot.line list. By default, you will usually see a darker line segment for the low and high values, with the dotted reference line elsewhere. A good choice of the pch argument for such plots is 3 (plus sign) if you want to emphasize the interval more than the point estimate. When the x-variable contains a total of 5 columns, the 2nd and 5th columns are treated as the 2nd and 3rd are treated above, and the 3rd and 4th columns define an inner line segment that will have twice the thickness of the outer segments. In addition, tick marks separate the outer and inner segments. This type of display (an example of which appeared in The Elements of Graphing Data by Cleveland) is very suitable for displaying two confidence levels (e.g., 0.9 and 0.99) or the 0.05, 0.25, 0.75, 0.95 sample quantiles, for example. For this display, the central point displays well with a default circle symbol.

setTrellis sets nice defaults for Trellis graphics, assuming that the graphics device has already been opened if using postscript, etc. By default, it sets panel strips to blank and reference dot lines to thickness 1 instead of the Trellis default of 2.

numericScale is a utility function that facilitates using xYplot to plot variables that are not considered to be numeric but which can readily be converted to numeric using as.numeric(). numericScale by default will keep the name of the input variable as a label attribute for the new numeric variable.

Usage

Cbind(...)
xYplot(formula, data = sys.frame(sys.parent()), groups,
       subset, xlab=NULL, ylab=NULL, ylim=NULL,
       panel=panel.xYplot, prepanel=prepanel.xYplot, scales=NULL,
       minor.ticks=NULL, sub=NULL, ...)
panel.xYplot(x, y, subscripts, groups=NULL, 
             type=if(is.function(method) || method=='quantiles') 
               'b' else 'p',
             method=c("bars", "bands", "upper bars", "lower bars", 
                      "alt bars", "quantiles", "filled bands"), 
             methodArgs=NULL, label.curves=TRUE, abline,
             probs=c(.5,.25,.75), nx=NULL,
             cap=0.015, lty.bar=1, 
             lwd=plot.line$lwd, lty=plot.line$lty, pch=plot.symbol$pch, 
             cex=plot.symbol$cex, font=plot.symbol$font, col=NULL, 
             lwd.bands=NULL, lty.bands=NULL, col.bands=NULL, 
             minor.ticks=NULL, col.fill=NULL,
             size=NULL, rangeCex=c(.5,3), ...)
prepanel.xYplot(x, y, ...)
Dotplot(formula, data = sys.frame(sys.parent()), groups, subset, 
        xlab = NULL, ylab = NULL, ylim = NULL,
        panel=panel.Dotplot, prepanel=prepanel.Dotplot,
        scales=NULL, xscale=NULL, ...)
prepanel.Dotplot(x, y, ...)
panel.Dotplot(x, y, groups = NULL,
              pch  = dot.symbol$pch, 
              col  = dot.symbol$col, cex = dot.symbol$cex, 
              font = dot.symbol$font, abline, ...)
setTrellis(strip.blank=TRUE, lty.dot.line=2, lwd.dot.line=1)
numericScale(x, label=NULL, ...)

Value

Cbind returns a matrix with attributes. Other functions return standard trellis results.

Arguments

...

for Cbind ... is any number of additional numeric vectors. Unless you are using Dotplot (which allows for either 2 or 4 "other" variables) or xYplot with method="bands", vectors after the first three are ignored. If drawing bars and only one extra variable is given in ..., upper and lower values are computed as described above. If the second argument to Cbind is a matrix, that matrix is stored in the "other" attribute and arguments after the second are ignored. For bubble plots, name an argument cex.

For xYplot, ... can also contain other arguments to pass to labcurve.

formula

a trellis formula consistent with xyplot or dotplot

x

x-axis variable. For numericScale x is any vector such that as.numeric(x) returns a numeric vector suitable for x- or y-coordinates.

y

a vector, or an object created by Cbind for xYplot. y represents the main variable to plot, i.e., the variable used to draw the main lines. For Dotplot the first argument to Cbind will be the main x-axis variable.

data,subset,ylim,subscripts,groups,type,scales,panel,prepanel,xlab,ylab

see trellis.args. xlab and ylab get default values from "label" attributes.

xscale

allows one to use the default scales but specify only the x component of it for Dotplot

method

defaults to "bars" to draw error-bar type plots. See meaning of other values above. method can be a function. Specifying method=quantile, methodArgs=list(probs=c(.5,.25,.75)) is the same as specifying method="quantile" without specifying probs.

methodArgs

a list containing optional arguments to be passed to the function specified in method

label.curves

set to FALSE to suppress invocation of labcurve to label primary curves where they are most separated or to draw a legend in an empty spot on the panel. You can also set label.curves to a list of options to pass to labcurve. These options can also be passed as ... to xYplot. See the examples below.

abline

a list of arguments to pass to panel.abline for each panel, e.g. list(a=0, b=1, col=3) to draw the line of identity using color 3. To make multiple calls to panel.abline, pass a list of unnamed lists as abline, e.g., abline=list(list(h=0),list(v=1)).

probs

a vector of three quantiles with the quantile corresponding to the central line listed first. By default probs=c(.5, .25, .75). You can also specify probs through methodArgs=list(probs=...).

nx

number of target observations for each x group (see cut2 m argument). nx defaults to the minimum of 40 and the number of points in the current stratum divided by 4. Set nx=FALSE or nx=0 if x is already discrete and requires no grouping.

cap

the half-width of horizontal end pieces for error bars, as a fraction of the length of the x-axis

lty.bar

line type for bars

lwd, lty, pch, cex, font, col

see trellis.args. These are vectors when groups is present, and the order of their elements corresponds to the different groups, regardless of how many bands or bars are drawn. If you don't specify lty.bands, for example, all band lines within each group will have the same lty.

lty.bands, lwd.bands, col.bands

used to allow lty, lwd, col to vary across the different band lines for different groups. These parameters are vectors or lists whose elements correspond to the added band lines (i.e., they ignore the central line, whose line characteristics are defined by lty, lwd, col). For example, suppose that 4 lines are drawn in addition to the central line. Specifying lwd.bands=1:4 will cause line widths of 1:4 to be used for every group, regardless of the value of lwd. To vary characteristics over the groups use e.g. lwd.bands=list(rep(1,4), rep(2,4)) or list(c(1,2,1,2), c(3,4,3,4)).

minor.ticks

a list with elements at and labels specifying positions and labels for minor tick marks to be used on the x-axis of each panel, if any.

sub

an optional subtitle

col.fill

used to override default colors used for the bands in method='filled bands'. This is a vector when groups is present, and the order of the elements corresponds to the different groups, regardless of how many bands are drawn. The default colors for 'filled bands' are pastel colors matching the default colors superpose.line$col (plot.line$col)

size

a vector the same length as x giving a variable whose values are a linear function of the size of the symbol drawn. This is used for example for bubble plots.

rangeCex

a vector of two values specifying the range in character sizes to use for the size variable (lowest first, highest second). size values are linearly translated to this range, based on the observed range of size when x and y coordinates are not missing. Specify a single numeric cex value for rangeCex to use the first character of each observation's size as the plotting symbol.
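The linear translation of size onto rangeCex can be sketched in base R as below. The helper scale_to_cex is hypothetical and only illustrates the mapping; unlike the description above, it does not first restrict size to observations with non-missing x and y.

```r
# Hypothetical helper: map size linearly onto the interval given by rangeCex
scale_to_cex <- function(size, rangeCex = c(.5, 3)) {
  r <- range(size, na.rm = TRUE)
  rangeCex[1] + (size - r[1]) / (r[2] - r[1]) * diff(rangeCex)
}

z   <- 101:108
cex <- scale_to_cex(z)
range(cex)   # smallest size maps to rangeCex[1], largest to rangeCex[2]
```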

strip.blank

set to FALSE to not make the panel strip backgrounds blank

lty.dot.line

line type for dot plot reference lines (default = 2, as shown in the usage above; in R, lty 1 is solid, 2 is dashed, and 3 is dotted)

lwd.dot.line

line thickness for reference lines for dot plots (default = 1)

label

a scalar character string to be used as a variable label after numericScale converts the variable to numeric form

Side Effects

plots, and panel.xYplot may create temporary Key and sKey functions in the session frame.

Author

Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
Madeline Bauer
Department of Infectious Diseases
University of Southern California School of Medicine
mbauer@usc.edu

Details

Unlike xyplot, xYplot senses the presence of a groups variable and automatically invokes panel.superpose instead of panel.xyplot. The same is true for Dotplot vs. dotplot.

Examples


# Plot 6 smooth functions.  Superpose 3, panel 2.
# Label curves with p=1,2,3 where most separated 
d <- expand.grid(x=seq(0,2*pi,length=150), p=1:3, shift=c(0,pi)) 
xYplot(sin(x+shift)^p ~ x | shift, groups=p, data=d, type='l') 
# Use a key instead, use 3 line widths instead of 3 colors 
# Put key in most empty portion of each panel
xYplot(sin(x+shift)^p ~ x | shift, groups=p, data=d, 
       type='l', keys='lines', lwd=1:3, col=1) 
# Instead of implicitly using labcurve(), put a 
# single key outside of panels at upper left corner (Key's default position)
xYplot(sin(x+shift)^p ~ x | shift, groups=p, data=d, 
       type='l', label.curves=FALSE, lwd=1:3, col=1, lty=1:3) 
Key()

# Bubble plots
x <- y <- 1:8
x[2] <- NA
units(x) <- 'cm^2'
z <- 101:108
p <- factor(rep(c('a','b'),4))
g <- c(rep(1,7),2)
data.frame(p, x, y, z, g)
xYplot(y ~ x | p, groups=g, size=z)
Key(other=list(title='g', cex.title=1.2))  # draw key for colors
sKey(.2,.85,other=list(title='Z Values', cex.title=1.2))
# draw key for character sizes

# Show the median and quartiles of height given age, stratified 
# by sex and race.  Draws 2 sets (male, female) of 3 lines per panel.
# xYplot(height ~ age | race, groups=sex, method='quantiles')


# Examples of plotting raw data
dfr <- expand.grid(month=1:12, continent=c('Europe','USA'), 
                   sex=c('female','male'))
set.seed(1)
dfr <- upData(dfr,
              y=month/10 + 1*(sex=='female') + 2*(continent=='Europe') + 
                runif(48,-.15,.15),
              lower=y - runif(48,.05,.15),
              upper=y + runif(48,.05,.15))


xYplot(Cbind(y,lower,upper) ~ month,subset=sex=='male' & continent=='USA',
       data=dfr)
xYplot(Cbind(y,lower,upper) ~ month|continent, subset=sex=='male',data=dfr)
xYplot(Cbind(y,lower,upper) ~ month|continent, groups=sex, data=dfr); Key() 
# add ,label.curves=FALSE to suppress use of labcurve to label curves where
# farthest apart


xYplot(Cbind(y,lower,upper) ~ month,groups=sex,
                              subset=continent=='Europe', data=dfr) 
xYplot(Cbind(y,lower,upper) ~ month,groups=sex, type='b',
                              subset=continent=='Europe', keys='lines',
                              data=dfr)
# keys='lines' causes labcurve to draw a legend where the panel is most empty


xYplot(Cbind(y,lower,upper) ~ month,groups=sex, type='b', data=dfr,
                              subset=continent=='Europe',method='bands') 
xYplot(Cbind(y,lower,upper) ~ month,groups=sex, type='b', data=dfr,
                              subset=continent=='Europe',method='upper')


label(dfr$y) <- 'Quality of Life Score'   
# label is in Hmisc library = attr(y,'label') <- 'Quality...'; will be
# y-axis label 
# can also specify Cbind('Quality of Life Score'=y,lower,upper) 
xYplot(Cbind(y,lower,upper) ~ month, groups=sex,
       subset=continent=='Europe', method='alt bars',
        offset=grid::unit(.1,'inches'), type='b', data=dfr)   
# offset is passed to labcurve to set how far labels sit from the curve
# (here .1 inch); for R (using grid/lattice), offset is specified using the
# grid unit function, e.g., offset=grid::unit(.4,'native') or
# offset=grid::unit(.1,'inches') or grid::unit(.05,'npc')


# The following example uses the summarize function in Hmisc to 
# compute the median and outer quartiles.  The outer quartiles are 
# displayed using "error bars"
set.seed(111)
dfr <- expand.grid(month=1:12, year=c(1997,1998), reps=1:100)
month <- dfr$month; year <- dfr$year
y <- abs(month-6.5) + 2*runif(length(month)) + year-1997
s <- summarize(y, llist(month,year), smedian.hilow, conf.int=.5) 
xYplot(Cbind(y,Lower,Upper) ~ month, groups=year, data=s, 
       keys='lines', method='alt', type='b')
# Can also do:
s <- summarize(y, llist(month,year), quantile, probs=c(.5,.25,.75),
               stat.name=c('y','Q1','Q3')) 
xYplot(Cbind(y, Q1, Q3) ~ month, groups=year, data=s, 
       type='b', keys='lines') 
# Or:
xYplot(y ~ month, groups=year, keys='lines', nx=FALSE, method='quantile',
       type='b') 
# nx=FALSE means to treat month as a discrete variable


# To display means and bootstrapped nonparametric confidence intervals 
# use:
s <- summarize(y, llist(month,year), smean.cl.boot) 
s
xYplot(Cbind(y, Lower, Upper) ~ month | year, data=s, type='b')
# Can also use Y <- cbind(y, Lower, Upper); xYplot(Cbind(Y) ~ ...) 
# Or:
xYplot(y ~ month | year, nx=FALSE, method=smean.cl.boot, type='b')


# This example uses the summarize function in Hmisc to 
# compute the median and outer quartiles.  The outer quartiles are 
# displayed using "filled bands"


s <- summarize(y, llist(month,year), smedian.hilow, conf.int=.5) 


# filled bands: default fill = pastel colors matching solid colors
# in superpose.line (this works differently in R)
xYplot ( Cbind ( y, Lower, Upper ) ~ month, groups=year, 
     method="filled bands" , data=s, type="l")


# note colors based on levels of selected subgroups, not first two colors
xYplot ( Cbind ( y, Lower, Upper ) ~ month, groups=year, 
     method="filled bands" , data=s, type="l",
     subset=(year == 1998 | year == 2000), label.curves=FALSE )


# filled bands using black lines with selected solid colors for fill
xYplot ( Cbind ( y, Lower, Upper ) ~ month, groups=year, 
     method="filled bands" , data=s, label.curves=FALSE,
     type="l", col=1, col.fill = 2:3)
Key(.5,.8,col = 2:3) #use fill colors in key


# A good way to check for stable variance of residuals from ols 
# xYplot(resid(fit) ~ fitted(fit), method=smean.sdl) 
# smean.sdl is defined with summary.formula in Hmisc


# Plot y vs. a special variable x
# xYplot(y ~ numericScale(x, label='Label for X') | country) 
# For this example could omit label= and specify 
#    y ~ numericScale(x) | country, xlab='Label for X'


# Here is an example of using xYplot with several options
# to change various Trellis parameters,
# xYplot(y ~ x | z, groups=v, pch=c('1','2','3'),
#        layout=c(3,1),     # 3 panels side by side
#        ylab='Y Label', xlab='X Label',
#        main=list('Main Title', cex=1.5),
#        par.strip.text=list(cex=1.2),
#        strip=function(...) strip.default(..., style=1),
#        scales=list(alternating=FALSE))


#
# Dotplot examples
#


s <- summarize(y, llist(month,year), smedian.hilow, conf.int=.5) 


setTrellis()            # blank conditioning panel backgrounds 
Dotplot(month ~ Cbind(y, Lower, Upper) | year, data=s) 
# or Cbind(...), groups=year, data=s


# Display a 5-number (5-quantile) summary (2 intervals, dot=median) 
# Note that summarize produces a matrix for y, and Cbind(y) trusts the 
# first column to be the point estimate (here the median) 
s <- summarize(y, llist(month,year), quantile,
               probs=c(.5,.05,.25,.75,.95), type='matrix') 
Dotplot(month ~ Cbind(y) | year, data=s) 
# Use factor(year) to make actual years appear in conditioning title strips

# Plot proportions and their Wilson confidence limits
set.seed(3)
d <- expand.grid(continent=c('USA','Europe'), year=1999:2001,
                 reps=1:100)
# Generate binary events from a population probability of 0.2
# of the event, same for all years and continents
d$y <- ifelse(runif(6*100) <= .2, 1, 0)
s <- with(d,
          summarize(y, llist(continent,year),
                    function(y) {
                     n <- sum(!is.na(y))
                     s <- sum(y, na.rm=TRUE)
                     binconf(s, n)
                    }, type='matrix')
)

Dotplot(year ~ Cbind(y) | continent,  data=s, ylab='Year',
        xlab='Probability')


# Dotplot(z ~ x | g1*g2)                 
# 2-way conditioning 
# Dotplot(z ~ x | g1, groups=g2); Key()  
# Key defines symbols for g2


# If the data are organized so that the mean, lower, and upper 
# confidence limits are in separate records, the Hmisc reShape 
# function is useful for assembling these 3 values as 3 variables 
# in a single observation, e.g., assuming type has values such as 
# c('Mean','Lower','Upper'):
# a <- reShape(y, id=month, colvar=type) 
# This will make a matrix with 3 columns named Mean Lower Upper 
# and with 1/3 as many rows as the original data
