Hmisc (version 5.0-1)

labcurve: Label Curves, Make Keys, and Interactively Draw Points and Curves

Description

labcurve optionally draws a set of curves then labels the curves. A variety of methods for drawing labels are implemented, ranging from positioning using the mouse to automatic labeling to automatic placement of key symbols with manual placement of key legends to automatic placement of legends. For automatic positioning of labels or keys, a curve is labeled at a point that is maximally separated from all of the other curves. Gaps occurring when curves do not start or end at the same x-coordinates are given preference for positioning labels. When labels are offset from the curves (the default behaviour), the side on which a label is drawn depends on the nearest neighbor: if the closest curve to curve i is above curve i, curve i is labeled below its line, and if the closest curve is below curve i, curve i is labeled above its line. These directions are reversed if the resulting labels would appear outside the plot region.

Both ordinary lines and step functions are handled, and there is an option to draw the labels at the same angle as the curve within a local window.

Unless the mouse is used to position labels or plotting symbols are placed along the curves to distinguish them, curves are examined at 100 (by default) equally spaced points over the range of x-coordinates in the current plot area. Linear interpolation is used to get y-coordinates to line up (step function or constant interpolation is used for step functions). There is an option to instead examine all curves at the set of unique x-coordinates found by unioning the x-coordinates of all the curves. This option is especially useful when plotting step functions. By setting adj="auto" you can have labcurve try to optimally left- or right-justify labels depending on the slope of the curves at the points at which labels would be centered (plus a vertical offset). This is especially useful when labels must be placed on steep curve sections.
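The grid-based search described above can be sketched in base R. This is only an illustration under stated assumptions, not labcurve's actual code: max_separation is a hypothetical helper, it measures separation as vertical distance only, and it ignores the preference that labcurve gives to gaps where other curves are absent.

```r
# Hypothetical helper, not part of Hmisc: evaluate all curves on a common
# grid of npts equally spaced x-values and find, for curve i, the grid point
# at which it is farthest (vertically) from its nearest neighbor.
max_separation <- function(curves, i, npts = 100, xlim = NULL) {
  if (is.null(xlim)) xlim <- range(unlist(lapply(curves, function(w) w$x)))
  xs <- seq(xlim[1], xlim[2], length.out = npts)
  # Linear interpolation; approx() returns NA outside a curve's own x-range
  ys <- sapply(curves, function(w) approx(w$x, w$y, xout = xs)$y)
  # Signed vertical distance from curve i to every other curve
  d <- ys[, -i, drop = FALSE] - ys[, i]
  # Distance to the nearest other curve at each grid point
  gap <- apply(abs(d), 1, function(r)
    if (all(is.na(r))) NA else min(r, na.rm = TRUE))
  gap[is.na(ys[, i])] <- NA
  k <- which.max(gap)
  nearest <- d[k, which.min(abs(d[k, ]))]
  # Nearest curve above => label below the line, and vice versa
  list(x = xs[k], y = ys[k, i], gap = gap[k],
       side = if (nearest > 0) "below" else "above")
}

# Two made-up curves: "flat" stays at 0 while "rising" climbs above it
curves <- list(flat   = list(x = c(0, 1), y = c(0, 0)),
               rising = list(x = c(0, 1), y = c(0, 2)))
sep <- max_separation(curves, i = 1)
```

Here sep$x is the x-coordinate at which the first curve would be labeled, and sep$side records whether the label would go above or below the line.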

You can use the on top method to write (short) curve names directly on the curves (centered on the y-coordinate). This is especially useful when there are many curves whose full labels would run into each other. You can plot letters or numbers on the curves, for example (using the keys option), and have labcurve use the key function to provide long labels for these short ones (see the end of the example). There is another option for connecting labels to curves using arrows. When keys is a vector of integers, it is taken to represent plotting symbols (pchs), and these symbols are plotted at equally-spaced x-coordinates on each curve (by default, using 5 points per curve). The points are offset in the x-direction between curves so as to minimize the chance of collisions.

To add a legend defining line types, colors, or line widths with no symbols, specify keys="lines", e.g., labcurve(curves, keys="lines", lty=1:2).

putKey provides a different way to use key() by allowing the user to specify vectors for labels, line types, plotting characters, etc. Elements that do not apply (e.g., pch for lines (type="l")) may be NA. When a series of points is represented by both a symbol and a line, the corresponding elements of both pch and lty, col., or lwd will be non-missing.

putKeyEmpty, given vectors of all the x-y coordinates that have been plotted, uses largest.empty to find the largest empty rectangle large enough to hold the key, and draws the key using putKey.

drawPlot is a simple mouse-driven function for drawing series of lines, step functions, polynomials, Bezier curves, and points, and automatically labeling the point groups using labcurve or putKeyEmpty. When drawPlot is invoked it creates temporary functions Points, Curve, and Abline. The user calls these functions inside the call to drawPlot to define groups of points in the order they are defined with the mouse. Abline is used to call abline and does not actually create a group of points. For some curve types, the curve generated to represent the corresponding series of points is drawn after all points are entered for that series, and this curve may be different from the simple curve obtained by connecting points at the mouse clicks. For example, to draw a general smooth Bezier curve the user need only click on a few points, but must overshoot the final curve coordinates to define the curve. The originally entered points are not erased once the curve is drawn. The same goes for step functions and polynomials. If you plot() the object returned by drawPlot, however, only final curves will be shown. The last examples show how to use drawPlot.

The largest.empty function finds the largest rectangle that is large enough to hold a rectangle of a given height and width, such that the rectangle does not contain any of a given set of points. This is used by labcurve and putKeyEmpty to position keys at the most empty part of an existing plot. The default method was created by Hans Borchers.

Usage

labcurve(curves, labels=names(curves),
         method=NULL, keys=NULL, keyloc=c("auto","none"),
         type="l", step.type=c("left", "right"), 
         xmethod=if(any(type=="s")) "unique" else "grid", 
         offset=NULL, xlim=NULL,
         tilt=FALSE, window=NULL, npts=100, cex=NULL, 
         adj="auto", angle.adj.auto=30,
         lty=pr$lty, lwd=pr$lwd, col.=pr$col, transparent=TRUE,
         arrow.factor=1, point.inc=NULL, opts=NULL, key.opts=NULL,
         empty.method=c('area','maxdim'), numbins=25, 
         pl=!missing(add), add=FALSE, 
         ylim=NULL, xlab="", ylab="",
         whichLabel=1:length(curves),
         grid=FALSE, xrestrict=NULL, ...)

putKey(z, labels, type, pch, lty, lwd,
       cex=par('cex'), col=rep(par('col'),nc),
       transparent=TRUE, plot=TRUE, key.opts=NULL, grid=FALSE)

putKeyEmpty(x, y, labels, type=NULL,
            pch=NULL, lty=NULL, lwd=NULL,
            cex=par('cex'), col=rep(par('col'),nc),
            transparent=TRUE, plot=TRUE, key.opts=NULL,
            empty.method=c('area','maxdim'),
            numbins=25,
            xlim=pr$usr[1:2], ylim=pr$usr[3:4], grid=FALSE)

drawPlot(..., xlim=c(0,1), ylim=c(0,1), xlab='', ylab='',
         ticks=c('none','x','y','xy'),
         key=FALSE, opts=NULL)

# Points(label=' ', type=c('p','r'),
#        n, pch=pch.to.use[1], cex=par('cex'), col=par('col'),
#        rug = c('none','x','y','xy'), ymean)

# Curve(label=' ',
#       type=c('bezier','polygon','linear','pol','loess','step','gauss'),
#       n=NULL, lty=1, lwd=par('lwd'), col=par('col'), degree=2,
#       evaluation=100, ask=FALSE)

# Abline(...)

# S3 method for drawPlot
plot(x, xlab, ylab, ticks,
     key=x$key, keyloc=x$keyloc, ...)

largest.empty(x, y, width=0, height=0,
              numbins=25, method=c('exhaustive','rexhaustive','area','maxdim'),
              xlim=pr$usr[1:2], ylim=pr$usr[3:4],
              pl=FALSE, grid=FALSE)

Value

labcurve returns an invisible list with components x, y, offset, adj, cex, col, and if tilt=TRUE, angle. offset is the amount to add to y to draw a label. offset is negative if the label is drawn below the line. adj is a vector containing the values 0, .5, 1.

largest.empty returns a list with elements x and y specifying the coordinates of the center of the rectangle which was found, and element rect containing the 4 x and y coordinates of the corners of the found empty rectangle. The area of the rectangle is also returned.

Arguments

curves

a list of lists, each of which have at least two components: a vector of x values and a vector of corresponding y values. curves is mandatory except when method="mouse" or "locator", in which case labels is mandatory. Each list in curves may optionally have any of the parameters type, lty, lwd, or col for that curve, as defined below (see one of the last examples).

z

a two-element list specifying the coordinate of the center of the key, e.g. locator(1) to use the mouse for positioning

labels

For labcurve, a vector of character strings used to label curves (which may contain newline characters to stack labels vertically). The default labels are taken from the names of the curves list. Setting labels=FALSE will suppress drawing any labels (for labcurve only). For putKey and putKeyEmpty, a vector of character strings specifying group labels.

x

y

for putKeyEmpty and largest.empty, x and y are same-length vectors specifying points that have been plotted. x can also be an object created by drawPlot.

...

For drawPlot, a series of invocations of Points and Curve (see example). Any number of point groups can be defined in this way. For Abline these may be any arguments to abline. For labcurve, other parameters to pass to text.

width

height

for largest.empty, specifies the minimum allowable width in x units and the minimum allowable height in y units

method

"offset" (the default) offsets labels at largest gaps between curves, and draws labels beside curves. "on top" draws labels on top of the curves (especially good when using keys). "arrow" draws arrows connecting labels to the curves. "mouse" or "locator" positions labels according to mouse clicks. If keys is specified and is an integer vector or is "lines", method defaults to "on top". If keys is character, method defaults to "offset". Set method="none" to suppress all curve labeling and key drawing, which is useful when pl=TRUE and you only need labcurve to draw the curves and the rest of the basic graph.

For largest.empty, specifies the method used to find a rectangle that does not collide with any of the (x, y) points. The default method, 'exhaustive', uses a Fortran translation of an R function and algorithm developed by Hans Borchers. The same result, more slowly, may be obtained by using pure R code by specifying method='rexhaustive'. The original algorithms using binning (and the only methods supported for S-Plus) are still available. For all methods, candidate rectangles can be screened so that they have at least a given width (argument width, in x-units) or at least a given height (argument height, in y-units). Use method="area" to use the binning method to find the rectangle having the largest area, or method="maxdim" to use the binning method to return the last rectangle searched that had both the largest width and largest height over all previous rectangles.

keys

This causes keys (symbols or short text) to be drawn on or beside curves, and if keyloc is not equal to "none", a legend to be automatically drawn. The legend links keys with full curve labels and optionally with colors and line types. Set keys to a vector of character strings, or a vector of integers specifying plotting character (pch values - see points). For the latter case, the default behavior is to plot the symbols periodically, at equally spaced x-coordinates.

keyloc

When keys is specified, keyloc specifies how the legend is to be positioned for drawing using the key function in trellis. The default is "auto", for which the largest.empty function is used to find the most empty part of the plot. If no empty rectangle large enough to hold the key is found, no key will be drawn. Specify keyloc="none" to suppress drawing a legend, or set keyloc to a 2-element list containing the x and y coordinates for the center of the legend. For example, use keyloc=locator(1) to click the mouse at the center. For plot.drawPlot, keyloc specifies the coordinates of the center of the key to be drawn when key=TRUE.

type

for labcurve, a scalar or vector of character strings specifying the method by which the points in the curves were connected. "l" means ordinary connections between points and "s" means step functions. For putKey and putKeyEmpty, a vector of plotting types: "l" for regular line, "p" for point, "b" for both point and line, and "n" for none. For Points, either "p" (the default) for regular points, or "r" for rugplot (one-dimensional scatter diagram to be drawn using the scat1d function). For Curve, type is "bezier" (the default) for drawing smooth Bezier curves (which can represent a non-1-to-1 function such as a circle), "polygon" for ordinary line segments, "linear" for a straight line defined by two endpoints, "pol" for a polynomial, whose degree is given by the degree argument, to be fitted to the mouse-clicked points, "step" for a left-step-function, "gauss" to plot a Gaussian density fitted to 3 clicked points, "loess" to use the lowess function to smooth the clicked points, or a function to draw a user-specified function, evaluated at evaluation points spanning the whole x-axis. For the density the user must click in the left tail, at the highest value (at the mean), and in the right tail, with the two tail values being approximately equidistant from the mean. The density is scaled to fit the highest value regardless of its area.

step.type

type of step functions used (default is "left")

xmethod

method for generating the unique set of x-coordinates to examine (see above). Default is "grid" for type="l" or "unique" for type="s".
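As a rough base-R illustration of the "unique" approach (not labcurve's internal code), the sketch below pools the x-coordinates of two made-up step functions and evaluates each one on that pooled set with constant interpolation. It assumes that approx(method="constant", f=0), which carries the left-hand y value forward, is an adequate stand-in for a "left" step function.

```r
# Two made-up step functions with different x-coordinates
curves <- list(list(x = c(0, 2, 4), y = c(1, 3, 2)),
               list(x = c(1, 3, 5), y = c(0, 1, 4)))

# xmethod = "unique" idea: examine the curves at the union of all x-values
xs <- sort(unique(unlist(lapply(curves, function(w) w$x))))

# Constant (step) interpolation; f = 0 keeps the left-hand y value until the
# next x, and points outside a curve's own x-range come back as NA
ys <- sapply(curves, function(w)
  approx(w$x, w$y, xout = xs, method = "constant", f = 0)$y)
```

Each column of ys now holds one curve evaluated at the same x-values, so the curves can be compared point by point without inventing intermediate values that a step function never takes.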

offset

distance in y-units between the center of the label and the line being labeled. Default is 0.75 times the height of an "m" that would be drawn in a label. For R grid/lattice you must specify offset using the grid unit function, e.g., offset=unit(2,"native") or offset=unit(.25,"cm") ("native" means data units)

xlim

limits for searching for label positions, and is also used to set up plots when pl=TRUE and add=FALSE. Default is total x-axis range for current plot (par("usr")[1:2]). For largest.empty, xlim limits the search for largest rectangles, but it has the same default as above. For pl=TRUE,add=FALSE you may want to extend xlim somewhat to allow large keys to fit, when using keyloc="auto". For drawPlot default is c(0,1). When using largest.empty with ggplot2, xlim and ylim are mandatory.

tilt

set to TRUE to tilt labels to follow the curves, for method="offset" when keys is not given.

window

width of a window, in x-units, to use in determining the local slope for tilting labels. Default is 0.5 times number of characters in the label times the x-width of an "m" in the current character size and font.

npts

number of points to use if xmethod="grid"

cex

character size to pass to text and key. Default is current par("cex"). For putKey, putKeyEmpty, and Points is the size of the plotting symbol.

adj

Default is "auto" which has labcurve figure justification automatically when method="offset". This will cause centering to be used when the local angle of the curve is less than angle.adj.auto in absolute value, left justification if the angle is larger and either the label is under a curve of positive slope or over a curve of negative slope, and right justification otherwise. For step functions, left justification is used when the label is above the curve and right justification otherwise. Set adj=.5 to center labels at computed coordinates. Set to 0 for left-justification, 1 for right. Set adj to a vector to vary adjustments over the curves.
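The justification rule for ordinary (non-step) curves can be written out as a small function. This is a sketch of the rule as stated above, not code taken from Hmisc: auto_adj and its label_below argument are hypothetical names, and angle is assumed to be the local curve angle in degrees.

```r
# Hypothetical helper, not part of Hmisc: choose the text adjustment for one
# label from the local curve angle (degrees) and the side the label is on.
auto_adj <- function(angle, label_below, angle.adj.auto = 30) {
  # Shallow section of the curve: center the label
  if (abs(angle) < angle.adj.auto) return(0.5)
  # Steep section: left-justify when the label is under a rising curve
  # or over a falling curve, otherwise right-justify
  left <- (label_below && angle > 0) || (!label_below && angle < 0)
  if (left) 0 else 1
}

auto_adj(10, label_below = TRUE)    # shallow slope
auto_adj(45, label_below = TRUE)    # steep rise, label underneath
auto_adj(45, label_below = FALSE)   # steep rise, label on top
```

The returned values use the same coding as adj: 0 for left, .5 for centered, and 1 for right justification.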

angle.adj.auto

see adj. Does not apply to step functions.

lty

vector of line types which were used to draw the curves. This is only used when keys are drawn. If all of the line types, line widths, and line colors are the same, lines are not drawn in the key.

lwd

vector of line widths which were used to draw the curves. This is only used when keys are drawn. See lty also.

col.

col

vector of integer color numbers for use in curve labels, symbols, lines, and legends. Default is par("col") for all curves. See lty also.

transparent

Default is TRUE to make key draw transparent legends, i.e., to suppress drawing a solid rectangle background for the legend. Set to FALSE otherwise.

arrow.factor

factor by which to multiply default arrow lengths

point.inc

When keys is a vector of integers, point.inc specifies the x-increment between the point symbols that are overlaid periodically on the curves. By default, point.inc is equal to the range for the x-axis divided by 5.

opts

an optional list which can be used to specify any of the options to labcurve, with the usual element name abbreviations allowed. This is useful when labcurve is being called from another function. Example: opts=list(method="arrow", cex=.8, np=200). For drawPlot a list of labcurve options to pass as labcurve(..., opts=).

key.opts

a list of extra arguments you wish to pass to key(), e.g., key.opts=list(background=1, between=3). The argument names must be spelled out in full.

empty.method

numbins

These two arguments are passed to the largest.empty function's method and numbins arguments (see below). For largest.empty specifies the number of bins in which to discretize both the x and y directions for searching for rectangles. Default is 25.

pl

set to TRUE (or specify add) to cause the curves in curves to be drawn, under the control of type,lty,lwd,col parameters defined either in the curves lists or in the separate arguments given to labcurve or through opts. For largest.empty, set pl=TRUE to show the rectangle the function found by drawing it with a solid color. May not be used under ggplot2.

add

By default, when curves are actually drawn by labcurve a new plot is started. To add to an existing plot, set add=TRUE.

ylim

When a plot has already been started, ylim defaults to par("usr")[3:4]. When pl=TRUE, ylim and xlim are determined from the ranges of the data. Specify ylim yourself to take control of the plot construction. In some cases it is advisable to make ylim larger than usual to allow for automatically-positioned keys. For largest.empty, ylim specifies the limits on the y-axis to limit the search for rectangle. Here ylim defaults to the same as above, i.e., the range of the y-axis of an open plot from par. For drawPlot the default is c(0,1).

xlab

ylab

x-axis and y-axis labels when pl=TRUE and add=FALSE or for drawPlot. Defaults to "" unless the first curve has names for its first two elements, in which case the names of these elements are taken as xlab and ylab.

whichLabel

integer vector corresponding to curves specifying which curves are to be labelled or have a legend

grid

set to TRUE if the R grid package was used to draw the current plot. This prevents labcurve from using par("usr") etc. If using R grid you can pass coordinates and lengths having arbitrary units, as documented in the unit function. This is especially useful for offset.

xrestrict

When having labcurve label curves where they are most separated, you can restrict the search for this separation point to a range of the x-axis, specified as a 2-vector xrestrict. This is useful when one part of the curve is very steep. Even though steep regions may have maximum separation, the labels will collide when curves are steep.

pch

vector of plotting characters for putKey and putKeyEmpty. Can be any value including NA when only a line is used to identify the group. Is a single plotting character for Points, with the default being the next unused value from among 1, 2, 3, 4, 16, 17, 5, 6, 15, 18, 19.

plot

set to FALSE to keep putKey or putKeyEmpty from actually drawing the key. Instead, the size of the key will be returned by putKey, or the coordinates of the key by putKeyEmpty.

ticks

tells drawPlot for which axes to draw tick marks and tick labels. Default is "none".

key

for drawPlot and plot.drawPlot. Default is FALSE so that labcurve is used to label points or curves. Set to TRUE to use putKeyEmpty.

Author

Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com

Details

The internal functions Points, Curve, Abline have unique arguments as follows.

label:

for Points and Curve is a single character string to label that group of points

n:

number of points to accept from the mouse. Default is to input points until a right mouse click.

rug:

for Points. Default is "none" to not show the marginal x or y distributions as rug plots, for the points entered. Other possibilities are used to execute scat1d to show the marginal distribution of x, y, or both as rug plots.

ymean:

for Points, subtracts a constant from each y-coordinate entered to make the overall mean ymean

degree:

degree of polynomial to fit to points by Curve

evaluation:

number of points at which to evaluate Bezier curves, polynomials, and other functions in Curve

ask:

set ask=TRUE to give the user the opportunity to try again at specifying points for Bezier curves, step functions, and polynomials

The labcurve function used some code from the function plot.multicurve written by Rod Tjoelker of The Boeing Company (tjoelker@espresso.rt.cs.boeing.com).

If there is only one curve, a label is placed at the middle x-value, and no fancy features such as angle or positive/negative offsets are used.

key is called once (with the argument plot=FALSE) to find the key dimensions. Then an empty rectangle with at least these dimensions is searched for using largest.empty. Then key is called again to draw the key there, using the argument corner=c(.5,.5) so that the center of the rectangle can be specified to key.

If you want to plot the data, an easier way to use labcurve is through xYplot as shown in some of its examples.

See Also

approx, text, legend, scat1d, xYplot, abline

Examples

library(Hmisc)  # provides labcurve, largest.empty, bpower, and drawPlot
n <- 2:8
m <-  length(n)
type <- c('l','l','l','l','s','l','l')
# s=step function l=ordinary line (polygon)
curves <- vector('list', m)


plot(0,1,xlim=c(0,1),ylim=c(-2.5,4),type='n')


set.seed(39)


for(i in 1:m) {
  x <- sort(runif(n[i]))
  y <- rnorm(n[i])
  lines(x, y, lty=i, type=type[i], col=i)
  curves[[i]] <- list(x=x,y=y)
}


labels <- paste('Label for',letters[1:m])
labcurve(curves, labels, tilt=TRUE, type=type, col=1:m)


# Put only single letters on curves at points of 
# maximum space, and use key() to define the letters,
# with automatic positioning of the key in the most empty
# part of the plot
# Have labcurve do the plotting, leaving extra space for key


names(curves) <- labels
labcurve(curves, keys=letters[1:m], type=type, col=1:m,
         pl=TRUE, ylim=c(-2.5,4))


# Put plotting symbols at equally-spaced points,
# with a key for the symbols, ignoring line types


labcurve(curves, keys=1:m, lty=1, type=type, col=1:m,
         pl=TRUE, ylim=c(-2.5,4))




# Plot and label two curves, with line parameters specified with data
set.seed(191)
ages.f <- sort(rnorm(50,20,7))
ages.m <- sort(rnorm(40,19,7))
height.f <- pmin(ages.f,21)*.2+60
height.m <- pmin(ages.m,21)*.16+63


labcurve(list(Female=list(ages.f,height.f,col=2),
              Male  =list(ages.m,height.m,col=3,lty='dashed')),
         xlab='Age', ylab='Height', pl=TRUE)
# add ,keys=c('f','m') to label curves with single letters
# For S-Plus use lty=2


# Plot power for testing two proportions vs. n for various odds ratios, 
# using 0.1 as the probability of the event in the control group.  
# A separate curve is plotted for each odds ratio, and the curves are
# labeled at points of maximum separation


n  <- seq(10, 1000, by=10)
OR <- seq(.2,.9,by=.1)
pow <- lapply(OR, function(or,n)list(x=n,y=bpower(p1=.1,odds.ratio=or,n=n)),
              n=n)
names(pow) <- format(OR)
labcurve(pow, pl=TRUE, xlab='n', ylab='Power')


# Plot some random data and find the largest empty rectangle
# that is at least .1 wide and .1 tall


x <- runif(50)
y <- runif(50)
plot(x, y)
z <- largest.empty(x, y, .1, .1)
z
points(z,pch=3)  # mark center of rectangle, or
polygon(z$rect, col='blue')  # to draw the rectangle, or
#key(z$x, z$y, ... stuff for legend)




# Use the mouse to draw a series of points using one symbol, and
# two smooth curves or straight lines (if two points are clicked), 
# none of these being labeled


# d <- drawPlot(Points(), Curve(), Curve())
# plot(d)


if (FALSE) {
# Use the mouse to draw a Gaussian density, two series of points
# using 2 symbols, one Bezier curve, a step function, and raw data
# along the x-axis as a 1-d scatter plot (rug plot).  Draw a key.
# The density function is fit to 3 mouse clicks
# Abline draws a dotted horizontal reference line
d <- drawPlot(Curve('Normal',type='gauss'),
              Points('female'), Points('male'), 
              Curve('smooth',ask=TRUE,lty=2), Curve('step',type='s',lty=3), 
              Points(type='r'), Abline(h=.5, lty=2),
              xlab='X', ylab='y', xlim=c(0,100), key=TRUE)
plot(d, ylab='Y')
plot(d, key=FALSE)  # label groups using labcurve
}
