lessR (version 2.4.2)

BoxPlot: Boxplot

Description

Abbreviation: bx

Uses the standard R function boxplot to display a box plot in color. Also displays the relevant statistics, such as the hinges, median, and IQR.
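The statistics mentioned here can be approximated with base R alone. The following is a minimal sketch, not the lessR implementation; the data vector y is made up for illustration. It uses fivenum for the hinges and median, and boxplot.stats to flag outliers.

```r
# Illustrative data (hypothetical): eight ordinary values and one extreme value
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(y)
lower_hinge <- five[2]
med <- five[3]
upper_hinge <- five[4]

# Spread between the hinges; can differ slightly from IQR(y), which uses quantiles
hinge_spread <- upper_hinge - lower_hinge

# Values more than 1.5 hinge spreads beyond the box are flagged as outliers
outliers <- boxplot.stats(y)$out

print(c(lower_hinge = lower_hinge, median = med, upper_hinge = upper_hinge))
print(outliers)
```

Hinges and quartiles usually agree closely, but they are computed differently, so small discrepancies with IQR(y) are expected for some sample sizes.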

If the object provided is a data frame, then a box plot is produced for each numeric variable in the data frame and the results are written to a pdf file in the current working directory. The name of this file and its path are specified in the output.

Usage

BoxPlot(x=NULL, dframe=mydata, n.cat=getOption("n.cat"), text.out=TRUE, ...)

## S3 method for class 'data.frame': bx(x, n.cat, text.out, ...)

## S3 method for class 'default': bx(x, col.box=NULL, col.pts=NULL, col.bg=NULL, col.grid=NULL, colors=c("blue", "gray", "rose", "green", "gold", "red"), cex.axis=.85, col.axis="gray30", col.ticks="gray30", horiz=TRUE, dotplot=FALSE, xlab=NULL, main=NULL, digits.d=NULL, text.out=TRUE, pdf.file=NULL, pdf.width=5, pdf.height=5, ...)

bx(...)

Arguments

x
Variable for which to construct the box plot. Can be a data frame. If no variable is specified, then the data frame mydata is assumed.
dframe
Optional data frame that contains the variables of interest, default is mydata.
n.cat
When analyzing all the variables in a data frame, specifies the largest number of unique values of a numeric variable for which the variable is analyzed as categorical. Set to 0 to turn off.
col.box
Color of the box.
col.pts
Color of any points that designate outliers. By default this is the same color as the box.
col.bg
Color of the plot background.
col.grid
Color of the grid lines.
colors
Sets the color palette.
cex.axis
Scale magnification factor, which by default displays the axis values smaller than the axis labels. Provides the functionality of, and can be replaced by, the standard R cex.axis.
col.axis
Color of the font used to label the axis values.
col.ticks
Color of the ticks used to label the axis values.
horiz
Orientation of the boxplot. Set FALSE for vertical.
dotplot
If TRUE, then place a dot plot (i.e., stripchart) over the box plot.
xlab
Label for the value axis, which defaults to the variable's name.
main
Title of graph.
digits.d
Number of decimal digits displayed in the listing of the summary statistics.
text.out
If TRUE, then display text output in console.
pdf.file
Name of the pdf file to which graphics are redirected.
pdf.width
Width of the pdf file in inches.
pdf.height
Height of the pdf file in inches.
...
Other parameter values for graphics as defined and processed by boxplot and par, including ylim to set the limits of the value axis.
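The effect of the dotplot argument described above can be imitated in base R by overlaying a strip chart on a box plot. This is a rough sketch using only boxplot and stripchart, not the code lessR runs; the data and the color are made up for illustration.

```r
# Illustrative data (hypothetical): evenly spaced values plus one extreme value
y <- c(seq(40, 60, by = 0.5), 90)

# Horizontal box plot, matching the default orientation of BoxPlot
b <- boxplot(y, horizontal = TRUE, col = "lightblue", xlab = "y")

# Overlay the individual points; jitter reduces overplotting
stripchart(y, method = "jitter", pch = 16, add = TRUE)

# The value returned by boxplot lists the points drawn as outliers
print(b$out)
```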

Details

OVERVIEW Unlike the standard R boxplot function, boxplot, the default here is for a horizontal boxplot. Also, BoxPlot does not currently process in formula mode, so use the standard R boxplot function to process a formula in which a boxplot is displayed for a variable at each level of a second, usually categorical, variable.
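For the formula case, the standard boxplot function can be called directly. A minimal sketch with simulated data follows; the data frame d and its variables Y and C are hypothetical names chosen for illustration.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical data: a numeric response Y and a two-level grouping variable C
d <- data.frame(
  Y = rnorm(100, mean = 50, sd = 10),
  C = rep(c("A", "B"), 50)
)

# One box per level of C; horizontal = TRUE mimics the BoxPlot default orientation
b <- boxplot(Y ~ C, data = d, horizontal = TRUE, col = "lightblue")

# Group names and group sizes, as returned by boxplot
print(b$names)
print(b$n)
```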

Other graphic parameters are available to format the display, such as main for the title, and other parameters found in boxplot and par. To minimize white space around the boxplot, re-size the graphics window before or after creating the boxplot.

DATA If the variable is in a data frame, the input data frame has the assumed name of mydata. If this data frame is named something different, then specify the name with the dframe option. Regardless of its name, the data frame need not be attached to reference the variable directly by its name, that is, no need to invoke the mydata$name notation.

To obtain a box plot of each numerical variable in the mydata data frame, use BoxPlot(). Or, for a data frame with a different name, insert the name between the parentheses.
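How the numeric variables of a data frame might be selected can be sketched in base R. This is an assumption about the general logic, based on the description of n.cat, and not the lessR source; the threshold n_cat and the example data are hypothetical.

```r
# Hypothetical data frame with a continuous variable, a numeric code, and text
mydata <- data.frame(
  X = c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9),
  G = c(1, 2, 1, 2, 1, 2, 1, 2),   # numeric, but only two unique values
  C = c("A", "B", "A", "B", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

n_cat <- 4  # hypothetical threshold: numeric variables with this many or fewer unique values count as categorical

# Keep variables that are numeric and have more than n_cat unique values
is_num <- sapply(mydata, is.numeric)
n_unique <- sapply(mydata, function(v) length(unique(v)))
to_plot <- names(mydata)[is_num & n_unique > n_cat]

# Draw one horizontal box plot per selected variable
for (v in to_plot) {
  boxplot(mydata[[v]], horizontal = TRUE, xlab = v)
}

print(to_plot)
```

With n_cat set to 0, every numeric variable passes the filter, which matches the documented way to turn the check off.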

COLOR THEME Individual colors in the plot can be manipulated with options such as col.box for the color of the box. A color theme for all the colors can be chosen for a specific plot with the colors option. Or, the color theme can be changed for all subsequent graphical analysis with the lessR function set. The default color theme is blue, but a gray scale is available with "gray", and other themes are available as explained in set.

VARIABLE LABELS Although standard R does not provide for variable labels, lessR can store the labels in a data frame called mylabels, obtained from the Read function. If this labels data frame exists, then the corresponding variable label is by default listed as the label for the horizontal axis and on the text output. For more information, see Read.

PDF OUTPUT Because of the customized graphic windowing system that maintains a unique graphic window for the Help function, the standard graphic output functions such as pdf do not work with the lessR graphics functions. Instead, to obtain pdf output, use the pdf.file option, perhaps with the optional pdf.width and pdf.height options. These files are written to the default working directory, which can be explicitly specified with the R setwd function.
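For comparison, the usual base R route to a pdf file, which applies to the standard boxplot function and not to the lessR graphics functions, is to open a pdf device, plot, and close the device. The sketch below uses the same 5 by 5 inch size as the pdf.width and pdf.height defaults; the file name is made up and the file is written to a temporary directory to avoid cluttering the working directory.

```r
# Illustrative data (hypothetical)
y <- c(seq(40, 60, by = 0.5), 90)

# Hypothetical output location; file.path(getwd(), ...) would target the working directory
out_file <- file.path(tempdir(), "MyBoxPlot.pdf")

# Open a 5 x 5 inch pdf device, draw the plot, then close the device to write the file
pdf(file = out_file, width = 5, height = 5)
boxplot(y, horizontal = TRUE)
dev.off()

print(out_file)
```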

ONLY VARIABLES ARE REFERENCED The referenced variable in a lessR function can only be a variable name. This referenced variable must exist in either the referenced data frame, mydata by default, or in the user's workspace, more formally called the global environment. That is, expressions cannot be directly evaluated. For example:

> BoxPlot(rnorm(50))   # does NOT work

Instead, do the following:

> Y <- rnorm(50)   # create vector Y in user workspace
> BoxPlot(Y)       # directly reference Y

See Also

boxplot, par, set.

Examples

# simulate data and get at least one outlier
y <- rnorm(100,50,10)
y[1] <- 90

# -----------------------------
# boxplot for a single variable
# -----------------------------

# standard horizontal boxplot with all defaults
BoxPlot(y)

# short name
bx(y)

# save the box plot to a pdf file
BoxPlot(y, pdf.file="MyBoxPlot.pdf")

# vertical boxplot with plum color
BoxPlot(y, horiz=FALSE, col.box="plum")

# boxplot with outliers more strongly highlighted
BoxPlot(y, col.pts="red", xlab="My Variable")

# -----------------------------------------------
# boxplots for data frames and multiple variables
# -----------------------------------------------

# create data frame, mydata, to mimic reading data with rad function
# mydata contains both numeric and non-numeric data
mydata <- data.frame(rnorm(100), rnorm(100), rnorm(100), rep(c("A","B"),50))
names(mydata) <- c("X","Y","Z","C")

# boxplot for variable X from data frame, referred to directly
BoxPlot(X)

# boxplot with superimposed dot plot (stripchart)
BoxPlot(X, dotplot=TRUE)

# boxplots for all numeric variables in data frame called mydata
BoxPlot()

# boxplots for all numeric variables in data frame called mydata
# with specified options
BoxPlot(col.box="palegreen1", col.pts="plum")

# Use the subset function to specify a variable list
mysub <- subset(mydata, select=c(X,Y))
BoxPlot(dframe=mysub)