bx
Uses the standard R boxplot function, boxplot, to display a boxplot in color. Also displays the relevant statistics such as the hinges, median and IQR.
If the provided object to analyze is a set of multiple variables, including an entire data frame, then each numeric variable in the data frame is analyzed and the results are written to a pdf file in the current working directory. The name and path of each output pdf file that contains a box plot are specified in the output.
When output is assigned into an object, such as b in b <- bx(Y), the pieces of output can be accessed for later analysis. A primary such analysis is knitr for dynamic report generation from an R Markdown file in which R output is embedded in documents, facilitated by the Rmd option. See value below.
BoxPlot(x=NULL, data=mydata, n.cat=getOption("n.cat"),
Rmd=NULL, col.fill=getOption("col.fill.bar"),
col.stroke=getOption("col.stroke.bar"),
col.bg=getOption("col.bg"),
col.grid=getOption("col.grid"),
cex.axis=0.75, col.axis="gray30",
xlab=NULL, main=NULL, sub=NULL, digits.d=NULL,
rotate.values=0, offset=0.5,
horiz=TRUE, add.points=FALSE,
quiet=getOption("quiet"),
pdf.file=NULL, pdf.width=5, pdf.height=5,
fun.call=NULL, ...)
bx(...)
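x: Variable(s) to analyze: a single numeric variable, multiple variables such as designated with the c function, or an entire data frame.
data: Data frame that contains the variable(s) of interest; the default is mydata.
cex.axis: Magnification factor for the axis values; the default is 0.75.
offset: Spacing between the axis values and the axis; the default is 0.5.
horiz: Orientation of the boxplot. Set to FALSE for vertical.
add.points: If TRUE, then place a dot plot (i.e., stripchart) over the box plot.
quiet: If TRUE, no text output. Can change the system default with the set function.
pdf.file: Name of the pdf file to which the graphic is written. If the name lacks the filetype .pdf, the filetype is added to the name.
fun.call: Used with knitr to pass the function call when obtained from the abbreviated function call bx.
Unlike the standard R boxplot function, the default here is for a horizontal boxplot. Also, BoxPlot does not currently process in formula mode, so use the standard R boxplot function to process a formula in which a boxplot is displayed for a variable at each level of a second, usually categorical, variable.
Other graphic parameters are available to format the display, such as main for the title, and other parameters found in boxplot and par. To minimize white space around the boxplot, re-size the graphics window before or after creating the boxplot.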
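Because BoxPlot does not process formulas, a grouped display falls back to base R. The following is a minimal sketch of that fallback, assuming the built-in ToothGrowth data set as a stand-in for your own data; the variable names len and supp belong to that data set, not to lessR.

```r
# ToothGrowth ships with base R (datasets package); it is only a stand-in here.
# Formula mode: one box for len at each level of the categorical variable supp.
by_group <- boxplot(len ~ supp, data = ToothGrowth,
                    horizontal = TRUE,      # same orientation as the BoxPlot default
                    col = "lightsteelblue",
                    main = "len by supp")

# boxplot() invisibly returns the values it plotted
by_group$names   # group labels, one per box
by_group$stats   # one column per group: whiskers, hinges and median
```

The returned list is the base R result, not the lessR output object described under value.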
DATA
The data may either be a vector from the global environment, the user's workspace, as illustrated in the examples below, or one or more variables in a data frame, or a complete data frame. The default input data frame is mydata. Specify a different source data frame with the data option. If multiple variables are specified, only the numerical variables in the list of variables are analyzed. The variables in the data frame are referenced directly by their names, that is, there is no need to invoke the standard R mechanisms of the mydata$name notation, the with function or the attach function. If the name of the vector in the global environment and of a variable in the input data frame are the same, the vector is analyzed.
To obtain a box plot of each numerical variable in the mydata data frame, use BoxPlot(). Or, for a data frame with a different name, insert the name between the parentheses. To analyze a subset of the variables in a data frame, specify the list with either a : or the c function, such as m01:m03 or c(m01,m02,m03).
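The rule that only numerical variables are analyzed when several variables or a whole data frame are supplied can be illustrated in base R. This is a sketch of the selection rule only, not the lessR implementation; the built-in iris data set and the names numeric_vars and skipped are assumptions made for the illustration.

```r
# iris ships with base R: four numeric columns plus the factor Species.
numeric_vars <- Filter(is.numeric, iris)               # keep numeric columns only
skipped <- setdiff(names(iris), names(numeric_vars))   # columns left out

# one horizontal box plot per retained variable
for (v in names(numeric_vars)) {
  boxplot(numeric_vars[[v]], horizontal = TRUE, main = v)
}

names(numeric_vars)   # variables that were plotted
skipped               # non-numeric variables that were ignored
```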
COLORS
Individual colors in the plot can be manipulated with options such as col.fill for the color of the box. A color theme for all the colors can be chosen for a specific plot with the colors option with the lessR function set. The default color theme is dodgerblue, but a gray scale is available with "gray", and other themes are available as explained in set, such as "red" and "green". Use the option ghost=TRUE for a black background, no grid lines and partial transparency of plotted colors.
VARIABLE LABELS
If variable labels exist, then the corresponding variable label is by default listed as the label for the horizontal axis and on the text output. For more information, see Read.
PDF OUTPUT
Because of the customized graphic windowing system that maintains a unique graphic window for the Help function, the standard graphic output functions such as pdf do not work with the lessR graphics functions. Instead, to obtain pdf output, use the pdf.file option, perhaps with the optional pdf.width and pdf.height options. These files are written to the default working directory, which can be explicitly specified with the R setwd function.
ONLY VARIABLES ARE REFERENCED
The referenced variable in a lessR function can only be a variable name. This referenced variable must exist in either the referenced data frame, such as the default mydata, or in the user's workspace, more formally called the global environment. That is, expressions cannot be directly evaluated. For example:
> BoxPlot(rnorm(50))   # does NOT work
Instead, do the following:
> Y <- rnorm(50)   # create vector Y in user workspace
> BoxPlot(Y)       # directly reference Y
The output can optionally be saved into an R object, otherwise it simply appears in the console. Redesigned in lessR version 3.3 to provide two different types of components: the pieces of readable output, and a variety of statistics. The readable output consists of character strings such as tables amenable for reading. The statistics are numerical values amenable for further analysis. The motivation of these types of output is to facilitate R markdown documents, as the name of each piece, preceded by the name of the saved object and a $, can be inserted into the Rmd document (see examples).
READABLE OUTPUT
out_stats: Summary statistics for a box plot
out_outliers: Outlier analysis
out_file: Name and location of optional R markdown file
STATISTICS
n: Number of data values analyzed
n.miss: Number of missing data values
min: Minimum
lower_whisker: Lower whisker
lower_hinge: Lower hinge
median: Median
upper_hinge: Upper hinge
upper_whisker: Upper whisker
max: Maximum
IQR: Inter-quartile range
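For orientation, the quantities named in the list above can be reproduced approximately with base R alone. This is a sketch under stated assumptions, not the lessR computation: the sample vector y is made up for illustration, boxplot.stats is the base R helper behind boxplot, and whether lessR reports IQR from the hinges or from the IQR function is not stated here.

```r
# Made-up sample with one extreme value; NA shows how missing data are counted
y <- c(2, 4, 4, 5, 6, 7, 8, 9, 10, 30, NA)

y_obs <- y[!is.na(y)]
bs <- boxplot.stats(y_obs)   # whiskers, hinges, median, and outliers

box_stats <- c(
  n             = length(y_obs),
  n.miss        = sum(is.na(y)),
  min           = min(y_obs),
  lower_whisker = bs$stats[1],
  lower_hinge   = bs$stats[2],
  median        = bs$stats[3],
  upper_hinge   = bs$stats[4],
  upper_whisker = bs$stats[5],
  max           = max(y_obs),
  IQR           = IQR(y_obs)   # quartile-based; may differ slightly from the hinge spread
)
outliers <- bs$out             # values beyond 1.5 hinge spreads from the hinges

box_stats
outliers
```

With this vector the upper whisker stops at the largest value inside the fence, and the extreme value is reported separately as an outlier.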
Although not typically needed, if the output is assigned to an object named, for example, h, then the contents of the object can be viewed directly with the unclass function, here as unclass(h).
# ------------------------------
# box plot for a single variable
# ------------------------------
# y is assumed to be a numeric vector in the user's workspace,
# created for example with y <- rnorm(100)

# standard horizontal boxplot with all defaults
BoxPlot(y)

# short name
bx(y)

# save the box plot to a pdf file
BoxPlot(y, pdf.file="MyBoxPlot.pdf")

# vertical boxplot with plum color
BoxPlot(y, horiz=FALSE, col.fill="plum")

# box plot with outliers more strongly highlighted
BoxPlot(y, col.stroke="red", xlab="My Variable")
# ------------------------------------------------
# box plots for data frames and multiple variables
# ------------------------------------------------

# read internal lessR dataset
# mydata contains both numeric and non-numeric data
mydata <- rd("Employee", format="lessR", quiet=TRUE)

# box plot with superimposed dot plot (stripchart)
BoxPlot(Salary, add.points=TRUE)
# abbreviation
bx(Salary)

# box plot with results saved to object b instead of displaying
b <- BoxPlot(Salary)
# show the results
b
# show just the piece regarding the statistics
b$out_stats
# list the names of all the components
names(b)

# box plot with rotated axis values, offset more from axis
BoxPlot(Salary, rotate.values=45, offset=1)

# BoxPlot generates R markdown file to be "knit"
# such as in RStudio
bx(Salary, Rmd="myout")

# box plots for all numeric variables in data frame called mydata
BoxPlot()

# box plots for all numeric variables in data frame called mydata
# with specified options
BoxPlot(col.fill="palegreen1", col.stroke="plum")

# Use the c function to specify a variable list
# box plots for all specified numeric variables
BoxPlot(c(Salary,Years))