Density-Box-Plot: Density-Box-Plots

Description

This function draws a (grouped) boxplot-like plot combined with kernel density estimates.

Usage

densbox(formula, data, rug = FALSE, from, to, gsep = .5, kernel, bw, main, ylab,
    var_names, box_out = TRUE, horizontal = FALSE, ...)

Arguments

formula

a formula object that references elements in data, see Details

data

a data frame containing the variables specified in formula

rug

a logical value to add a rug to the individual density-boxes

from

an optional lower boundary for the kernel density estimation (see density)

to

an optional upper boundary for the kernel density estimation (see density)

gsep

a numeric value $\geq0$ that specifies the length of group separation if two or more grouping variables are used

kernel

a string specifying the type of the kernel (default: "gaussian", see density)

bw

the bandwidth for kernel density estimation (see density)

main

a character object for the title

ylab

a character object for the $y$-axis label

var_names

a character vector of grouping variables' names to print in the lower left margin; the names are matched to the grouping variables in the order those variables are given in the formula

box_out

if TRUE (default), outliers are treated as in standard boxplots, i.e., plotted as stars outside the boxplot's whiskers; if FALSE, outliers are not treated differently, i.e., the whiskers extend to the minimum and maximum over the full range, no matter how far out they lie

horizontal

not implemented yet.

...

further arguments, see Details

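The arguments from, to, kernel and bw are documented as being passed on to density(), and box_out switches between standard boxplot outlier handling and whiskers over the full range. The base-R sketch below only illustrates what the corresponding stats functions do; it is not densbox's internal code, and treating box_out = FALSE as boxplot.stats(coef = 0) is an assumption made for illustration.

```r
set.seed(5L)
x <- rlnorm(100, 1, .5)

# from/to/kernel/bw as they behave in stats::density(): the estimate is
# evaluated on a grid from 0 to max(x), using a rectangular kernel with a
# fixed bandwidth of 0.25
d <- density(x, from = 0, to = max(x), kernel = "rectangular", bw = 0.25)
range(d$x)

# outlier handling like box_out = TRUE: whiskers end at the most extreme
# points within 1.5 times the box length; points beyond them are in $out
s_out <- boxplot.stats(x)
s_out$out

# assumed analogue of box_out = FALSE: coef = 0 extends the whiskers to the
# data extremes, so no point is flagged as an outlier
s_full <- boxplot.stats(x, coef = 0)
s_full$stats[c(1, 5)]
```

With coef = 0, the returned whisker ends coincide with range(x) and the outlier vector is empty.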

Details

This function plots a combination of boxplots and kernel density plots to get a more informative graphic of a metric dependent variable with respect to grouped data. The central element is the formula argument that defines the dependent variable (dv) and grouping variables (independent variables, iv). For a meaningful plot, the ivs should be categorical variables (they are treated as factors).
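Each density-box combines two summaries of the dependent variable within one group: the five boxplot statistics and a kernel density estimate. The following base-R sketch computes both ingredients per level of a single grouping variable, mirroring a formula such as y ~ g. The helper density_box_parts() and the data are hypothetical and only illustrate the idea; densbox may compute and draw these pieces differently.

```r
# Hypothetical helper: the two ingredients of one density-box
density_box_parts <- function(y, ...) {
  list(box  = boxplot.stats(y)$stats,  # lower whisker, hinge, median, hinge, upper whisker
       dens = density(y, ...))          # kernel density estimate (extra args go to density)
}

set.seed(1L)
df <- data.frame(y = rnorm(200, rep(c(0, 1), each = 100)),
                 g = factor(rep(c("A", "B"), each = 100)))

# one set of summaries per level of the grouping variable, as for y ~ g
parts <- lapply(split(df$y, df$g), density_box_parts)
names(parts)   # "A" "B"
parts$A$box[3] # median of group A
```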

In the simplest case, there is no grouping, so formula is DV ~ 1. As grouping variables are added, the plot is split up accordingly. Note that the order of the ivs in the formula defines how the plot is split up: the first variable is the most general grouping, the second forms subgroups within the first variable's groups, and so on. If a level of a factor does not occur in the data at all, that level is dropped. Subgroups with fewer than 5 observations are dropped, and the label "<5" is plotted in their place.
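The nesting order and the two dropping rules described above can be sketched in base R: unused factor levels are removed with droplevels(), nested subgroups for y ~ x2 + x1 are formed with interaction(..., lex.order = TRUE) so that x2 varies slowest (the most general grouping), and subgroups with fewer than 5 observations are flagged. This is an illustration under those assumptions, not densbox's actual implementation; the data are made up.

```r
set.seed(2L)
df <- data.frame(
  y  = c(rnorm(20), rnorm(20, 1), rnorm(3, -1)),
  x2 = factor(c(rep("X", 20), rep("Y", 23))),
  x1 = factor(c(rep(c("A", "B"), 10), rep("A", 20), rep("B", 3)),
              levels = c("A", "B", "C"))  # level "C" never occurs
)

# drop factor levels that are completely missing from the data
df <- droplevels(df)

# x2 is the outer grouping, x1 forms subgroups within each x2 group
cells <- split(df$y, interaction(df$x2, df$x1, lex.order = TRUE))
n <- lengths(cells)  # observations per subgroup: X.A, X.B, Y.A, Y.B

# subgroups that would be replaced by a "<5" label
too_small <- names(n)[n < 5]
```

Here the subgroup Y.B has only 3 observations, so it is the one that would not get a density-box.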

Examples


# plot a density-box-plot of one (log-normal) variable
set.seed(5L)
data1 <- rlnorm(100, 1, .5)
densbox(data1 ~ 1, from = 0, rug = TRUE)

# plot a continuous variable with 2 grouping variables
data2 <- data.frame(y  = rnorm(400, rep(c(0, 1, -1, 0), each = 100), 1),
                    x1 = rep(c("A", "B"), each = 200),
                    x2 = rep(c("X", "Y", "X", "Y"), each = 100))
with(data2, tapply(y, list(x1, x2), mean))

# a density-box-plot of the data with a custom title and
# custom labels for the grouping variables
densbox(y ~ x2 + x1, data2, main = "Plot with some\nSpecials",
  var_names = c("Second\nVariable", "First Variable"))

# the same plot with a rug and ignoring outliers in the boxplot
densbox(y ~ x2 + x1, data2, rug = TRUE, box_out = FALSE)

# density-box-plot with the same data, but no additional space between groups
# by setting gsep = 0.
# the kernel density plots use a rectangular kernel with a bandwidth of 0.25,
# which results in a "jagged" appearance.
densbox(y ~ x2 + x1, data2, gsep = 0, kernel = "rectangular", bw = 0.25)
