Produce an association plot indicating deviations from a specified independence model in a possibly high-dimensional contingency table.
# S3 method for default
assoc(x, row_vars = NULL, col_vars = NULL, compress = TRUE,
      xlim = NULL, ylim = NULL,
      spacing = spacing_conditional(sp = 0), spacing_args = list(),
      split_vertical = NULL, keep_aspect_ratio = FALSE,
      xscale = 0.9, yspace = unit(0.5, "lines"), main = NULL, sub = NULL,
      ..., residuals_type = "Pearson", gp_axis = gpar(lty = 3))
# S3 method for formula
assoc(formula, data = NULL, ..., subset = NULL, na.action = NULL, main = NULL, sub = NULL)
x
a contingency table in array form with optional category labels specified in the dimnames(x) attribute, or an object inheriting from the "ftable" class (such as "structable" objects).
row_vars
a vector of integers giving the indices, or a character vector giving the names of the variables to be used for the rows of the association plot.
col_vars
a vector of integers giving the indices, or a character vector giving the names of the variables to be used for the columns of the association plot.
compress
logical; if FALSE, the space between the rows (columns) is chosen such that the total heights (widths) of the rows (columns) are all equal. If TRUE, the space between rows and columns is fixed and hence the plot is more “compressed”.
xlim
a \(2 \times k\) matrix of doubles, where \(k\) is the total number of columns of the plot. The columns of xlim correspond to the columns of the association plot, and the rows give the column ranges (minimums in the first row, maximums in the second row). If xlim is NULL, the ranges are determined from the residuals according to compress (if TRUE: the widest range from each column; if FALSE: from the whole association plot matrix).
ylim
a \(2 \times k\) matrix of doubles, where \(k\) is the total number of rows of the plot. The columns of ylim correspond to the rows of the association plot, and the rows give the row ranges (minimums in the first row, maximums in the second row). If ylim is NULL, the ranges are determined from the residuals according to compress (if TRUE: the widest range from each row; if FALSE: from the whole association plot matrix).
spacing
a spacing object, a spacing function, or a corresponding generating function (see strucplot for more information). The default is the spacing-generating function spacing_conditional, which is (by default) called with the argument list spacing_args (see spacings for more details).
spacing_args
list of arguments for the spacing-generating function, if specified (see strucplot for more information).
split_vertical
vector of logicals of length \(k\), where \(k\) is the number of margins of x (default: FALSE). Values are recycled as needed. A TRUE component indicates that the corresponding dimension is folded into the columns; FALSE folds the dimension into the rows.
keep_aspect_ratio
logical indicating whether the aspect ratio should be fixed or not.
residuals_type
a character string indicating the type of residuals to be computed. Currently, only Pearson residuals are supported.
xscale
scale factor resizing the tile's width, thus adding additional space between the tiles.
yspace
object of class "unit" specifying additional space separating the rows.
gp_axis
object of class "gpar" specifying the visual aspects of the tiles' baseline.
formula
a formula object with possibly both left and right hand sides specifying the column and row variables of the flat table.
data
a data frame, list or environment containing the variables to be cross-tabulated, or an object inheriting from class table.
subset
an optional vector specifying a subset of observations to be used. Ignored if data is a contingency table.
na.action
an optional function which indicates what should happen when the data contain NAs. Ignored if data is a contingency table.
main, sub
either a logical, or a character string used for plotting the main (sub) title. If logical and TRUE, the name of the data object is used.
...
other parameters passed to strucplot.
The "structable" visualized is returned invisibly.
Association plots have been suggested by Cohen (1980) and extended by Friendly (1992) and provide a means for visualizing the residuals of an independence model for a contingency table.
assoc is a generic function and currently has a default method and a formula interface. Both are high-level interfaces to the strucplot function, and produce (extended) association plots. Most of the functionality is described there, such as specification of the independence model, labeling, legend, spacing, shading, and other graphical parameters.
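As a minimal sketch of the two interfaces, the calls below draw the same hair-by-eye-color plot once through the default method and once through the formula method. The object names `tab`, `res_default`, and `res_formula` are illustrative only, and the example assumes the vcd package is installed.

```r
library(vcd)

# Default method: pass a contingency table directly
tab <- margin.table(HairEyeColor, c(1, 2))
res_default <- assoc(tab)

# Formula method: name the variables and let assoc() tabulate them
res_formula <- assoc(~ Hair + Eye, data = HairEyeColor)

# The visualized table is returned invisibly, so it must be
# assigned (as above) to be inspected afterwards
class(res_default)
```

Assigning the result is only needed when the plotted table is reused later; calling `assoc()` for its side effect alone is the common case.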
For a contingency table, the signed contribution to Pearson's \(\chi^2\) for cell \(\{ij\ldots k\}\) is
$$d_{ij\ldots k} = \frac{(f_{ij\ldots k} - e_{ij\ldots k})}{ \sqrt{e_{ij\ldots k}}}$$
where \(f_{ij\ldots k}\) and \(e_{ij\ldots k}\) are the observed and expected counts corresponding to the cell. In the association plot, each cell is represented by a rectangle that has (signed) height proportional to \(d_{ij\ldots k}\) and width proportional to \(\sqrt{e_{ij\ldots k}}\), so that the area of the box is proportional to the difference in observed and expected frequencies. The rectangles in each row are positioned relative to a baseline indicating independence (\(d_{ij\ldots k} = 0\)). If the observed frequency of a cell is greater than the expected one, the box rises above the baseline, and falls below otherwise.
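The quantities in this formula can be reproduced by hand for a two-way table. The following sketch uses base R only and assumes the simple independence model (row and column variables independent); the variable names `f`, `e`, `d`, and `area` mirror the symbols above and are not part of the package.

```r
# Observed two-way table: hair by eye color, aggregated over sex
f <- margin.table(HairEyeColor, c(1, 2))

# Expected counts under independence: (row total * column total) / n
e <- outer(rowSums(f), colSums(f)) / sum(f)

# Signed contributions to Pearson's chi-squared (Pearson residuals)
d <- (f - e) / sqrt(e)

# Box geometry: signed height d and width sqrt(e), so area = f - e
area <- d * sqrt(e)

# Cross-checks: the squared residuals sum to the chi-squared statistic,
# and the box areas equal observed minus expected counts
all.equal(sum(d^2), unname(chisq.test(f)$statistic))
all.equal(as.vector(area), as.vector(f - e))
```

Cells with positive `d` (for example blond hair with blue eyes) correspond to boxes rising above the baseline, and cells with negative `d` to boxes falling below it.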
Additionally, the residuals can be colored depending on a specified shading scheme (see Meyer et al., 2003). Package vcd offers a range of residual-based shadings (see the shadings help page). Some of them allow, e.g., the visualization of test statistics.
Unlike the assocplot function in the graphics package, this function allows the visualization of contingency tables with more than two dimensions. Similar to the construction of ‘flat’ tables (like objects of class "ftable" or "structable"), the dimensions are folded into rows and columns.
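To illustrate what folding dimensions into rows and columns means, the sketch below flattens the three-way HairEyeColor table with base R's `ftable()`. The choice of which variables go to rows and which to columns is arbitrary here, and `ft` is an illustrative name.

```r
# Fold the three-way table into a flat layout:
# Hair and Sex span the rows, Eye spans the columns
ft <- ftable(HairEyeColor, row.vars = c("Hair", "Sex"), col.vars = "Eye")
ft

# 4 hair colors x 2 sexes give 8 rows; 4 eye colors give 4 columns
dim(ft)
```

The association plot arranges its tiles in the same kind of grid, with one tile per cell of the flat table.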
The layout is very flexible: the specification of shading, labeling, spacing, and legend is modularized (see strucplot for details).
Cohen, A. (1980), On the graphical display of the significant components in a two-way contingency table. Communications in Statistics - Theory and Methods, A9, 1025-1041.
Friendly, M. (1992), Graphical methods for categorical data. SAS User Group International Conference Proceedings, 17, 190-200. http://datavis.ca/papers/sugi/sugi17.pdf
Meyer, D., Zeileis, A., Hornik, K. (2003), Visualizing independence using extended association plots. Proceedings of the 3rd International Workshop on Distributed Statistical Computing, K. Hornik, F. Leisch, A. Zeileis (eds.), ISSN 1609-395X. https://www.R-project.org/conferences/DSC-2003/Proceedings/
Meyer, D., Zeileis, A., and Hornik, K. (2006), The strucplot framework: Visualizing multi-way contingency tables with vcd. Journal of Statistical Software, 17(3), 1-48. URL http://www.jstatsoft.org/v17/i03/ and available as vignette("strucplot").
library(vcd)
data("HairEyeColor")

## Aggregate over sex:
(x <- margin.table(HairEyeColor, c(1, 2)))

## Ordinary assocplot:
assoc(x)
## and with residual-based shading (of independence)
assoc(x, main = "Relation between hair and eye color", shade = TRUE)

## Aggregate over eye color:
(x <- margin.table(HairEyeColor, c(1, 3)))
chisq.test(x)
assoc(x, main = "Relation between hair color and sex", shade = TRUE)

## Visualize multi-way tables
assoc(aperm(HairEyeColor), expected = ~ (Hair + Eye) * Sex,
      labeling_args = list(just_labels = c(Eye = "left"),
                           offset_labels = c(right = -0.5),
                           offset_varnames = c(right = 1.2),
                           rot_labels = c(right = 0),
                           tl_varnames = c(Eye = TRUE))
)
assoc(aperm(UCBAdmissions), expected = ~ (Admit + Gender) * Dept, compress = FALSE,
      labeling_args = list(abbreviate = c(Gender = TRUE), rot_labels = 0)
)