assoc: Extended Association Plots

Description

Produce an association plot indicating deviations from a specified independence model in a possibly high-dimensional contingency table.

Usage

# S3 method for default
assoc(x, row_vars = NULL, col_vars = NULL, compress = TRUE,
  xlim = NULL, ylim = NULL,
  spacing = spacing_conditional(sp = 0), spacing_args = list(),
  split_vertical = NULL, keep_aspect_ratio = FALSE, 
  xscale = 0.9, yspace = unit(0.5, "lines"), main = NULL, sub = NULL,
  …, residuals_type = "Pearson", gp_axis = gpar(lty = 3))
# S3 method for formula
assoc(formula, data = NULL, …, subset = NULL, na.action = NULL, main = NULL, sub = NULL)

Arguments

a contingency table in array form with optional category labels specified in the dimnames(x) attribute, or an object inheriting from the "ftable" class (such as "structable" objects).

row_vars

a vector of integers giving the indices, or a character vector giving the names of the variables to be used for the rows of the association plot.

col_vars

a vector of integers giving the indices, or a character vector giving the names of the variables to be used for the columns of the association plot.

compress

logical; if FALSE, the space between the rows (columns) are chosen such that the total heights (widths) of the rows (columns) are all equal. If TRUE, the space between rows and columns is fixed and hence the plot is more “compressed”.

xlim

a $2 \times k$ matrix of doubles, $k$ number of total columns of the plot. The columns of xlim correspond to the columns of the association plot, the rows describe the column ranges (minimums in the first row, maximums in the second row). If xlim is NULL, the ranges are determined from the residuals according to compress (if TRUE: widest range from each column, if FALSE: from the whole association plot matrix).

ylim

a $2 \times k$ matrix of doubles, $k$ number of total rows of the plot. The columns of ylim correspond to the rows of the association plot, the rows describe the column ranges (minimums in the first row, maximums in the second row). If ylim is NULL, the ranges are determined from the residuals according to compress (if TRUE: widest range from each row, if FALSE: from the whole association plot matrix).

spacing

a spacing object, a spacing function, or a corresponding generating function (see strucplot for more information). The default is the spacing-generating function spacing_conditional that is (by default) called with the argument list spacing_args (see spacings for more details).

spacing_args

list of arguments for the spacing-generating function, if specified (see strucplot for more information).

split_vertical

vector of logicals of length $k$, where $k$ is the number of margins of x (default: FALSE). Values are recycled as needed. A TRUE component indicates that the corresponding dimension is folded into the columns, FALSE folds the dimension into the rows.

keep_aspect_ratio

logical indicating whether the aspect ratio should be fixed or not.

residuals_type

a character string indicating the type of residuals to be computed. Currently, only Pearson residuals are supported.

xscale

scale factor resizing the tile's width, thus adding additional space between the tiles.

yspace

object of class "unit" specifying additional space separating the rows.

gp_axis

object of class "gpar" specifying the visual aspects of the tiles' baseline.

formula

a formula object with possibly both left and right hand sides specifying the column and row variables of the flat table.

data

a data frame, list or environment containing the variables to be cross-tabulated, or an object inheriting from class table.

subset

an optional vector specifying a subset of observations to be used. Ignored if data is a contingency table.

na.action

an optional function which indicates what should happen when the data contain NAs. Ignored if data is a contingency table.

main, sub

either a logical, or a character string used for plotting the main (sub) title. If logical and TRUE, the name of the data object is used.

…

other parameters passed to strucplot

Value

The "structable" visualized is returned invisibly.

Details

Association plots have been suggested by Cohen (1980) and extended by Friendly (1992) and provide a means for visualizing the residuals of an independence model for a contingency table.

assoc is a generic function and currently has a default method and a formula interface. Both are high-level interfaces to the strucplot function, and produce (extended) association plots. Most of the functionality is described there, such as specification of the independence model, labeling, legend, spacing, shading, and other graphical parameters.

For a contingency table, the signed contribution to Pearson's $\chi^2$ for cell $\{ij\ldots k\}$ is

$$d_{ij\ldots k} = \frac{(f_{ij\ldots k} - e_{ij\ldots k})}{ \sqrt{e_{ij\ldots k}}}$$

where $f_{ij\ldots k}$ and $e_{ij\ldots k}$ are the observed and expected counts corresponding to the cell. In the association plot, each cell is represented by a rectangle that has (signed) height proportional to $d_{ij\ldots k}$ and width proportional to $\sqrt{e_{ij\ldots k}}$, so that the area of the box is proportional to the difference in observed and expected frequencies. The rectangles in each row are positioned relative to a baseline indicating independence ($d_{ij\ldots k} = 0$). If the observed frequency of a cell is greater than the expected one, the box rises above the baseline, and falls below otherwise.

Additionally, the residuals can be colored depending on a specified shading scheme (see Meyer et al., 2003). Package vcd offers a range of residual-based shadings (see the shadings help page). Some of them allow, e.g., the visualization of test statistics.

Unlike the assocplot function in the graphics package, this function allows the visualization of contingency tables with more than two dimensions. Similar to the construction of ‘flat’ tables (like objects of class "ftable" or "structable"), the dimensions are folded into rows and columns.

The layout is very flexible: the specification of shading, labeling, spacing, and legend is modularized (see strucplot for details).

References

Cohen, A. (1980), On the graphical display of the significant components in a two-way contingency table. Communications in Statistics---Theory and Methods, A9, 1025--1041.

Friendly, M. (1992), Graphical methods for categorical data. SAS User Group International Conference Proceedings, 17, 190--200. http://datavis.ca/papers/sugi/sugi17.pdf

Meyer, D., Zeileis, A., Hornik, K. (2003), Visualizing independence using extended association plots. Proceedings of the 3rd International Workshop on Distributed Statistical Computing, K. Hornik, F. Leisch, A. Zeileis (eds.), ISSN 1609-395X. https://www.R-project.org/conferences/DSC-2003/Proceedings/

Meyer, D., Zeileis, A., and Hornik, K. (2006), The strucplot framework: Visualizing multi-way contingency tables with vcd. Journal of Statistical Software, 17(3), 1-48. Available as vignette("strucplot", package = "vcd"). 10.18637/jss.v017.i03.

Examples

Run this code

# NOT RUN {
data("HairEyeColor")
## Aggregate over sex:
(x <- margin.table(HairEyeColor, c(1, 2)))

## Ordinary assocplot:
assoc(x)
## and with residual-based shading (of independence)
assoc(x, main = "Relation between hair and eye color", shade = TRUE)

## Aggregate over Eye color:
(x <- margin.table(HairEyeColor, c(1, 3)))
chisq.test(x)
assoc(x, main = "Relation between hair color and sex", shade = TRUE)

# Visualize multi-way table
assoc(aperm(HairEyeColor), expected = ~ (Hair + Eye) * Sex,
      labeling_args = list(just_labels = c(Eye = "left"),
                           offset_labels = c(right = -0.5),
                           offset_varnames = c(right = 1.2),
                           rot_labels = c(right = 0),
                           tl_varnames = c(Eye = TRUE))
)

assoc(aperm(UCBAdmissions), expected = ~ (Admit + Gender) * Dept, compress = FALSE,
      labeling_args = list(abbreviate = c(Gender = TRUE), rot_labels = 0)
)
# }

Run the code above in your browser using DataLab