prelim.cat: Preliminary manipulations on incomplete categorical data

Description

This function performs grouping and sorting operations on categorical datasets with missing values. The list it returns is required as input to em.cat, da.cat, imp.cat, and related functions.

Usage

prelim.cat(x, counts, levs)

Value

A list of seventeen components that summarize various features of x after the data have been sorted by missingness pattern and grouped according to the observed values. Components that might be of interest to the user include:

nmis: a vector of length ncol(x) containing the number of missing values for each variable in x.
r: matrix of response indicators showing the missing data patterns in x. The dimension is (m,p), where m is the number of distinct missingness patterns among the rows of x and p is the number of columns of x. Observed values are indicated by 1 and missing values by 0. The row names give the number of observations in each pattern, and the columns correspond to the columns of x.
d: vector of length ncol(x) indicating the number of levels for each variable. The complete-data contingency table would be an array with these dimensions. Identical to levs if levs was supplied.
ncells: number of cells in the cross-classified contingency table, equal to prod(d).
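
To make these components concrete, the following base-R sketch computes analogous quantities by hand for a small made-up matrix. It is an illustration of what the components describe, not the code used inside prelim.cat; the toy data and all variable names are assumptions, and the row order of the pattern matrix here need not match the order prelim.cat produces.

```r
# Toy ungrouped data (hypothetical): 5 units, 2 variables, NA = missing
x <- matrix(c(1,  2,
              1,  NA,
              2,  1,
              NA, 3,
              1,  NA),
            ncol = 2, byrow = TRUE)

# Analogue of nmis: number of missing values in each column
nmis <- colSums(is.na(x))

# Response indicators: 1 = observed, 0 = missing
resp <- ifelse(is.na(x), 0L, 1L)

# Analogue of r: one row per distinct missingness pattern,
# with row names giving the number of units in that pattern
pattern_key <- apply(resp, 1, paste, collapse = "")
r <- unique(resp)
rownames(r) <- as.vector(table(pattern_key)[apply(r, 1, paste, collapse = "")])

# Analogue of d when levs is not supplied, and of ncells
d <- apply(x, 2, max, na.rm = TRUE)
ncells <- prod(d)
```

Here the second variable is missing for two units and the first for one, so three distinct patterns appear, and the implied complete-data table has 2 x 3 cells.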

Arguments

x: categorical data matrix containing missing values. The data may be provided either in ungrouped or grouped format. In ungrouped format, the rows of x correspond to individual observational units, so that nrow(x) is the total sample size. In grouped format, the rows of x correspond to distinct covariate patterns; the frequencies are provided through the counts argument. In either format, the columns correspond to variables. The categories must be coded as consecutive positive integers beginning with 1 (1,2,...), and missing values are denoted by NA.
counts: optional vector of length nrow(x) giving the frequencies corresponding to the covariate patterns in x. The total sample size is sum(counts). If counts is missing, the data are assumed to be ungrouped; this is equivalent to taking counts equal to rep(1,nrow(x)).
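levs: optional vector of length ncol(x) indicating the number of levels for each categorical variable. If missing, levs[j] is taken to be max(x[,j],na.rm=TRUE). Supply levs explicitly when the highest category of a variable does not occur in the observed data, since the default would then undercount its levels.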
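
The relationship between the ungrouped and grouped formats can be sketched in base R as follows. The data and the names x_ungrouped, x_grouped, counts, and levs_default are hypothetical and serve only to illustrate the argument descriptions above; prelim.cat itself is not called.

```r
# Hypothetical ungrouped data: one row per observational unit
x_ungrouped <- matrix(c(1, 1,
                        1, 1,
                        2, NA,
                        2, NA,
                        2, NA,
                        1, 2),
                      ncol = 2, byrow = TRUE)

# Collapse identical rows into distinct patterns plus their frequencies
key <- apply(x_ungrouped, 1, paste, collapse = ",")
keep <- !duplicated(key)
x_grouped <- x_ungrouped[keep, , drop = FALSE]
counts <- as.vector(table(key)[key[keep]])

# The total sample size is the same in either format
stopifnot(sum(counts) == nrow(x_ungrouped))

# Default number of levels per variable when levs is not supplied
levs_default <- apply(x_ungrouped, 2, max, na.rm = TRUE)
```

Under the description above, passing x_ungrouped alone or passing x_grouped together with counts should describe the same dataset.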

References

Chapters 7–8 of Schafer (1996) Analysis of Incomplete Multivariate Data. Chapman & Hall.

Examples

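library(cat)                # package providing prelim.cat and the crimes data
data(crimes)
crimes                      # columns 1-2 are the variables, column 3 the counts
s <- prelim.cat(crimes[,1:2],crimes[,3])   # preliminary manipulations (grouped format)
s$nmis                      # number of missing values per variable
s$r                         # missing data patterns (1 = observed, 0 = missing)
s$d                         # number of levels per variable
s$ncells                    # number of cells in the contingency table, prod(s$d)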