This function performs grouping and sorting operations on categorical datasets with missing values. It creates a list that is needed for input to em.cat, da.cat, imp.cat, etc.
prelim.cat(x, counts, levs)
Returns a list of seventeen components that summarize various features of x after the data have been sorted by missingness patterns and grouped according to the observed values. Components that might be of interest to the user include:
nmis: vector of length ncol(x) containing the number of missing values for each variable in x.
r: matrix of response indicators showing the missing data patterns in x. The dimension is (m, p), where m is the number of distinct missingness patterns in the rows of x and p is the number of columns in x. Observed values are indicated by 1 and missing values by 0. The row names give the number of observations in each pattern, and the columns correspond to the columns of x.
d: vector of length ncol(x) indicating the number of levels for each variable. The complete-data contingency table would be an array with these dimensions. Identical to levs if levs was supplied.
ncells: number of cells in the cross-classified contingency table, equal to prod(d).
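As a rough illustration of what nmis and r describe, the following base-R sketch computes comparable quantities by hand on a small made-up matrix. It does not call prelim.cat, the toy data and the helper names (ind, key) are assumptions for this example, and the row ordering of the patterns here is not necessarily the ordering prelim.cat uses.

```r
# Toy ungrouped data: 5 units, 2 categorical variables, NA = missing
x <- cbind(a = c(1, 2, NA, 1, 2),
           b = c(1, NA, 2, 2, 3))

# Number of missing values per variable (the quantity nmis describes)
nmis <- colSums(is.na(x))

# Response indicators per unit: 1 = observed, 0 = missing
ind <- (!is.na(x)) * 1

# Collapse to the distinct missingness patterns and count units in each
key <- apply(ind, 1, paste, collapse = "")
first <- !duplicated(key)
r <- ind[first, , drop = FALSE]
rownames(r) <- as.vector(table(key)[key[first]])

nmis
r
```

In this sketch each row of r is one missingness pattern, and its row name is the number of units that share it.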
x: categorical data matrix containing missing values. The data may be provided in either ungrouped or grouped format. In ungrouped format, the rows of x correspond to individual observational units, so that nrow(x) is the total sample size. In grouped format, the rows of x correspond to distinct covariate patterns; the frequencies are provided through the counts argument. In either format, the columns correspond to variables. The categories must be coded as consecutive positive integers beginning with 1 (1,2,...), and missing values are denoted by NA.
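Data that arrive as labels rather than integer codes need recoding before they meet the coding requirement above. One possible approach, shown here as a sketch in base R with a hypothetical data frame df, is to take the integer codes of a factor, which start at 1 and leave NA untouched:

```r
# Hypothetical raw data with character labels and missing values
df <- data.frame(sex   = c("F", "M", NA, "F"),
                 smoke = c("no", "yes", "yes", NA))

# Recode each column to consecutive integers 1, 2, ...; NA stays NA
x <- sapply(df, function(v) as.integer(factor(v)))

x
```

The mapping from labels to codes follows the sorted factor levels, so it is worth keeping levels(factor(v)) for each column if the codes need to be translated back later.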
counts: optional vector of length nrow(x) giving the frequencies corresponding to the covariate patterns in x. The total sample size is sum(counts). If counts is missing, the data are assumed to be ungrouped; this is equivalent to taking counts equal to rep(1,nrow(x)).
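To make the relation between the two input formats concrete, here is a base-R sketch that collapses a made-up ungrouped matrix into grouped rows plus a frequency vector, then expands it again. The variable names are assumptions for this example, and prelim.cat itself is not called.

```r
# Ungrouped format: one row per observational unit
x_ungrouped <- cbind(a = c(1, 1, 2, NA, 1, 2),
                     b = c(2, 2, 1, 1, 2, NA))

# Grouped format: one row per distinct pattern, plus its frequency
key <- apply(x_ungrouped, 1, paste, collapse = "|")
first <- !duplicated(key)
x_grouped <- x_ungrouped[first, , drop = FALSE]
counts <- as.vector(table(key)[key[first]])

# Repeating each grouped row counts[i] times recovers the sample size
idx <- rep(seq_len(nrow(x_grouped)), counts)
x_expanded <- x_grouped[idx, , drop = FALSE]

sum(counts) == nrow(x_ungrouped)
```

Under the description above, passing x_grouped together with counts should describe the same data as passing x_ungrouped alone.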
levs: optional vector of length ncol(x) indicating the number of levels for each categorical variable. If missing, levs[j] is taken to be max(x[,j],na.rm=TRUE).
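The default can undercount when the highest category of a variable never appears in the observed data. The sketch below, using made-up data in base R, compares the default rule with an explicitly supplied vector; the assumption that variable b has three categories is invented for this example.

```r
# Toy data: suppose b truly has 3 categories, but level 3 is never observed
x <- cbind(a = c(1, 2, NA, 2),
           b = c(1, NA, 1, 2))

# Default rule described above: largest observed code in each column
levs_default <- apply(x, 2, max, na.rm = TRUE)

# Explicit specification that keeps the unobserved third level of b
levs_explicit <- c(2, 3)

# Number of cells in the implied complete-data contingency table
prod(levs_default)   # 2 x 2 table
prod(levs_explicit)  # 2 x 3 table
```

Supplying levs explicitly is the safer choice whenever a category is known to exist but may be absent from the sample.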
Schafer (1996), Analysis of Incomplete Multivariate Data, Chapters 7-8. Chapman & Hall.
See also: em.cat, ecm.cat, da.cat, mda.cat, dabipf, imp.cat
library(cat) # load the package that provides prelim.cat and the crimes data
data(crimes)
crimes
s <- prelim.cat(crimes[,1:2],crimes[,3]) # preliminary manipulations
s$nmis # see number of missing observations per variable
s$r # look at missing data patterns