This class is similar to a data.frame but is customized for the situation in which variables with missing data are being modeled for multiple imputation. This class primarily consists of a list of missing_variables plus slots containing metadata indicating how the missing_variables relate to each other. Most operations that work for a data.frame also work for a missing_data.frame.
missing_data.frame(y, ...)
## Hidden arguments not included in the signature
## favor_ordered = TRUE, favor_positive = FALSE,
## subclass = NA_character_,
## include_missingness = TRUE, skip_correlation_check = FALSE
y: Usually a data.frame, possibly a numeric matrix, possibly a list of missing_variables.
...: Hidden arguments. The favor_ordered and favor_positive arguments are passed to the missing_variable function and are documented under the type argument. Briefly, they affect the heuristics that are used to guess what class a variable should be coerced to. The subclass argument defaults to NA and can be used to specify that the resulting object should inherit from the missing_data.frame class rather than be an object of missing_data.frame class.
Any further arguments are passed to the initialize-methods for a missing_data.frame. They currently are include_missingness, which defaults to TRUE and indicates that the missingness pattern of the other variables should be included when modeling a particular missing_variable, and skip_correlation_check, which defaults to FALSE and indicates whether to skip the default check for whether the observed values of each pair of missing_variables have a perfect absolute Spearman correlation.
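The idea behind the correlation check can be illustrated in base R. The following is only a sketch of the concept, not the code used by the initialize-methods; the toy data.frame and the tolerance of 1e-8 are assumptions made for illustration.

```r
# Base R sketch (not the mi internals): flag pairs of columns whose observed
# values have a perfect absolute Spearman correlation.
df <- data.frame(
  a = c(1, 2, 3, 4, NA, 6),
  b = c(2, 4, 6, 8, 10, NA),  # monotone in a wherever both are observed
  c = c(5, 3, 6, 1, 2, 4)
)

# Pairwise-complete correlations use only rows where both columns are observed
rho <- cor(df, method = "spearman", use = "pairwise.complete.obs")

# Visit each pair once (strict upper triangle) and test whether |rho| is 1
perfect <- which(upper.tri(rho) & abs(abs(rho) - 1) < 1e-8, arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(rho)[perfect[, "row"]],
  var2 = colnames(rho)[perfect[, "col"]]
)
print(flagged)
```

A pair flagged this way carries redundant information in its observed values, which is the situation the default check is meant to catch before imputation starts.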
The missing_data.frame constructor function returns an object of class missing_data.frame or that inherits from the missing_data.frame class.
Objects can be created by calls of the form new("missing_data.frame", ...). However, useRs almost always will pass a data.frame to the missing_data.frame constructor function to produce an object of missing_data.frame class.
This section is primarily aimed at developeRs. A missing_data.frame inherits from data.frame but has the following additional slots:
variables: Object of class "list" and each list element is an object that inherits from the missing_variable-class
no_missing: Object of class "logical", which is a vector whose length is the same as the length of the variables slot indicating whether the corresponding missing_variable is fully observed
patterns: Object of class factor whose length is equal to the number of observations and whose elements indicate the missingness pattern for that observation
DIM: Object of class "integer" of length two indicating first the number of observations and second the length of the variables slot
DIMNAMES: Object of class "list" of length two providing the appropriate row names and column names
postprocess: Object of class "function" used to create additional variables from existing variables, such as interactions between two missing_variables once their missing values have been imputed. Does not work at the moment
index: Object of class "list" whose length is equal to the number of missing_variables with some missing values. Each list element is an integer vector indicating which columns of the X slot must be dropped when modeling the corresponding missing_variable
X: Object of MatrixTypeThing-class with rows equal to the number of observations and is loosely related to a model.matrix. Rather than repeatedly parsing a formula during the multiple imputation process, this X matrix is created once and some of its columns are dropped when modeling a missing_variable utilizing the index slot. The columns of the X matrix consist of numeric representations of the missing_variables plus (by default) the unique missingness patterns
weights: Object of class "list" whose length is equal to one or the number of missing_variables with some missing values. Each list element is passed to the corresponding argument of bayesglm and similar functions. In particular, some observations can be given a weight of zero, which should drop them when modeling some missing_variables
priors: Object of class "list" whose length is equal to the number of missing_variables and whose elements give appropriate values for the priors used by the model fitting function wrapped by the fit_model-methods; see, e.g., bayesglm
correlations: Object of class "matrix" with rows and columns equal to the length of the variables slot. Its strict upper triangle contains Spearman correlations between pairs of variables (ignoring missing values), and its strict lower triangle contains Squared Multiple Correlations (SMCs) between a variable and all other variables (ignoring missing values). If either a Spearman correlation or a SMC is very close to unity, there may be difficulty or error messages during the multiple imputation process.
done: Object of class "logical" of length one indicating whether the missing values have been imputed
workpath: Object of class character of length one indicating the path to a working directory that is used to store some objects
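To make the no_missing, patterns, and DIM slots more concrete, the following base R sketch computes analogous quantities for a small data.frame. It is only an illustration of what these slots describe, not the construction used by the class itself; the toy data and the pattern labels (including the label "nothing" for fully observed rows) are assumptions.

```r
# Base R sketch of quantities analogous to three of the slots above
df <- data.frame(
  x = c(1, NA, 3, NA),
  y = c("a", "b", NA, NA),
  z = c(TRUE, FALSE, TRUE, TRUE)
)

miss <- is.na(df)  # logical matrix: TRUE where a value is missing

# Analogue of no_missing: one logical per variable, TRUE if fully observed
no_missing <- colSums(miss) == 0

# Analogue of patterns: one factor element per observation, labelled by the
# variables that are missing in that row (labels here are hypothetical)
labels <- apply(miss, 1, function(r) {
  if (any(r)) paste(names(r)[r], collapse = ", ") else "nothing"
})
patterns <- factor(labels)

# Analogue of DIM: number of observations, then number of variables
DIM <- c(nrow(df), ncol(df))

print(no_missing)
print(table(patterns))
print(DIM)
```

Each distinct level of such a factor corresponds to one missingness pattern, which is the information that (by default) is added as columns of the X matrix.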
There are many methods that are defined for a missing_data.frame, although some are primarily intended for developers. The most relevant ones for users are:
change: signature(data = "missing_data.frame", y = "ANY", what = "character", to = "ANY") which is used to change discretionary aspects of the missing_variables in the variables slot of a missing_data.frame
hist: signature(x = "missing_data.frame") which shows histograms of the observed variables that have missingness
image: signature(x = "missing_data.frame") which plots an image of the missingness slot to visualize the pattern of missingness when grayscale = FALSE or the pattern of missingness in light of the observed values (grayscale = TRUE, the default)
mi: signature(y = "missing_data.frame", model = "missing") which multiply imputes the missing values
show: signature(object = "missing_data.frame") which gives an overview of the salient characteristics of the missing_variables in the variables slot of a missing_data.frame
summary: signature(object = "missing_data.frame") which produces the same result as the summary method for a data.frame
There are also S3 methods for the dim, dimnames, and names generics, which allow functions like nrow, ncol, rownames, colnames, etc. to work as expected on missing_data.frames. Also, accessing and changing elements for a missing_data.frame mostly works the same way as for a data.frame.
In most cases, the first step of an analysis is for a useR to call the missing_data.frame function on a data.frame whose variables have some NA values, which will call the missing_variable function on each column of the data.frame and return the list that fills the variables slot. The classes of the list elements will depend on the nature of the column of the data.frame and various fallible heuristics. The success rate can be enhanced by making sure that columns of the original data.frame that are intended to be categorical variables are (ordered if appropriate) factors with labels. Even in the best case scenario, it will often be necessary to utilize the change function to modify various discretionary aspects of the missing_variables in the variables slot of the missing_data.frame. The show method for a missing_data.frame should be utilized to get a quick overview of the missing_variables in a missing_data.frame and recognize what needs to be changed.
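The advice about categorical columns can be sketched in base R as follows. The data.frame, its column names, and the coding of health are hypothetical and serve only to show the preparation step that precedes the call to the constructor.

```r
# Hypothetical raw data in which categorical variables are not yet factors
raw <- data.frame(
  income = c(52000, NA, 31000, 78000),
  region = c("north", "south", NA, "north"),
  health = c(3, 1, NA, 2),  # assumed coding: 1 = poor, 2 = fair, 3 = good
  stringsAsFactors = FALSE
)

# Unordered categorical variable: a plain factor
raw$region <- factor(raw$region)

# Ordered categorical variable: an ordered factor with explicit labels
raw$health <- factor(raw$health, levels = 1:3,
                     labels = c("poor", "fair", "good"), ordered = TRUE)

str(raw)

# The prepared data.frame would then be passed to the constructor, e.g.
# mdf <- missing_data.frame(raw)  # requires the mi package
```

Declaring the factors up front leaves less for the heuristics to guess, although the change function may still be needed afterwards.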
change, missing_variable, mi, experiment_missing_data.frame, multilevel_missing_data.frame
library(mi)

# STEP 0: Get data
data(CHAIN, package = "mi")
# STEP 1: Convert to a missing_data.frame
mdf <- missing_data.frame(CHAIN) # warnings about missingness patterns
show(mdf)
# STEP 2: change things
mdf <- change(mdf, y = "log_virus", what = "transformation", to = "identity")
# STEP 3: look deeper
summary(mdf)
hist(mdf)
image(mdf)
# STEP 4: impute
imputations <- mi(mdf)

## An example with subsetting on a fully observed variable
data(nlsyV, package = "mi")
mdfs <- missing_data.frame(nlsyV, favor_positive = TRUE, favor_ordered = FALSE, by = "first")
mdfs <- change(mdfs, y = "momed", what = "type", to = "ord")
show(mdfs)