BaBooN (version 0.2-0)

rowimpPrep: Missing-data pattern identifier

Description

‘rowimpPrep’ identifies all missingness patterns within an incomplete data set. Running rowimpPrep is a prerequisite for BBPMM.row.

Usage

rowimpPrep(data, ID=NULL, verbose=TRUE)

Arguments

data
Either a data frame or matrix with missing values.
ID
A numeric or character vector indicating the column positions or names of the ID variable(s) (if two data sets that share a joint subset of variables were stacked). The first element refers to the 'donor ID', the second element to the 'recipient ID'. This distinction is only relevant if the data set is 'L-shaped', i.e. if the data contain only one missing-data pattern (where incomplete cases are 'recipients'). If ID has only one element, the function assumes that the identifier variables of the two data sets are packed into a single variable. The default, NULL, is used if no ID variable is specified.
verbose
If TRUE, prints information on the identified missing-data patterns. Default=TRUE.

Value

data
The original data set minus the ID variable(s).
key
The ID variable(s) from the original data set.
blocks
A list containing the column positions of all identified missing-data patterns.
blockNames
A list containing the variable names corresponding to object blocks.
compNames
A character vector containing the variable names of the (completely observed) imputation model variables.
ignore
Contains positions of ignored variables.
ignored_data
Contains ignored variables.
indMatrix
A matrix with the same dimensions as the incomplete data containing flags for missing values.
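
The relationship between the indicator matrix, the blocks, and the completely observed variables can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data set and the names ind, comp_names, blocks and block_names are hypothetical, and the sketch assumes that a 'block' groups the incomplete columns that are missing in exactly the same rows.

```r
# Toy data (hypothetical): y1 and y2 share one missingness pattern,
# y3 has a different one, x1 is completely observed.
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  y1 = c(NA, NA, 3, 4, 5, 6),
  y2 = c(NA, NA, 1, 2, 3, 4),
  y3 = c(1, 2, 3, NA, 5, 6)
)

# Logical flag matrix with the same dimensions as the data
# (TRUE marks a missing value).
ind <- is.na(toy)

# Names of the completely observed columns.
comp_names <- colnames(toy)[colSums(ind) == 0]

# Column positions of the incomplete columns.
incomplete <- which(colSums(ind) > 0)

# One key per incomplete column: the row numbers in which it is missing.
pattern_key <- apply(
  ind[, incomplete, drop = FALSE], 2,
  function(col) paste(which(col), collapse = ",")
)

# Assumption: columns with identical keys form one block.
blocks <- unname(split(unname(incomplete), pattern_key))
block_names <- lapply(blocks, function(pos) colnames(toy)[pos])

print(block_names)
print(comp_names)
```

In this sketch, blocks holds column positions and block_names the matching variable names, mirroring the roles described for blocks and blockNames above.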

Details

rowimpPrep identifies all missing-data patterns and lets the user decide whether to impute all of them with BBPMM.row or just some of them. This comes in handy if variables that were assumed to be completely observed have missing values: such variables are likely to define an unexpected 'block' of their own. BBPMM.row can also be used to impute missing data that are not missing-by-design, but BBPMM would probably be the better option. Note that all variables listed in compNames are used for the imputation model in BBPMM.row, i.e. completely observed variables (ID variables aside) that are not to be used in the imputation model have to be removed from the data set beforehand.
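
Removing an unwanted, completely observed variable before the preparation step might look like the following base R sketch. The data set dat and the variable names (in particular note) are hypothetical, and the rowimpPrep call is left commented out because it requires the BaBooN package.

```r
# Hypothetical data: 'note' is completely observed but should not
# enter the imputation model; y1 is the incomplete variable.
dat <- data.frame(
  x1   = c(1, 2, 3, 4, 5),
  x2   = c(2, 1, 2, 1, 2),
  note = c("a", "b", "c", "d", "e"),
  y1   = c(NA, NA, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Variables to exclude from the imputation model.
drop_vars <- c("note")

# Keep every column whose name is not listed in drop_vars.
dat_model <- dat[, !(names(dat) %in% drop_vars), drop = FALSE]

# Completely observed columns that remain in the reduced data set.
remaining_complete <- names(dat_model)[colSums(is.na(dat_model)) == 0]
print(remaining_complete)

# With BaBooN installed, the reduced data set would then be passed on:
# impblock <- rowimpPrep(dat_model)
```

Dropping the column up front keeps it out of the reduced data set entirely, so it cannot be picked up as a completely observed model variable.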

Examples

### sample data set with non-normal variables and a single
### missingness pattern
set.seed(1000)
n <- 50
x1 <- round(runif(n,0.5,3.5))
x2 <- as.factor(c(rep(1,10),rep(2,25),rep(3,15)))
x3 <- round(rnorm(n,0,3))
y1 <- round(x1-0.25*(x2==2)+0.5*x3+rnorm(n,0,1))
y1 <- ifelse(y1<1,1,y1)
y1 <- ifelse(y1>4,5,y1)
y2 <- y1+rnorm(n,0,0.5)
y3 <- round(x3+rnorm(n,0,2))
data1 <- as.data.frame(cbind(x1,x2,x3,y1,y2,y3))
misrow1 <- sample(n,20)
is.na(data1[misrow1, c(4:6)]) <- TRUE

### preparation step
impblock <- rowimpPrep(data1)

impblock$blockNames
