flux: Influx and outflux of multivariate missing data patterns

Description

Influx and outflux are statistics of the missing data pattern. These statistics are useful in selecting predictors that should go into the imputation model.

Usage

flux(data, local = names(data))
fluxplot(data, local = names(data),
         plot = TRUE, labels = TRUE,
         xlim = c(0, 1), ylim = c(0, 1), las = 1,
         xlab = "Influx", ylab = "Outflux",
         main = paste("Influx-outflux pattern for", deparse(substitute(data))),
         eqscplot = TRUE, pty = "s",
         ...)
fico(data)

Arguments

data

A data frame or a matrix containing the incomplete data. Missing values are coded as NA.

local

A vector of names of columns of data. The default is to include all columns in the calculations.

plot

Should a graph be produced?

labels

Should the points be labeled?

xlim

See plot.default.

ylim

See plot.default.

las

See par.

xlab

See plot.default.

ylab

See plot.default.

main

See plot.default.

eqscplot

Should a square plot be produced?

pty

See par.

...

Further arguments passed to plot() or eqscplot().

Value

flux() returns a data frame with ncol(data) rows and six columns:

pobs       Proportion observed
influx     Influx
outflux    Outflux
ainb       Average inbound statistic
aout       Average outbound statistic
fico       Fraction of incomplete cases among cases with Yj observed

fluxplot() returns the same data frame, but invisibly.
fico() returns a vector of length ncol(data) of FICO statistics.

Details

Influx and outflux were proposed by van Buuren (2012), chapter 4.

Influx is equal to the number of variable pairs (Yj, Yk) with Yj missing and Yk observed, divided by the total number of observed data cells. Influx depends on the proportion of missing data of the variable. The influx of a completely observed variable is equal to 0, whereas a completely missing variable has influx equal to 1. For two variables with the same proportion of missing data, the variable with higher influx is better connected to the observed data, and might thus be easier to impute.
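As a rough illustration of the influx definition, the following base R sketch computes influx directly from the response indicator. The helper name influx_sketch and the toy data are hypothetical and not part of the package. The code follows the verbal definition given here and is not necessarily how flux() is implemented internally.

```r
# Hypothetical helper: influx per column, following the verbal definition.
influx_sketch <- function(data) {
  r <- !is.na(data)             # TRUE where a cell is observed
  n_obs_in_row <- rowSums(r)    # number of observed cells in each row
  # For column j: within each row where Yj is missing, count the observed
  # cells Yk, sum over rows, and divide by the total number of observed cells.
  colSums((!r) * n_obs_in_row) / sum(r)
}

# Toy data (an assumption, for illustration only)
toy <- data.frame(
  a = c(1, 2, 3, 4),         # completely observed
  b = c(NA, 2, NA, 4),       # half missing
  c = c(NA, NA, NA, NA)      # completely missing
)
influx_sketch(toy)
```

On this toy data the completely observed column a gets influx 0 and the completely missing column c gets influx 1, in line with the boundary cases described above. Column b gets 2/6, since its two missing cells each sit in a row with exactly one observed value and the data contain six observed cells in total.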

Outflux is equal to the number of variable pairs with Yj observed and Yk missing, divided by the total number of incomplete data cells. Outflux is an indicator of the potential usefulness of Yj for imputing other variables. Outflux depends on the proportion of missing data of the variable. Outflux of a completely observed variable is equal to 1, whereas outflux of a completely missing variable is equal to 0. For two variables having the same proportion of missing data, the variable with higher outflux is better connected to the missing data, and thus potentially more useful for imputing other variables.
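The outflux definition can be sketched in base R in the same way. The helper name outflux_sketch and the toy data are hypothetical. The code mirrors the verbal definition and is not claimed to be the package's own implementation.

```r
# Hypothetical helper: outflux per column, following the verbal definition.
outflux_sketch <- function(data) {
  r <- !is.na(data)               # TRUE where a cell is observed
  n_mis_in_row <- rowSums(!r)     # number of missing cells in each row
  # For column j: within each row where Yj is observed, count the missing
  # cells Yk, sum over rows, and divide by the total number of missing cells.
  # Note: the result is NaN for every column if the data have no missing cells.
  colSums(r * n_mis_in_row) / sum(!r)
}

# Toy data (an assumption, for illustration only)
toy <- data.frame(
  a = c(1, 2, 3, 4),         # completely observed
  b = c(NA, 2, NA, 4),       # half missing
  c = c(NA, NA, NA, NA)      # completely missing
)
outflux_sketch(toy)
```

Here the completely observed column a has outflux 1 and the completely missing column c has outflux 0, matching the boundary cases described above. Column b has outflux 2/6, because it is observed in two rows that each contain one missing cell, out of six missing cells overall.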

FICO is an outbound statistic defined by the fraction of incomplete cases among cases with Yj observed (White and Carlin, 2010).
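A minimal base R sketch of the FICO definition is given below. The helper name fico_sketch and the toy data are hypothetical, and the code follows the sentence above and is not taken from the package source.

```r
# Hypothetical helper: FICO per column, following the verbal definition.
fico_sketch <- function(data) {
  r <- !is.na(data)                   # TRUE where a cell is observed
  incomplete <- rowSums(!r) > 0       # cases with at least one missing value
  # Among the cases where Yj is observed, the fraction that is incomplete.
  # A completely missing column yields NaN (0 / 0).
  colSums(r & incomplete) / colSums(r)
}

# Toy data (an assumption, for illustration only)
toy <- data.frame(
  x = c(1, 2, 3, 4, NA),
  y = c(1, NA, 3, NA, 5),
  z = c(1, 2, 3, 4, 5)
)
fico_sketch(toy)
```

In this toy data rows 2, 4 and 5 are incomplete. Column x is observed in rows 1 to 4, of which two are incomplete, so its FICO is 0.5. Column z is observed in all five rows, so its FICO is 3/5.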

References

van Buuren, S. (2012). Flexible Imputation of Missing Data. Boca Raton, FL: Chapman & Hall/CRC Press.

White, I.R., Carlin, J.B. (2010). Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Statistics in Medicine, 29, 2920-2931.
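
Examples

The following usage sketch assumes that flux(), fluxplot() and fico() are the functions exported by the mice package, and that the incomplete nhanes data set shipped with that package is available. Both are assumptions not stated on this page. The calls use only the arguments listed under Usage.

```r
# Assumes the mice package is installed; the block is skipped otherwise.
if (requireNamespace("mice", quietly = TRUE)) {
  data(nhanes, package = "mice")   # small incomplete example data set

  fx <- mice::flux(nhanes)         # one row per variable, six statistics
  print(fx)

  print(mice::fico(nhanes))        # FICO statistics only

  # Same statistics as flux(), returned invisibly; plot = FALSE skips the graph.
  fx2 <- mice::fluxplot(nhanes, plot = FALSE)
}
```

Variables that appear in the upper part of the influx-outflux plot have high outflux and are, by the reasoning in Details, the more promising candidates as predictors in the imputation model.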