Amelia (version 1.8.1)

overimpute: Overimputation diagnostic plot

Description

Treats each observed value as if it were missing and imputes it from the imputation model stored in amelia output.

Usage

overimpute(
  output,
  var,
  draws = 20,
  subset,
  legend = TRUE,
  xlab,
  ylab,
  main,
  frontend = FALSE,
  ...
)
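
A minimal sketch of a typical call, assuming the Amelia package is installed and using the africa example dataset that ships with it. The model settings (m = 5, the cs, ts, and logs choices, and p2s = 0 to silence progress output) are illustrative choices, not requirements of overimpute.

```r
library(Amelia)

# Example panel data shipped with Amelia (countries observed over years).
data(africa)

# Fit the imputation model; the result is the 'output' argument of overimpute.
a.out <- amelia(x = africa, m = 5, cs = "country", ts = "year",
                logs = "gdp_pc", p2s = 0)

# Overimpute every observed value of 'trade' and draw the diagnostic plot.
# The returned list is kept so it can be inspected afterwards.
res <- overimpute(a.out, var = "trade", draws = 20)
names(res)
```

The var argument could equally be given as a column number instead of a name.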

Value

A list that contains (1) the row in the original data (row), (2) the observed value of that observation (orig), (3) the mean of the overimputations (mean.overimputed), (4) the lower bound of the 95% confidence interval of the overimputations (lower.overimputed), (5) the upper bound of the 95% confidence interval of the overimputations (upper.overimputed), (6) the fraction of the variables that were missing for that observation in the original data (prcntmiss), and (7) a matrix of the raw overimputations, with observations in rows and the different draws in columns (overimps).

Arguments

output

output from the function amelia.

var

column number or variable name of the variable to overimpute.

draws

the number of draws per imputed dataset used to generate overimputations. The total number of simulations will be m * draws, where m is the number of imputations.

subset

an optional vector specifying a subset of observations to be used in the overimputation.

legend

a logical value indicating if a legend should be plotted.

xlab

the label for the x-axis. The default is "Observed Values".

ylab

the label for the y-axis. The default is "Imputed Values".

main

main title of the plot. The default is to smartly title the plot using the variable name.

frontend

a logical value used internally for the Amelia GUI.

...

further graphical parameters for the plot.

Details

This function temporarily treats each observed value in var as missing and imputes that value based on the imputation model of output. The dots are the mean imputations and the vertical lines are the 90% confidence intervals for the imputations of each observed value. The diagonal line is the \(y=x\) line. If all of the imputations were perfect, every point would fall on this line. A good imputation model would have about 90% of the confidence intervals containing the truth; that is, about 90% of the vertical lines should cross the diagonal.
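
The share of intervals that contain the observed value can also be checked numerically from the returned list rather than by eye. The following is a sketch under assumptions: coverage_rate is a hypothetical helper written for this illustration (it is not part of Amelia), and the model settings for the africa example data are illustrative.

```r
library(Amelia)

data(africa)
a.out <- amelia(x = africa, m = 5, cs = "country", ts = "year",
                logs = "gdp_pc", p2s = 0)
res <- overimpute(a.out, var = "trade")

# Hypothetical helper (not an Amelia function): fraction of observed values
# that lie between the lower and upper bounds returned by overimpute.
coverage_rate <- function(res) {
  inside <- res$orig >= res$lower.overimputed &
    res$orig <= res$upper.overimputed
  mean(inside, na.rm = TRUE)
}

coverage_rate(res)
```

Because the imputations are random draws, the value changes from run to run. The rate is measured against whatever interval bounds overimpute returns.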

The color of each vertical line shows the fraction of variables that are missing in the missingness pattern of that observation. The legend codes this information. The intervals will generally be much tighter when more covariates are observed for that observation, because more information is available to impute it.
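
The relationship between missingness and interval width can be examined from the returned list. This is a sketch only: it assumes prcntmiss, lower.overimputed, and upper.overimputed are vectors of equal length, as the Value section suggests, and interval_width_by_missing is a hypothetical helper, not an Amelia function.

```r
library(Amelia)

data(africa)
a.out <- amelia(x = africa, m = 5, cs = "country", ts = "year",
                logs = "gdp_pc", p2s = 0)
res <- overimpute(a.out, var = "trade")

# Hypothetical helper: mean interval width within each level of the
# fraction of missing variables (rounded to two decimals for grouping).
interval_width_by_missing <- function(res) {
  width <- res$upper.overimputed - res$lower.overimputed
  tapply(width, round(res$prcntmiss, 2), mean)
}

interval_width_by_missing(res)
```

In a dataset with little missingness, most observations share the same prcntmiss value, so the table may contain only one or two groups.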

The subset argument is evaluated in the environment of the data. That is, it can, but is not required to, refer to variables in the data frame as if it were attached.
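
A short sketch of the two ways of supplying subset described above, again using the africa example data with illustrative model settings. The cutoff year 1980 and the object name keep are arbitrary choices for this illustration, and the second call assumes a precomputed logical vector is accepted.

```r
library(Amelia)

data(africa)
a.out <- amelia(x = africa, m = 5, cs = "country", ts = "year",
                logs = "gdp_pc", p2s = 0)

# 'year' is a column of the data, so it can be named directly,
# as if the data frame were attached.
res_sub <- overimpute(a.out, var = "trade", subset = year >= 1980)

# The same selection built beforehand as an ordinary logical vector,
# without relying on evaluation inside the data.
keep <- africa$year >= 1980
res_keep <- overimpute(a.out, var = "trade", subset = keep)
```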

See Also

Other imputation diagnostics are compare.density, disperse, and tscsPlot.