VIM (version 3.0.2)

scattMiss: Scatterplot with information about missing/imputed values

Description

In addition to a standard scatterplot, lines are plotted for the missing values in one variable. If there are imputed values, they will be highlighted.

Usage

scattMiss(x, delimiter = NULL, side = 1, col = c("skyblue","red",
    "orange","lightgrey"), alpha = NULL, lty = c("dashed","dotted"),
    lwd = par("lwd"), quantiles = c(0.5, 0.975), inEllipse = FALSE,
    zeros = FALSE, xlim = NULL, ylim = NULL, main = NULL, sub = NULL,
    xlab = NULL, ylab = NULL, interactive = TRUE, ...)

Arguments

x
a matrix or data.frame with two columns.
delimiter
a character-vector to distinguish between variables and imputation-indices for imputed variables (therefore, x needs to have colnames). If given, it is used to determine the cor
side
if side=1, a rug representation and vertical lines are plotted for the missing/imputed values in the second variable; if side=2, a rug representation and horizontal lines for the missing/imputed values in the first
col
a vector of length four giving the colors to be used in the plot. The first color is used for the scatterplot, the second/third color for the rug representation for missing/imputed values. The second color is also used for the lines for mis
alpha
a numeric value between 0 and 1 giving the level of transparency of the colors, or NULL. This can be used to prevent overplotting.
lty
a vector of length two giving the line types for the lines and ellipses. If a single value is supplied, it will be used for both.
lwd
a vector of length two giving the line widths for the lines and ellipses. If a single value is supplied, it will be used for both.
quantiles
a vector giving the quantiles of the chi-square distribution to be used for the tolerance ellipses, or NULL to suppress plotting ellipses (see Details).
inEllipse
plot lines only inside the largest ellipse. Ignored if quantiles is NULL or if there are imputed values.
zeros
a logical vector of length two indicating whether the variables are semi-continuous, i.e., contain a considerable amount of zeros. If TRUE, only the non-zero observations are used for computing the tolerance ellipses. If a si
xlim, ylim
axis limits.
main, sub
main and sub title.
xlab, ylab
axis labels.
interactive
a logical indicating whether the side argument can be changed interactively (see Details).
...
further graphical parameters to be passed down (see par).
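
As a rough illustration of how several of these arguments combine, the following sketch calls scattMiss non-interactively on the tao data used in the Examples section. It assumes the VIM package is installed; the variable name xy is chosen here for illustration only.

```r
library(VIM)
data(tao, package = "VIM")

# Two-column subset, as required by the x argument
xy <- tao[, c("Air.Temp", "Humidity")]

# side = 2: rug and horizontal lines instead of vertical ones;
# alpha adds transparency, quantiles = NULL suppresses the ellipses,
# interactive = FALSE draws the plot once without waiting for clicks
scattMiss(xy, side = 2, alpha = 0.6, quantiles = NULL, interactive = FALSE)

# A single lty/lwd value is used for both the lines and the ellipses
scattMiss(xy, side = 1, lty = "dashed", lwd = 1.5, interactive = FALSE)
```

Setting interactive = FALSE is convenient in scripts and reports, where no mouse input is available.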

Details

Information about missing values in one variable is included as vertical or horizontal lines, as determined by the side argument. Each line is drawn at the observed x- or y-value of the observation concerned. Imputed values are additionally highlighted in the scatterplot. In addition, percentage coverage ellipses can be drawn to give an indication of the shape of the bivariate data distribution.

If interactive is TRUE, clicking in the bottom margin redraws the plot with information about missing/imputed values in the first variable, and clicking in the left margin redraws the plot with information about missing/imputed values in the second variable. Clicking anywhere else in the plot quits the interactive session.
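
To make the link between the quantiles argument and the coverage ellipses more concrete, here is a minimal base R sketch of a tolerance ellipse defined by chi-square quantiles. It is not the package's implementation: the simulated data, the variable names, and the use of the classical mean and covariance are all assumptions made for illustration, and the package may estimate location and scatter differently.

```r
# Simulated complete bivariate data (illustrative only)
set.seed(1)
xy <- cbind(x = rnorm(200), y = rnorm(200))

# Classical location and scatter estimates (an assumption of this sketch)
center <- colMeans(xy)
sigma <- cov(xy)

# Squared Mahalanobis distance of each observation from the center
d2 <- mahalanobis(xy, center, sigma)

# Chi-square cutoffs with 2 degrees of freedom for the default quantiles
quantiles <- c(0.5, 0.975)
cutoffs <- qchisq(quantiles, df = 2)

# Share of observations falling inside each ellipse
coverage <- sapply(cutoffs, function(cut) mean(d2 <= cut))

# Boundary of the largest ellipse: map the unit circle through the
# Cholesky factor of sigma, scale by sqrt(cutoff), and shift by the center
theta <- seq(0, 2 * pi, length.out = 200)
circle <- rbind(cos(theta), sin(theta))
L <- t(chol(sigma))  # lower-triangular factor, sigma = L %*% t(L)
ellipse <- t(center + sqrt(cutoffs[2]) * L %*% circle)

plot(xy, col = "skyblue")
lines(ellipse, lty = "dotted")
```

Every point on the computed boundary has a squared Mahalanobis distance equal to the chosen cutoff, which is what defines the ellipse in this sketch.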

References

M. Templ, A. Alfons, P. Filzmoser (2012) Exploring incomplete data using visualization tools. Journal of Advances in Data Analysis and Classification, Online first. DOI: 10.1007/s11634-011-0102-y.

See Also

marginplot

Examples

library(VIM)  # provides scattMiss() and kNN()
data(tao, package = "VIM")
## for missing values
scattMiss(tao[,c("Air.Temp", "Humidity")])

## for imputed values
scattMiss(kNN(tao[,c("Air.Temp", "Humidity")]), delimiter = "_imp")
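
The delimiter in the last call refers to the suffix of the indicator columns that kNN appends to its result. The short sketch below inspects those columns; it assumes kNN's default settings, and the variable names imp and imp_cols are illustrative.

```r
library(VIM)
data(tao, package = "VIM")

# Impute both variables; with default settings kNN adds one logical
# indicator column per variable, named with the "_imp" suffix
imp <- kNN(tao[, c("Air.Temp", "Humidity")])
names(imp)

# Number of imputed values per variable, read from the indicator columns
imp_cols <- grep("_imp$", names(imp), value = TRUE)
colSums(imp[, imp_cols])

# Pass the same suffix as delimiter so imputed values are highlighted
scattMiss(imp, delimiter = "_imp", interactive = FALSE)
```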