Given a square, symmetric matrix (such as a correlation matrix), this function tries to drop as few variables as possible in order to return a square, symmetric matrix with no missing cells.
corOK(x, maxiter = 100)
A list with two elements:
The complete matrix, with no missing values.
A vector of the indices of the columns and rows from the original matrix that are kept (i.e., that are nonmissing).
A square, symmetric matrix, or an object coercible to one (such as a data frame).
A number giving the maximum number of iterations, currently used as a sanity check. See details.
x is assumed to be square and symmetric because the function relies on the number of missing cells in a given column being identical to that in the corresponding row. corOK finds the column with the most missing values, drops it (and its corresponding row), and continues in the same manner until the matrix has no missing values. Although this was intended for correlation matrices, it could be used on other types of matrices.

Note that because corOK uses an iterative method, it can be slow when many columns/rows need to be removed. For the intended use (correlation matrices), there probably should not be many missing values. As a sanity check, and to prevent tediously long computations, the maximum number of iterations can be set.
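The procedure described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the source of corOK: the helper name `drop_most_missing`, its `max_iter` argument, and the element names `x` and `keep` are hypothetical, and the choice to return the partially reduced matrix when the iteration cap is reached is an assumption made for the sketch.

```r
# Hypothetical helper for illustration only; not the package's corOK().
# Repeatedly drops the column (and matching row) with the most NAs until
# no NAs remain or max_iter drops have been made.
drop_most_missing <- function(x, max_iter = 100) {
  x <- as.matrix(x)
  stopifnot(nrow(x) == ncol(x))      # the approach assumes a square matrix
  keep <- seq_len(ncol(x))           # original indices still retained
  iter <- 0
  while (anyNA(x) && iter < max_iter) {
    # which.max() returns the first column among those with the most NAs
    worst <- which.max(colSums(is.na(x)))
    x <- x[-worst, -worst, drop = FALSE]
    keep <- keep[-worst]
    iter <- iter + 1
  }
  # Assumption: if max_iter is reached, the result may still contain NAs
  list(x = x, keep = keep)
}

m <- cor(iris[, -5])
m[cbind(c(1, 2), c(2, 1))] <- NA     # make one symmetric pair of cells missing
res <- drop_most_missing(m)
res$keep                             # original indices that were retained
dim(res$x)                           # dimensions of the reduced matrix
```

Because each pass removes exactly one row/column pair, the loop runs at most once per variable, which is why a cap on iterations also acts as a cap on how many variables can be dropped.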
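# correlation matrix of the four numeric iris variables
cormat <- cor(iris[, -5])
# set the symmetric pair of cells (1, 2) and (2, 1) to missing
cormat[cbind(c(1, 2), c(2, 1))] <- NA
# print the matrix with missing cells
cormat
# return the complete (non-missing) matrix
corOK(cormat)
# limit the maximum number of iterations
corOK(cormat, maxiter = 0)
# clean up
rm(cormat)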