impute.svd: Missing value imputation via the SVDImpute algorithm

Description

Given a matrix with missing values, impute the missing entries using a low-rank SVD approximation estimated by the EM algorithm.

Usage

impute.svd(x, k = min(n, p), tol = max(n, p) * 1e-10, maxiter = 100)

Arguments

x

a numeric matrix whose missing (NA) entries are to be imputed; n and p in the Usage defaults denote its numbers of rows and columns.

k

the rank of the SVD approximation.

tol

the convergence tolerance for the EM algorithm.

maxiter

the maximum number of EM steps to take.

Value

A list with the following components:

x

the completed version of the matrix.

rss

the residual sum of squares (RSS) between the SVD approximation and the non-missing values of x.

iter

the number of EM iterations performed before the algorithm stopped.

Details

The missing values of x are imputed as follows. First, every NA entry is initialized to its column mean, or to 0 if all entries in that column are missing. Then, until convergence, the first k terms of the SVD of the completed matrix are computed, the previously missing values are replaced with their values in this rank-k approximation, and the RSS between the non-missing values and the approximation is computed.
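The function below is a minimal base-R sketch of this procedure, written for illustration rather than taken from the bcv source. Its name, svd_impute_sketch, is hypothetical; it also applies the convergence rule and the maxiter cap described in the next paragraph, and returns a list with the same component names as impute.svd.

```r
# Hypothetical illustration of the SVDImpute EM loop (not bcv's own code).
svd_impute_sketch <- function(x, k = min(dim(x)),
                              tol = max(dim(x)) * 1e-10, maxiter = 100) {
  miss <- is.na(x)

  # Initialize each NA to its column mean; an all-NA column gives NaN, so use 0
  col_means <- colMeans(x, na.rm = TRUE)
  col_means[is.nan(col_means)] <- 0
  xc <- x
  xc[miss] <- col_means[col(x)[miss]]

  rss0 <- Inf
  for (iter in seq_len(maxiter)) {
    # Rank-k SVD approximation of the current completed matrix: U_k D_k V_k'
    s <- svd(xc, nu = k, nv = k)
    approx <- s$u %*% (s$d[seq_len(k)] * t(s$v))

    # Overwrite only the entries that were originally missing
    xc[miss] <- approx[miss]

    # RSS between the approximation and the observed entries of x
    rss1 <- sum((x[!miss] - approx[!miss])^2)

    # Relative-change convergence test (see the next paragraph)
    if (abs(rss0 - rss1) / (.Machine$double.eps + rss1) < tol) {
      return(list(x = xc, rss = rss1, iter = iter))
    }
    rss0 <- rss1
  }
  warning("SVD imputation did not converge in ", maxiter, " iterations")
  list(x = xc, rss = rss1, iter = maxiter)
}
```

Observed entries are never overwritten, so only the NA positions change between iterations; the product s$d[seq_len(k)] * t(s$v) scales row i of V_k' by the i-th singular value, which is equivalent to diag(d) %*% t(V_k) without forming the diagonal matrix.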

Convergence is declared when abs(rss0 - rss1) / (.Machine$double.eps + rss1) < tol, where rss0 and rss1 are the RSS values from successive iterations. If convergence has not been reached after maxiter iterations, the algorithm stops and issues a warning.
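To make the stopping rule concrete, the snippet below evaluates it for a 20 x 10 matrix using the default tol = max(n, p) * 1e-10. The helper em_converged is hypothetical and the RSS values are made up for illustration. The .Machine$double.eps term in the denominator keeps the ratio finite when rss1 is exactly 0, for example when the observed entries are fit perfectly.

```r
# Hypothetical helper: the relative-change test from the Details section
em_converged <- function(rss0, rss1, tol) {
  abs(rss0 - rss1) / (.Machine$double.eps + rss1) < tol
}

# Default tolerance for n = 20 rows and p = 10 columns
n <- 20
p <- 10
tol <- max(n, p) * 1e-10   # 2e-09

# Relative change of about 6.7e-04: keep iterating
em_converged(rss0 = 150.0, rss1 = 149.9, tol = tol)

# Relative change of about 6.7e-11: declare convergence
em_converged(rss0 = 149.9, rss1 = 149.9 + 1e-8, tol = tol)

# Perfect fit in both iterations: 0 / eps = 0, so no 0/0 and the test passes
em_converged(rss0 = 0, rss1 = 0, tol = tol)
```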

References

Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D. and Altman, R.B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520–525.

Examples

  library( bcv )
  set.seed( 1 )  # make the random example reproducible

  # Generate a rank-1 matrix plus noise, then delete n entries at random
  n <- 20
  p <- 10
  u <- rnorm( n )
  v <- rnorm( p )
  xfull <- u %*% rbind( v ) + rnorm( n*p )
  miss  <- sample( seq_len( n*p ), n )
  x       <- xfull
  x[miss] <- NA

  # Impute the missing entries with a rank-1 SVD approximation
  xhat <- impute.svd( x, 1 )$x

  # Compute the prediction error on the missing entries only
  sum( ( xfull[miss] - xhat[miss] )^2 )