
bcv (version 1.0.1)

impute.svd: Missing value imputation via the SVDImpute algorithm

Description

Given a matrix with missing values, impute the missing entries using a low-rank SVD approximation estimated by the EM algorithm.

Usage

impute.svd(x, k = min(n, p), tol = max(n, p) * 1e-10, maxiter = 100)

Arguments

x

the matrix whose missing (NA) entries are to be imputed.

k

the rank of the SVD approximation. In the default, n and p denote the number of rows and columns of x.

tol

the convergence tolerance for the EM algorithm, applied to the relative change in RSS between successive iterations (see Details).

maxiter

the maximum number of EM steps to take.

Value

x

the completed version of the matrix.

rss

the residual sum of squares (RSS) between the SVD approximation and the non-missing values in x.

iter

the number of EM iterations performed before the algorithm stopped.

Details

The missing values of x are imputed as follows. First, initialize every NA entry to the mean of the observed values in its column, or to 0 if the entire column is missing. Then repeat until convergence: compute the first k terms of the SVD of the completed matrix, replace the previously missing values with the corresponding entries of this rank-k approximation, and compute the RSS between the non-missing values and the approximation.

Declare convergence if abs(rss0 - rss1) / (.Machine$double.eps + rss1) < tol, where rss0 and rss1 are the RSS values computed from successive iterations. If convergence has not been reached after maxiter iterations, stop and issue a warning.
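The procedure described above can be sketched in plain R. This is only an illustration under stated assumptions, not the bcv source: the function name impute_svd_sketch is hypothetical, the defaults simply mirror the Usage line, the warning text is made up, and k is assumed to be at least 1.

```r
# Hypothetical helper illustrating the steps in Details; not the bcv code.
impute_svd_sketch <- function(x, k = min(dim(x)),
                              tol = max(dim(x)) * 1e-10, maxiter = 100) {
  x <- as.matrix(x)
  miss <- is.na(x)

  # Initialize NA entries to column means (0 if the whole column is missing)
  col_means <- colMeans(x, na.rm = TRUE)
  col_means[is.nan(col_means)] <- 0
  xhat <- x
  xhat[miss] <- col_means[col(x)[miss]]

  rss0 <- Inf
  rss1 <- NA_real_
  converged <- FALSE
  iter <- 0
  while (iter < maxiter && !converged) {
    iter <- iter + 1

    # Rank-k approximation from the first k SVD terms of the completed matrix
    s <- svd(xhat, nu = k, nv = k)
    approx <- s$u %*% (s$d[seq_len(k)] * t(s$v))

    # Replace only the originally missing entries
    xhat[miss] <- approx[miss]

    # RSS between the observed entries and the approximation
    rss1 <- sum((x[!miss] - approx[!miss])^2)

    # Relative-change criterion quoted in Details
    converged <- abs(rss0 - rss1) / (.Machine$double.eps + rss1) < tol
    rss0 <- rss1
  }
  if (!converged) warning("no convergence after ", maxiter, " iterations")

  list(x = xhat, rss = rss1, iter = iter)
}

# Usage on a small matrix with two missing entries
m <- matrix(c(1, 2, NA, 4, 5, 6, 7, NA, 9, 10, 11, 12), nrow = 4)
fit <- impute_svd_sketch(m, k = 1)
fit$x     # completed matrix; observed entries are left unchanged
fit$iter  # number of iterations taken
```

Starting rss0 at Inf makes the first comparison fail, so at least two iterations run before convergence can be declared; whether impute.svd behaves the same way is not stated in this page.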

References

Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D. and Altman, R.B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520-525.

See Also

cv.svd.wold

Examples

library(bcv)

# Generate a noisy rank-1 matrix and set n of its entries to NA
set.seed(1)  # arbitrary seed, only for reproducibility
n <- 20
p <- 10
u <- rnorm(n)
v <- rnorm(p)
xfull <- u %*% rbind(v) + rnorm(n * p)
miss <- sample(seq_len(n * p), n)
x <- xfull
x[miss] <- NA

# Impute the missing entries with a rank-1 SVD approximation
xhat <- impute.svd(x, 1)$x

# Compute the prediction error over the entries that were missing
sum((xfull[miss] - xhat[miss])^2)
