Learn R Programming

gmodels (version 2.19.1)

fast.prcomp: Efficient computation of principal components and singular value decompositions.

Description

The standard stats::prcomp() and svd() function are very inefficient for wide matrixes. fast.prcomp and fast.svd are modified versions which are efficient even for matrixes that are very wide.

Usage

fast.prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE, tol = NULL)

Value

See the documetation for stats::prcomp() or svd() .

Arguments

x

data matrix

retx

a logical value indicating whether the rotated variables should be returned.

center

a logical value indicating whether the variables should be shifted to be zero centered. Alternately, a vector of length equal the number of columns of x can be supplied. The value is passed to scale.

scale.

a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE for consistency with S, but in general scaling is advisable. Alternatively, a vector of length equal the number of columns of x can be supplied. The value is passed to scale.

tol

a value indicating the magnitude below which components should be omitted. (Components are omitted if their standard deviations are less than or equal to tol times the standard deviation of the first component.) With the default null setting, no components are omitted (unless rank. is specified less than min(dim(x)).). Other settings for tol could be tol = 0 or tol = sqrt(.Machine$double.eps), which would omit essentially constant components.

Author

Modifications by Gregory R. Warnes greg@warnes.net

Details

The current implementation of the function svd() in S-Plus and R is much slower when operating on a matrix with a large number of columns than on the transpose of this matrix, which has a large number of rows. As a consequence, stats::prcomp(), which uses svd(), is also very slow when applied to matrixes with a large number of rows.

The simple solution is to use La.svd() instead of svd(). A suitable patch to stats::prcomp() has been submitted. In the mean time, the function fast.prcomp has been provided as a short-term work-around.

list("fast.prcomp")

is a modified versiom of stats::prcomp() that calls La.svd() instead of svd()

list("fast.svd")

is simply a wrapper around La.svd().

See Also

Examples

Run this code


  # create test matrix
  set.seed(4943546)
  nr <- 50
  nc <- 2000
  x  <- matrix( rnorm( nr*nc), nrow=nr, ncol=nc )
  tx <- t(x)

  # SVD directly on matrix is SLOW:
  system.time( val.x <- svd(x)$u )

  # SVD on t(matrix) is FAST:
  system.time( val.tx <- svd(tx)$v )

  # and the results are equivalent:
  max( abs(val.x) - abs(val.tx) )

  # Time gap dissapears using fast.svd:
  system.time( val.x <- fast.svd(x)$u )
  system.time( val.tx <- fast.svd(tx)$v )
  max( abs(val.x) - abs(val.tx) )


  library(stats)

  # prcomp directly on matrix is SLOW:
  system.time( pr.x <- prcomp(x) )

  # prcomp.fast is much faster
  system.time( fast.pr.x <- fast.prcomp(x) )

  # and the results are equivalent
  max( pr.x$sdev - fast.pr.x$sdev )
  max( abs(pr.x$rotation[,1:49]) - abs(fast.pr.x$rotation[,1:49]) )
  max( abs(pr.x$x) - abs(fast.pr.x$x)  )

  # (except for the last and least significant component):
  max( abs(pr.x$rotation[,50]) - abs(fast.pr.x$rotation[,50]) )

Run the code above in your browser using DataLab