bigstatsr (version 0.6.2)

big_randomSVD: Randomized partial SVD

Description

An algorithm for partial SVD (or PCA) of a Filebacked Big Matrix, based on the algorithm in RSpectra (by Yixuan Qiu and Jiali Mei). Its running time is linear in all dimensions and it is very memory-efficient, so it can be used on very large big.matrices.

Usage

big_randomSVD(X, fun.scaling = big_scale(center = FALSE, scale = FALSE),
  ind.row = rows_along(X), ind.col = cols_along(X), k = 10,
  tol = 1e-04, verbose = FALSE, ncores = 1)

Arguments

X

An FBM.

fun.scaling

A function that returns a named list of mean and sd for every column, used to scale each element as follows: $$\frac{X_{i,j} - mean_j}{sd_j}.$$ The default doesn't use any scaling.
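
The scaling formula above can be illustrated on an ordinary in-memory matrix. This is only a minimal base-R sketch, not the package's implementation: `col_stats` is a hypothetical helper introduced here for illustration, and the matrix is a plain R matrix rather than an FBM.

```r
# Illustrative sketch on an ordinary in-memory matrix (not an FBM).
set.seed(1)
X <- matrix(rnorm(20, mean = 5, sd = 2), nrow = 5, ncol = 4)

# Hypothetical helper: returns a named list with one mean and one sd per column.
col_stats <- function(mat) {
  list(mean = colMeans(mat), sd = apply(mat, 2, sd))
}

cs <- col_stats(X)

# Apply (X[i, j] - mean_j) / sd_j to every element.
X_scaled <- sweep(sweep(X, 2, cs$mean, "-"), 2, cs$sd, "/")

# Base R's scale() performs the same centering and scaling.
ref <- scale(X, center = TRUE, scale = TRUE)
stopifnot(isTRUE(all.equal(X_scaled, ref, check.attributes = FALSE)))
```

After this transformation every column has mean 0 and standard deviation 1, which is the scaling that `prcomp(..., center = TRUE, scale. = TRUE)` applies as well.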

ind.row

An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices.

ind.col

An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices.
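
Because negative indices are not accepted for ind.row and ind.col, an exclusion has to be rewritten as a vector of positive indices. A minimal base-R sketch of that conversion, using a small ordinary matrix and made-up values for `n` and `excluded`:

```r
# Made-up sizes for illustration only.
n <- 10
excluded <- c(2, 5, 7)

# Positive indices of every row that is NOT excluded.
ind_keep <- setdiff(seq_len(n), excluded)

# On an ordinary matrix, both forms select the same rows.
m <- matrix(seq_len(n * 2), nrow = n)
stopifnot(identical(m[ind_keep, ], m[-excluded, ]))
```

A vector such as `ind_keep` is the kind of value that would be passed as ind.row or ind.col.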

k

Number of singular vectors/values to compute. Default is 10. This algorithm should be used to compute only a few singular vectors/values.

tol

Precision parameter of svds. Default is 1e-4.

verbose

Should some progress be printed? Default is FALSE.

ncores

Number of cores used. Default doesn't use parallelism. You may use nb_cores.

Value

A named list (an S3 class "big_SVD") of

  • d, the singular values,

  • u, the left singular vectors,

  • v, the right singular vectors,

  • niter, the number of iterations of the algorithm,

  • nops, the number of matrix-vector multiplications used,

  • center, the centering vector,

  • scale, the scaling vector.

Note that to obtain the Principal Components, you must use predict on the result. See examples.
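
The relationship between an SVD and principal component scores can be sketched with base R alone. The sketch below uses `svd()` and `prcomp()` on a small ordinary matrix, not big_randomSVD on an FBM, and it assumes the standard PCA identity that the scores equal U scaled by the singular values (equivalently, the scaled matrix multiplied by V).

```r
set.seed(1)
X <- matrix(rnorm(60), nrow = 15, ncol = 4)
X_scaled <- scale(X, center = TRUE, scale = TRUE)

k <- 2
s <- svd(X_scaled, nu = k, nv = k)  # u and v have k columns; d has full length

# Two equivalent ways to obtain the first k principal component scores.
scores_ud <- s$u %*% diag(s$d[1:k], nrow = k)  # U %*% diag(d)
scores_xv <- X_scaled %*% s$v                  # scaled X %*% V

# Compare with prcomp(); columns may differ by sign, hence abs().
pca <- prcomp(X, center = TRUE, scale. = TRUE)
stopifnot(
  isTRUE(all.equal(scores_ud, scores_xv)),
  isTRUE(all.equal(abs(scores_ud), abs(unname(pca$x[, 1:k]))))
)
```

The sign ambiguity is why the examples compare scores and loadings as "the same or opposite".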

See Also

svds

Examples

# NOT RUN {
set.seed(1)

X <- big_attachExtdata()
K <- 10

# Using only half of the data for "training"
n <- nrow(X)
ind <- sort(sample(n, n/2))
test <- big_randomSVD(X, fun.scaling = big_scale(), ind.row = ind, k = K)
str(test)

pca <- prcomp(X[ind, ], center = TRUE, scale. = TRUE)

# same scaling
all.equal(test$center, pca$center)
all.equal(test$scale,  pca$scale)

# use this function to predict scores
class(test)
scores <- predict(test)
# scores and loadings are the same or opposite
plot(scores, pca$x[, 1:K])
plot(test$v, pca$rotation[, 1:K])
plot(test$u)
plot(test, type = "scores")

# projecting on new data
ind2 <- setdiff(rows_along(X), ind)
scores.test2 <- predict(test, X, ind.row = ind2)
scores.test3 <- predict(pca, X[-ind, ])
plot(scores.test2, scores.test3[, 1:K])

# }