rsvd (version 1.0.5)

rpca: Randomized principal component analysis (rpca).

Description

Fast computation of principal component analysis (PCA) using the randomized singular value decomposition.

Usage

rpca(
  A,
  k = NULL,
  center = TRUE,
  scale = TRUE,
  retx = TRUE,
  p = 10,
  q = 2,
  rand = TRUE
)

Arguments

A

array_like; a numeric \((m, n)\) input matrix (or data frame) to be analyzed. If the data contain \(NA\)s, na.omit is applied.

k

integer; number of dominant principal components to be computed. It is required that \(k\) is less than or equal to \(\min(m,n)\), but it is recommended that \(k \ll \min(m,n)\).

center

bool, optional; logical value which indicates whether the variables should be shifted to be zero centered (\(TRUE\) by default).

scale

bool, optional; logical value which indicates whether the variables should be scaled to have unit variance (\(TRUE\) by default).

retx

bool, optional; logical value indicating whether the rotated variables / scores should be returned (\(TRUE\) by default).

p

integer, optional; oversampling parameter for \(rsvd\) (default \(p=10\)), see rsvd.

q

integer, optional; number of additional power iterations for \(rsvd\) (default \(q=2\), as in the usage above), see rsvd.

rand

bool, optional; if \(TRUE\), the \(rsvd\) routine is used; otherwise \(svd\) is used.

Value

rpca returns a list with class \(rpca\) containing the following components:

rotation

array_like; the rotation (eigenvectors); \((n, k)\) dimensional array.

eigvals

array_like; eigenvalues; \(k\) dimensional vector.

sdev

array_like; standard deviations of the principal components; \(k\) dimensional vector.

x

array_like; the scores / rotated data; \((m, k)\) dimensional array.

center, scale

array_like; the centering and scaling used.
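
The relationship between these components can be checked directly. The sketch below assumes the usual PCA conventions (sdev is the square root of eigvals, and x is the centered and scaled data multiplied by rotation); treat the comparisons as a sanity check under those assumptions rather than a documented guarantee.

```r
library(rsvd)

# Fit two components on the four numeric iris columns.
X <- as.matrix(iris[, 1:4])
fit <- rpca(X, k = 2)

# Shapes of the returned components.
dim(fit$rotation)    # (n, k)
length(fit$eigvals)  # k
dim(fit$x)           # (m, k)

# Assumed convention: standard deviations are square roots of the eigenvalues.
sdev_ok <- all.equal(fit$sdev, sqrt(fit$eigvals))

# Assumed convention: scores are the standardized data times the rotation.
Z <- scale(X, center = fit$center, scale = fit$scale)
score_gap <- max(abs(Z %*% fit$rotation - fit$x))
```

For this small, full-rank data set the gap should be at the level of floating-point noise; for large matrices with k much smaller than min(m,n), the randomized approximation may leave a larger difference.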

Details

Principal component analysis is an important linear dimension reduction technique.

Randomized PCA is computed via the randomized SVD algorithm (rsvd). The computational gain is substantial if the desired number of principal components is relatively small, i.e. \(k \ll \min(m,n)\).
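
To make the roles of the oversampling parameter p and the power iterations q concrete, here is a minimal base-R sketch of the generic randomized SVD scheme described by Halko et al. [2]. The helper name rsvd_sketch and the test matrix are hypothetical, and this is not the implementation used inside the rsvd package.

```r
# Minimal randomized SVD sketch (hypothetical helper, not the rsvd source).
rsvd_sketch <- function(A, k, p = 10, q = 2) {
  m <- nrow(A)
  n <- ncol(A)
  l <- min(k + p, m, n)                 # sketch size: target rank plus oversampling
  Omega <- matrix(rnorm(n * l), n, l)   # Gaussian test matrix
  Y <- A %*% Omega                      # sample the range of A
  for (i in seq_len(q)) {               # power iterations with re-orthonormalization
    Y <- qr.Q(qr(Y))
    Z <- qr.Q(qr(crossprod(A, Y)))      # crossprod(A, Y) is t(A) %*% Y
    Y <- A %*% Z
  }
  Q <- qr.Q(qr(Y))                      # orthonormal basis for the sampled range
  B <- crossprod(Q, A)                  # small (l, n) matrix
  s <- svd(B, nu = k, nv = k)           # deterministic SVD of the small matrix
  list(d = s$d[seq_len(k)], u = Q %*% s$u, v = s$v)
}

set.seed(1)
# Test matrix: rank 5 plus a small amount of noise.
A <- matrix(rnorm(200 * 5), 200, 5) %*% matrix(rnorm(5 * 50), 5, 50) +
  1e-6 * matrix(rnorm(200 * 50), 200, 50)

approx <- rsvd_sketch(A, k = 5)
exact <- svd(A, nu = 0, nv = 0)$d[1:5]

# Relative error of the five leading singular values.
rel_err <- max(abs(approx$d - exact) / exact)
```

The expensive decomposition is applied only to the small matrix B, which is why the gain grows as k shrinks relative to min(m,n).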

The print and summary methods can be used to present the results in a readable format. A scree plot can be produced with ggscreeplot. The individuals factor map can be produced with ggindplot, and a correlation plot with ggcorplot.

The predict function can be used to compute the scores of new observations. The data will automatically be centered (and scaled if requested). This is not fully supported for complex input matrices.
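
As a brief illustration of scoring new observations, the sketch below fits on part of the iris data and projects the held-out rows. The split into train and new_obs is arbitrary, and the manual projection is an assumed equivalent based on the stored center, scale, and rotation components, not a statement about how predict is implemented.

```r
library(rsvd)

X <- as.matrix(iris[, 1:4])
train <- X[1:120, ]      # rows used to fit the components
new_obs <- X[121:150, ]  # rows treated as new observations

fit <- rpca(train, k = 2)

# Scores of the new rows in the fitted component space.
scores <- predict(fit, new_obs)
dim(scores)  # one row per new observation, one column per component

# Assumed equivalent: apply the stored centering and scaling, then rotate.
manual <- scale(new_obs, center = fit$center, scale = fit$scale) %*% fit$rotation
pred_gap <- max(abs(scores - manual))
```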

References

  • [1] N. B. Erichson, S. Voronin, S. L. Brunton and J. N. Kutz. 2019. Randomized Matrix Decompositions Using R. Journal of Statistical Software, 89(11), 1-48. doi:10.18637/jss.v089.i11.

  • [2] N. Halko, P. Martinsson and J. Tropp. 2009. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. arXiv:0909.4061, https://arxiv.org/abs/0909.4061.

See Also

ggscreeplot, ggindplot, ggcorplot, plot.rpca, predict, rsvd

Examples

# NOT RUN {
library('rsvd')
#
# Load Edgar Anderson's Iris Data
#
data('iris')

#
# log transform
#
log.iris <- log( iris[ , 1:4] )
iris.species <- iris[ , 5]

#
# Perform rPCA and compute only the first two PCs
#
iris.rpca <- rpca(log.iris, k=2)
summary(iris.rpca) # Summary
print(iris.rpca) # Prints the rotations

#
# Use rPCA to compute all PCs, similar to prcomp
#
iris.rpca <- rpca(log.iris)
summary(iris.rpca) # Summary
print(iris.rpca) # Prints the rotations
plot(iris.rpca) # Produce screeplot, variable and individuals factor maps.

# }