Learn R Programming

Rdimtools (version 1.0.6)

do.rpca: Robust Principal Component Analysis

Description

Robust PCA (RPCA) is not like other methods in this package as finding explicit low-dimensional embedding with reduced number of columns. Rather, it is more of a decomposition method of data matrix \(X\), possibly noisy, into low-rank and sparse matrices by solving the following, $$\textrm{minimize}\quad \|L\|_* + \lambda \|S\|_1 \quad{s.t.} L+S=X$$ where \(L\) is a low-rank matrix, \(S\) is a sparse matrix and \(\|\cdot\|_*\) denotes nuclear norm, i.e., sum of singular values. Therefore, it should be considered as preprocessing procedure of denoising. Note that after RPCA is applied, \(L\) should be used as kind of a new data matrix for any manifold learning scheme to be applied.

Usage

do.rpca(
  X,
  mu = 1,
  lambda = sqrt(1/(max(dim(X)))),
  preprocess = c("null", "center", "scale", "cscale", "decorrelate", "whiten")
)

Arguments

X

an \((n\times p)\) matrix or whose rows are observations and columns represent independent variables.

mu

an augmented Lagrangian parameter

lambda

parameter for the sparsity term \(\|S\|_1\). Default value is given accordingly to the referred paper.

preprocess

an additional option for preprocessing the data. Default is "null". See also aux.preprocess for more details.

Value

a named list containing

L

an \((n\times p)\) low-rank matrix.

S

an \((n\times p)\) sparse matrix.

trfinfo

a list containing information for out-of-sample prediction.

References

candes_robust_2011Rdimtools

Examples

Run this code
# NOT RUN {
## Load Iris data and put some noise
data(iris)
set.seed(100)
subid = sample(1:150,50)
noise = 0.2
X = as.matrix(iris[subid,1:4])
X = X + matrix(noise*rnorm(length(X)), nrow=nrow(X))
lab = as.factor(iris[subid,5])

## try different regularization parameters
rpca1 = do.rpca(X, lambda=0.1)
rpca2 = do.rpca(X, lambda=1)
rpca3 = do.rpca(X, lambda=10)

## apply identical PCA methods
Y1 = do.pca(rpca1$L, ndim=2)$Y
Y2 = do.pca(rpca2$L, ndim=2)$Y
Y3 = do.pca(rpca3$L, ndim=2)$Y

## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,3))
plot(Y1, pch=19, col=lab, main="RPCA+PCA::lambda=0.1")
plot(Y2, pch=19, col=lab, main="RPCA+PCA::lambda=1")
plot(Y3, pch=19, col=lab, main="RPCA+PCA::lambda=10")
par(opar)

# }

Run the code above in your browser using DataLab