spca: Sparse Principal Components Analysis

Description

Performs a sparse principal components analysis, using singular value decomposition, to carry out variable selection.

Usage

spca(X, ncomp = 3, center = TRUE, scale. = TRUE, 
     keepX = rep(ncol(X),ncomp), iter.max = 500,
     tol = 1e-06)

Arguments

X

a numeric matrix (or data frame) which provides the data for the sparse principal components analysis.

ncomp

integer, the number of components to keep.

center

a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of X can be supplied. The value is passed to

scale.

a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is TRUE. See details.

iter.max

integer, the maximum number of iterations allowed to reach convergence for each component.

tol

a positive real, the tolerance used in the iterative algorithm.

keepX

numeric vector of length ncomp, the number of variables to keep in loading vectors. By default all variables are kept in the model. See details.
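
The preprocessing and keepX arguments above can be illustrated with base R alone. This is a minimal sketch that assumes the centering and scaling behave like base R's scale(); the simulated matrix and the keepX_sparse values are hypothetical and chosen only for illustration.

```r
# Simulated data: 12 samples, 5 variables (hypothetical, for illustration only)
set.seed(42)
X <- matrix(rnorm(60, mean = 5, sd = 2), nrow = 12, ncol = 5)

# center = TRUE and scale. = TRUE are assumed here to act like base R's scale():
# each column is shifted to mean zero and rescaled to unit variance
X_std <- scale(X, center = TRUE, scale = TRUE)
col_means <- colMeans(X_std)      # numerically zero for every column
col_sds <- apply(X_std, 2, sd)    # one for every column

# Default keepX: every variable is kept in each of the ncomp loading vectors
ncomp <- 3
keepX_default <- rep(ncol(X), ncomp)

# A hypothetical sparse choice: keep 3, 2 and 2 variables on components 1 to 3
keepX_sparse <- c(3, 2, 2)

# Number of zeros expected in each loading vector
n_zeros <- ncol(X) - keepX_sparse
```

With the default keepX no loading is forced to zero, so sparsity only appears once keepX is set below ncol(X).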

Value

spca returns a list with class "spca" containing the following components:

ncomp

the number of components to keep in the calculation.

varX

the adjusted cumulative percentage of variances explained.

keepX

the number of variables kept in each loading vector.

iter

the number of iterations needed to reach convergence for each component.

rotation

the matrix containing the sparse loading vectors.

x

the matrix containing the principal components.

Details

The calculation combines a singular value decomposition of the (centered and scaled) data matrix with a LASSO penalty to induce sparsity in the loading vectors. Setting scale. = TRUE is highly recommended, as it helps to obtain orthogonal sparse loading vectors. keepX gives the number of variables to keep in each loading vector. The difference between the number of columns of X and keepX is the degree of sparsity, that is, the number of zeros in each loading vector. Note that spca cannot be applied to a data matrix with missing values. No biplot function is available for spca objects.
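
The idea of combining an SVD with a LASSO-type penalty can be sketched in base R for a single component. The code below is written in the spirit of Shen and Huang (2008) and is not the package's implementation: the function names soft_threshold and sparse_component are hypothetical, the input is assumed to be already centered and scaled, keep is assumed to be at least 1, and the threshold rule assumes no ties among the absolute loadings.

```r
# Soft-thresholding (LASSO-type) operator: shrink every entry toward zero
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# One sparse loading vector for a centered and scaled matrix X.
# 'keep' plays the role of one entry of keepX.
sparse_component <- function(X, keep, iter.max = 500, tol = 1e-06) {
  p <- ncol(X)
  sv <- svd(X, nu = 1, nv = 1)        # start from the leading singular vectors
  u <- sv$u[, 1]
  v <- sv$d[1] * sv$v[, 1]
  for (iter in seq_len(iter.max)) {
    z <- drop(crossprod(X, u))        # t(X) %*% u, unpenalized loadings
    # Threshold at the (p - keep)-th smallest |z| so that 'keep' entries survive
    lambda <- if (keep < p) sort(abs(z))[p - keep] else 0
    v_new <- soft_threshold(z, lambda)
    u_new <- drop(X %*% v_new)
    u_new <- u_new / sqrt(sum(u_new^2))
    converged <- sqrt(sum((v_new - v)^2)) < tol
    u <- u_new
    v <- v_new
    if (converged) break
  }
  # Return the loading vector normalized to unit length
  list(loading = v / sqrt(sum(v^2)), iter = iter)
}

# Usage on simulated data (hypothetical, for illustration only)
set.seed(1)
X <- scale(matrix(rnorm(200), nrow = 20, ncol = 10))
fit <- sparse_component(X, keep = 3)
n_nonzero <- sum(fit$loading != 0)    # count of non-zero loadings
```

Further components would require deflating X by the fitted rank-one term before repeating the procedure; that step is omitted here to keep the sketch short.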

References

Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. Journal of Multivariate Analysis 99, 1015-1034.

Examples

library(mixOmics)  # provides spca() and the liver.toxicity data set
data(liver.toxicity)
spca.rat <- spca(liver.toxicity$gene, ncomp = 3, keepX = rep(50, 3))
spca.rat

## variable representation
plotVar(spca.rat, X.label = TRUE, cex = 0.5)
plot3dVar(spca.rat)

## samples representation
plotIndiv(spca.rat, ind.names = liver.toxicity$treatment[, 3], cex = 0.5, 
          col = as.numeric(liver.toxicity$treatment[, 3]))
plot3dIndiv(spca.rat, cex = 0.01, 
            col = as.numeric(liver.toxicity$treatment[, 3]))
