pca: Principal Components Analysis

Description

Performs a principal components analysis on a data matrix that may contain missing values. If the data are complete, 'pca' uses singular value decomposition; if some values are missing, it uses the NIPALS algorithm.

Usage

pca(X, ncomp = 3, center = TRUE, scale. = FALSE, 
    comp.tol = NULL, max.iter = 500, tol = 1e-09)

Arguments

X

a numeric matrix (or data frame) which provides the data for the principal components analysis. It can contain missing values.

ncomp

integer. If the data are complete, ncomp sets the number of components and associated eigenvalues to display from the pcasvd algorithm; if the data have missing values, ncomp gives the number of components to

center

a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of X can be supplied. The value is passed to s

scale.

a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE for consistency with the prcomp function, but in general scaling is advisable.

comp.tol

a value indicating the magnitude below which components should be omitted.

max.iter

integer, the maximum number of iterations in the NIPALS algorithm.

tol

a positive real, the tolerance used in the NIPALS algorithm.
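To make the roles of max.iter and tol concrete, here is a minimal base R sketch of a NIPALS-style iteration that skips missing cells. It is an illustration under stated assumptions, not the code used by pca: the helper name nipals_sketch, the starting vector, and the convergence rule (squared change in the scores below tol) are all hypothetical choices.

```r
# Hypothetical helper: a NIPALS-style sketch, NOT the mixOmics implementation.
nipals_sketch <- function(X, ncomp = 2, max.iter = 500, tol = 1e-09) {
  # scale() computes column means with NAs removed
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)
  obs <- 1 * !is.na(X)            # 1 where a value is observed, 0 where missing
  R <- X
  R[obs == 0] <- 0                # missing cells contribute nothing to the sums
  scores <- matrix(0, nrow(X), ncomp)
  loadings <- matrix(0, ncol(X), ncomp)
  for (h in seq_len(ncomp)) {
    # assumption: start from the column with the largest sum of squares
    t_h <- R[, which.max(colSums(R^2))]
    for (iter in seq_len(max.iter)) {
      # regress each column on the scores, using observed cells only
      p_h <- crossprod(R, t_h) / crossprod(obs, t_h^2)
      p_h <- p_h / sqrt(sum(p_h^2))
      # regress each row on the loadings, using observed cells only
      t_new <- (R %*% p_h) / (obs %*% p_h^2)
      converged <- sum((t_new - t_h)^2) < tol   # assumed stopping rule
      t_h <- t_new
      if (converged) break
    }
    scores[, h] <- t_h
    loadings[, h] <- p_h
    R <- R - tcrossprod(t_h, p_h)  # deflate before the next component
    R[obs == 0] <- 0
  }
  list(scores = scores, loadings = loadings)
}

# Usage on a small synthetic matrix with one missing cell
set.seed(1)
X_demo <- matrix(rnorm(200), 50, 4) %*% diag(c(5, 2, 1, 0.5))
X_miss <- X_demo
X_miss[2, 3] <- NA
fit_miss <- nipals_sketch(X_miss, ncomp = 2)
```

On complete data the loadings from such an iteration should agree with those of prcomp up to sign. A smaller tol or a larger max.iter buys a closer match at the cost of more iterations.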

Value

pca returns a list with class "pca" and "prcomp" containing the following components:

ncomp

the number of principal components used.

sdev

the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix or by using NIPALS.

rotation

the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors).

X

if retx is true, the value of the rotated data (the centred (and scaled if requested) data multiplied by the rotation matrix) is returned.

center, scale

the centering and scaling used, or FALSE.
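The relationship between the rotated data, the rotation matrix and the stored centering and scaling can be checked by hand. The sketch below uses stats::prcomp on the built-in USArrests data as a stand-in, on the assumption that a complete-data pca result follows the same convention. Note that prcomp names the rotated data x rather than X.

```r
# Stand-in illustration with stats::prcomp; not a call to mixOmics::pca.
X_raw <- as.matrix(USArrests)
pr <- prcomp(X_raw, center = TRUE, scale. = TRUE)

# Re-apply the stored centering and scaling, then rotate
X_std <- scale(X_raw, center = pr$center, scale = pr$scale)
scores <- X_std %*% pr$rotation

# The hand-computed scores match the rotated data stored in the result
same_scores <- isTRUE(all.equal(unname(scores), unname(pr$x)))
```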

Details

The calculation is done either by a singular value decomposition of the (possibly centered and scaled) data matrix, if the data are complete, or by the NIPALS algorithm, if some data are missing.

Unlike princomp, the print method for these objects prints the results in a nice format, and the plot method produces a bar plot of the percentage of variance explained by the principal components (PCs).

Note that scale. = TRUE cannot be used if there are zero or constant (for center = TRUE) variables.

Components are omitted if their standard deviations are less than or equal to comp.tol times the standard deviation of the first component. With the default null setting, no components are omitted. Other settings for comp.tol could be comp.tol = sqrt(.Machine$double.eps), which would omit essentially constant components, or comp.tol = 0.
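The comp.tol rule for complete data can be sketched in a few lines of base R. The helper pca_svd_sketch is hypothetical and only mirrors the description above (SVD of the centered data, then dropping components whose standard deviation is at most comp.tol times the first). It is not the package's internal code.

```r
# Hypothetical helper: SVD-based PCA with the comp.tol rule described above.
pca_svd_sketch <- function(X, center = TRUE, scale. = FALSE, comp.tol = NULL) {
  Xs <- scale(as.matrix(X), center = center, scale = scale.)
  s <- svd(Xs)
  sdev <- s$d / sqrt(nrow(Xs) - 1)   # standard deviations of the components
  keep <- rep(TRUE, length(sdev))    # NULL comp.tol: keep everything
  if (!is.null(comp.tol)) {
    keep <- sdev > comp.tol * sdev[1]
  }
  list(sdev = sdev[keep], rotation = s$v[, keep, drop = FALSE])
}

# The third column is an exact linear combination of the first two,
# so the last component is essentially constant.
set.seed(42)
a <- rnorm(30)
b <- rnorm(30)
X_dep <- cbind(a, b, a + b)

full    <- pca_svd_sketch(X_dep)                                       # no omission
trimmed <- pca_svd_sketch(X_dep, comp.tol = sqrt(.Machine$double.eps)) # drops the last
```

With comp.tol = 0, only components whose computed standard deviation is exactly zero would be dropped, so a near-zero component produced by rounding error may survive.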

Examples

library(mixOmics)
data(multidrug)

## this data set contains missing values, therefore 
## the 'prcomp' function cannot be applied
pca.res <- pca(multidrug$ABC.trans, ncomp = 4, scale = TRUE)
plot(pca.res)
print(pca.res)
biplot(pca.res, xlabs = multidrug$cell.line$Class, cex = 0.7)

# samples representation
plotIndiv(pca.res, ind.names = multidrug$cell.line$Class, cex = 0.5, 
          col = as.numeric(as.factor(multidrug$cell.line$Class)))
plot3dIndiv(pca.res, cex = 0.2,
            col = as.numeric(as.factor(multidrug$cell.line$Class)))

# variables representation
plotVar(pca.res, var.label = TRUE)
plot3dVar(pca.res, rad.in = 0.5, var.label = TRUE, cex = 0.5)