pcatune: Tune the number of principal components in PCA

Description

pcatune can be used to quickly visualise the proportion of explained variance for a large number of principal components in PCA.

Usage

pcatune(X, ncomp = NULL, center = TRUE, scale. = FALSE,
        max.iter = 500, tol = 1e-09)

Arguments

X

a numeric matrix (or data frame) which provides the data for the principal components analysis. It can contain missing values.

ncomp

integer, the number of components to analyse initially in pcatune in order to choose a final ncomp for pca. If NULL, the function sets ncomp = min(nrow(X), ncol(X)).

center

a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of X can be supplied. The value is passed to s

scale.

a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE for consistency with the prcomp function, but in general scaling is advisable.

max.iter

integer, the maximum number of iterations for the NIPALS algorithm.

tol

a positive real, the tolerance used for the NIPALS algorithm.
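
To illustrate the scale. argument described above, the following sketch compares the proportion of explained variance with and without scaling. It uses base R's prcomp as a stand-in for pcatune (an assumption, not the package's own code) on the built-in USArrests data, whose variables are measured on very different scales. The helper names prop_var and has_constant_col are hypothetical.

```r
# Sketch with base R only; prcomp stands in for pcatune here (an assumption).
X <- as.matrix(USArrests)

# Proportion of variance explained by each PC, from the PC standard deviations.
prop_var <- function(fit) fit$sdev^2 / sum(fit$sdev^2)

unscaled <- prop_var(prcomp(X, center = TRUE, scale. = FALSE))
scaled   <- prop_var(prcomp(X, center = TRUE, scale. = TRUE))

# Without scaling, the variable with the largest variance dominates the first PC.
print(round(rbind(unscaled, scaled), 3))

# Hypothetical helper: flag constant columns, which cannot be scaled to unit variance.
has_constant_col <- function(X) any(apply(X, 2, var, na.rm = TRUE) == 0)
has_constant_col(X)
```

Running a check such as has_constant_col before setting scale. = TRUE is one way to catch problem variables early.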

Value

pcatune returns a list with class "pcatune" containing the following components:

var

the eigenvalues of the covariance/correlation matrix (though the calculation is actually done with the singular values of the data matrix).

prop.var

the proportion of explained variance accounted for by each principal component, calculated using the eigenvalues.

cum.var

the cumulative proportion of explained variance accounted for by the sequential accumulation of principal components, calculated using the sum of the proportions of explained variance.
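
The relationship between the three returned quantities can be sketched in base R for complete data. This is a minimal illustration, not the function's actual source: the simulated matrix, the n - 1 divisor, and the names pc_var, prop_var and cum_var (mirroring var, prop.var and cum.var) are assumptions.

```r
set.seed(1)
X <- matrix(rnorm(50 * 5), nrow = 50, ncol = 5)  # illustrative complete data

Xc <- scale(X, center = TRUE, scale = FALSE)     # center each column
d  <- svd(Xc)$d                                  # singular values of the centered data

pc_var   <- d^2 / (nrow(X) - 1)                  # eigenvalues of the covariance matrix
prop_var <- pc_var / sum(pc_var)                 # proportion of variance per component
cum_var  <- cumsum(prop_var)                     # cumulative proportion

print(round(cbind(pc_var, prop_var, cum_var), 3))
```

The squared singular values divided by n - 1 match the eigenvalues of cov(X), which is why the decomposition of the data matrix can replace an explicit eigendecomposition.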

Details

The calculation is done by a singular value decomposition of the (possibly centered and scaled) data matrix if the data are complete, or by the NIPALS algorithm if some values are missing.

Unlike princomp, the print method for these objects prints the results in a readable format, and the plot method produces a bar plot of the percentage of variance explained by the principal components (PCs).

When using NIPALS (missing values), we assume that the first min(ncol(X), nrow(X)) principal components account for 100 % of the explained variance.

Note that scale. = TRUE cannot be used if there are zero or constant (for center = TRUE) variables.
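
For readers unfamiliar with NIPALS, the following is a minimal base R sketch of how the iteration can skip missing cells. It is written under stated assumptions and is not the package's implementation: the function name nipals_sketch, the starting vector, and the convergence test are all illustrative, and the sketch assumes no row or column is entirely missing.

```r
# Hypothetical sketch of NIPALS with missing values (not the package's code).
nipals_sketch <- function(X, ncomp = 2, max.iter = 500, tol = 1e-09) {
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)  # column means ignore NAs
  W <- (!is.na(X)) * 1                                    # 1 = observed, 0 = missing
  eig <- numeric(ncomp)
  for (h in seq_len(ncomp)) {
    X0 <- X
    X0[is.na(X0)] <- 0                                    # missing cells add nothing to sums
    t_h <- X0[, which.max(colSums(X0^2))]                 # start from the most variable column
    p_old <- rep(0, ncol(X))
    for (iter in seq_len(max.iter)) {
      # Loadings: regress each column on the scores over its observed rows.
      p_h <- drop(crossprod(X0, t_h)) / drop(crossprod(W, t_h^2))
      p_h <- p_h / sqrt(sum(p_h^2))
      # Scores: regress each row on the loadings over its observed columns.
      t_h <- drop(X0 %*% p_h) / drop(W %*% p_h^2)
      if (sum((p_h - p_old)^2) < tol) break
      p_old <- p_h
    }
    eig[h] <- sum(t_h^2) / (nrow(X) - 1)                  # variance carried by component h
    X <- X - tcrossprod(t_h, p_h)                         # deflate; NA cells stay NA
  }
  eig
}

set.seed(1)
X_complete <- matrix(rnorm(200), 40, 5) %*% diag(c(5, 3, 2, 1, 0.5))
X_missing <- X_complete
X_missing[cbind(1:10, rep(1:5, 2))] <- NA                 # one missing cell in each of 10 rows

print(nipals_sketch(X_complete, ncomp = 2))
print(nipals_sketch(X_missing, ncomp = 2))
```

On complete data the sketch should agree closely with the leading eigenvalues from an SVD-based PCA; with missing cells it gives an estimate based only on the observed entries.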

Examples

library(mixOmics)  # assumed: the package that provides pcatune and liver.toxicity
data(liver.toxicity)
tune <- pcatune(liver.toxicity$gene, center = TRUE, scale. = TRUE)
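
One possible way to turn the cumulative proportion of explained variance into a choice of ncomp is sketched below. The helper choose_ncomp and the 0.8 threshold are hypothetical, and the numeric vector is made up for illustration; it is not output from the liver.toxicity data.

```r
# Hypothetical helper: smallest number of components whose cumulative
# proportion of explained variance reaches a threshold.
choose_ncomp <- function(cum.var, threshold = 0.8) {
  reached <- which(cum.var >= threshold)
  if (length(reached) == 0) length(cum.var) else reached[1]
}

# Illustrative values only, not results from a real analysis.
cum.var <- c(0.35, 0.55, 0.70, 0.82, 0.90, 1.00)
choose_ncomp(cum.var)  # 4: the fourth value is the first to reach 0.8

# With a fitted object, assuming the list element described under Value:
# choose_ncomp(tune$cum.var, threshold = 0.8)
```

The threshold is a judgment call; inspecting the bar plot for an elbow is an equally reasonable alternative.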