tune.pca can be used to quickly visualise the proportion of explained variance for a large number of principal components in PCA.
tune.pca(X, ncomp = NULL, center = TRUE, scale = FALSE,
max.iter = 500, tol = 1e-09, logratio = 'none',
V = NULL, multilevel = NULL)
X: a numeric matrix (or data frame) which provides the data for the principal components analysis. It can contain missing values.
ncomp: integer, the number of components to initially analyse in tune.pca to choose a final ncomp for pca. If NULL, the function sets ncomp = min(nrow(X), ncol(X)).
center: a logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of X can be supplied. The value is passed to scale.
scale: a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE for consistency with the prcomp function, but in general scaling is advisable. Alternatively, a vector of length equal to the number of columns of X can be supplied. The value is passed to scale.
max.iter: integer, the maximum number of iterations for the NIPALS algorithm.
tol: a positive real, the tolerance used for the NIPALS algorithm.
logratio: one of 'none', 'CLR' or 'ILR'. Defaults to 'none'.
V: matrix used in the logratio transformation, if provided.
multilevel: design matrix for multilevel analysis (for repeated measurements).
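As a rough illustration of the centering, scaling and default number of components described above, the following base R sketch may help. It does not call mixOmics; X_demo, X_std and ncomp_default are made-up names used only for this example.

```r
# Toy data: 4 samples (rows), 3 variables (columns); purely illustrative.
X_demo <- matrix(c(1, 2, 3, 4,
                   2, 4, 6, 9,
                   5, 3, 8, 1), nrow = 4)

# Equivalent of center = TRUE, scale = TRUE: each column gets
# zero mean and unit variance via base R's scale().
X_std <- scale(X_demo, center = TRUE, scale = TRUE)

# Value that ncomp = NULL resolves to, as stated in the argument description.
ncomp_default <- min(nrow(X_demo), ncol(X_demo))

print(round(colMeans(X_std), 10))  # column means after centering
print(apply(X_std, 2, sd))         # column standard deviations after scaling
print(ncomp_default)
```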
tune.pca returns a list with class "tune.pca" containing the following components:
the square roots of the eigenvalues of the covariance/correlation matrix (the calculation is actually done with the singular values of the data matrix).
the proportion of explained variance accounted for by each principal component, calculated using the eigenvalues.
the cumulative proportion of explained variance accounted for by the sequential accumulation of principal components, calculated as the cumulative sum of the proportions of explained variance.
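The three quantities above can be sketched in base R from a singular value decomposition. This is a minimal illustration under the usual PCA conventions (division by n - 1), not the mixOmics code; the names X_demo, sdev, prop_var and cum_var are assumptions made for the example, and the package may normalise the proportions differently when fewer components are requested.

```r
set.seed(1)
X_demo <- matrix(rnorm(40), nrow = 10, ncol = 4)  # simulated data, no missing values

# Center the columns, then take the singular values of the data matrix.
X_c <- scale(X_demo, center = TRUE, scale = FALSE)
sv <- svd(X_c)$d

# Square roots of the eigenvalues of the covariance matrix.
sdev <- sv / sqrt(nrow(X_c) - 1)

# Eigenvalues, proportion of variance per component, and cumulative proportion.
eig <- sdev^2
prop_var <- eig / sum(eig)
cum_var <- cumsum(prop_var)

print(sdev)
print(prop_var)
print(cum_var)
```

When all min(nrow(X), ncol(X)) components are kept, the cumulative proportion reaches 1 on the last component.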
The calculation is done either by a singular value decomposition of the (possibly centered and scaled) data matrix, if the data are complete, or by using the NIPALS algorithm if there are missing data. Unlike princomp, the print method for these objects prints the results in a nice format and the plot method produces a bar plot of the percentage of variance explained by the principal components (PCs).
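To give an idea of how NIPALS copes with missing values, here is a simplified base R sketch for the first component only. It is an illustration of the general technique, not the mixOmics nipals implementation: nipals_first_pc is a hypothetical helper, the stopping rule is an assumption, and rows or columns that are entirely missing are not handled.

```r
# Hypothetical helper: first principal component by NIPALS, skipping NAs.
nipals_first_pc <- function(X, max.iter = 500, tol = 1e-09) {
  obs <- !is.na(X)
  W <- obs + 0          # numeric 0/1 mask of observed entries
  X0 <- X
  X0[!obs] <- 0         # missing entries contribute nothing to the sums
  t_vec <- X0[, which.max(colSums(X0^2))]  # start from the highest-norm column
  for (i in seq_len(max.iter)) {
    # Loadings: regress each column on the scores, observed rows only.
    p_vec <- drop(crossprod(X0, t_vec)) / drop(crossprod(W, t_vec^2))
    p_vec <- p_vec / sqrt(sum(p_vec^2))
    # Scores: regress each row on the loadings, observed columns only.
    t_new <- drop(X0 %*% p_vec) / drop(W %*% p_vec^2)
    converged <- sum((t_new - t_vec)^2) < tol
    t_vec <- t_new
    if (converged) break
  }
  list(scores = t_vec, loadings = p_vec)
}

# Simulated data with one dominant direction plus a little noise.
set.seed(42)
X_demo <- outer(rnorm(20), c(3, 2, 1, 0.5)) + matrix(rnorm(80, sd = 0.1), 20, 4)
X_demo <- scale(X_demo, center = TRUE, scale = FALSE)

# Complete data: compare with the SVD loadings (up to sign).
fit_full <- nipals_first_pc(X_demo)
v1 <- svd(X_demo)$v[, 1]
max_diff <- max(abs(abs(fit_full$loadings) - abs(v1)))
print(max_diff)

# Same data with two entries removed: NIPALS still returns a component.
X_miss <- X_demo
X_miss[2, 3] <- NA
X_miss[7, 1] <- NA
fit_miss <- nipals_first_pc(X_miss)
print(fit_miss$loadings)
```

Further components would be obtained by subtracting the fitted rank-one term from the observed entries and repeating the iteration.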
When using NIPALS (missing values), we make the assumption that the first min(ncol(X), nrow(X)) principal components will account for 100% of the explained variance.
Note that scale = TRUE cannot be used if there are zero or constant (for center = TRUE) variables.
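The reason can be seen with base R's scale(): a constant variable has zero standard deviation, so scaling divides zero by zero. The sketch below uses made-up data and names (X_demo, constant_cols) and only shows one way of spotting such columns beforehand.

```r
# Toy data: column "b" is constant.
X_demo <- cbind(a = c(1, 2, 3, 4), b = c(5, 5, 5, 5))

# Flag variables whose standard deviation is zero before scaling.
col_sd <- apply(X_demo, 2, sd)
constant_cols <- names(col_sd)[col_sd == 0]
print(constant_cols)

# Centering and scaling the constant column yields NaN (0 / 0).
X_scaled <- scale(X_demo, center = TRUE, scale = TRUE)
print(X_scaled[, "b"])
```

Dropping the flagged columns before the analysis is one possible workaround.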
Components are omitted if their standard deviations are less than or equal to comp.tol times the standard deviation of the first component. With the default null setting, no components are omitted. Other settings for comp.tol could be comp.tol = sqrt(.Machine$double.eps), which would omit essentially constant components, or comp.tol = 0.
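The omission rule can be written out as a small base R sketch. keep_components and sdev_demo are hypothetical names introduced for this illustration; the code only mirrors the rule as worded above and is not taken from the package.

```r
# Hypothetical helper: TRUE for components that are kept under the stated rule.
keep_components <- function(sdev, comp.tol = NULL) {
  # Null setting: nothing is omitted.
  if (is.null(comp.tol)) return(rep(TRUE, length(sdev)))
  # Omit components with sdev <= comp.tol * sdev of the first component.
  sdev > comp.tol * sdev[1]
}

# Made-up standard deviations; the last component is essentially constant.
sdev_demo <- c(2.0, 0.5, 1e-10)

keep_null <- keep_components(sdev_demo)
keep_eps  <- keep_components(sdev_demo, sqrt(.Machine$double.eps))
keep_zero <- keep_components(sdev_demo, 0)

print(keep_null)
print(keep_eps)
print(keep_zero)
```

With comp.tol = 0, only components whose standard deviation is exactly zero would be omitted under this rule.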
Logratio transformation and multilevel analysis are performed sequentially as internal pre-processing steps, through logratio.transfo and withinVariation respectively.
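For orientation, the centered logratio (CLR) transformation in its textbook form can be sketched in base R as follows. clr_sketch is a hypothetical function written for this example; it assumes strictly positive data and is not the logratio.transfo implementation, which may, for instance, treat zeros differently.

```r
# Hypothetical helper: textbook CLR, i.e. log values minus their row-wise mean.
clr_sketch <- function(X) {
  log_X <- log(X)
  sweep(log_X, 1, rowMeans(log_X), "-")
}

# Made-up compositional data: 2 samples, 3 strictly positive parts each.
X_demo <- matrix(c(10, 20, 70,
                    5,  5, 90), nrow = 2, byrow = TRUE)

X_clr <- clr_sketch(X_demo)
print(X_clr)
print(rowSums(X_clr))  # CLR rows sum to zero up to rounding error
```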
nipals, biplot, plotIndiv, plotVar and http://www.mixOmics.org for more details.
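# NOT RUN {
library(mixOmics)
data(liver.toxicity)
# Center and scale the gene expression matrix before tuning.
tune <- tune.pca(liver.toxicity$gene, center = TRUE, scale = TRUE)
# Print the tuning results.
tune
# }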