PCAaug: Augmentation Estimate for PCA

Description

For p-variate data, the augmentation estimate for PCA assumes that the last p-k eigenvalues of the covariance matrix are equal. Combining information from the eigenvalues and eigenvectors of the covariance matrix, the augmentation estimator yields an estimate for k.

Usage

PCAaug(X, noise = "median", naug = 1, nrep = 1, sigma2 = NULL, alpha = NULL)

Arguments

X

numeric data matrix.

noise

name of the method to be used to estimate the noise variance. Options are "median", "last", "quantile" or "known". See details.

naug

number of components to be augmented.

nrep

number of repetitions for the augmentation procedure.

sigma2

value of the noise variance when noise = "known".

alpha

the quantile to be used when noise = "quantile".

Value

A list of class ladle containing:

method

the string PCA.

k

the estimated value of k.

fn

vector giving the measures of variation of the eigenvectors, obtained via augmentation, for the different numbers of components.

phin

normalized eigenvalues of the covariance matrix.

gn

the main criterion for the augmented estimate: the sum of fn and phin. k is the value where gn takes its minimum.

lambda

the eigenvalues of the covariance matrix.

W

the transformation matrix to the principal components.

S

data matrix with the centered principal components.

MU

the location of the data which was subtracted before calculating the principal components.

data.name

the name of the data for which the augmented estimate was computed.

sigma2

the value used as noise variance when simulating the augmented components.

Details

The model here assumes that the eigenvalues of the covariance matrix are of the form \(\lambda_1 \geq \ldots \geq \lambda_{k} > \lambda_{k+1} = \ldots = \lambda_p\) and the goal is to estimate the value of k. The value \(\lambda_{k+1}\) then corresponds to the noise variance.
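As a worked instance of this model, consider the population covariance of the simulated data in the Examples section, where two components have standard deviation 2 and four have standard deviation 1. The symbol \(\sigma^2\) for the noise variance is notation assumed here for illustration:

\[
\lambda_1 = \lambda_2 = 4 > \lambda_3 = \lambda_4 = \lambda_5 = \lambda_6 = 1, \qquad k = 2, \qquad \sigma^2 = \lambda_{k+1} = 1.
\]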

For that purpose, the augmented estimator adds naug Gaussian components whose variance equals the noise variance, which must either be provided (noise = "known", via sigma2) or estimated from the data. Three estimation methods are available: with noise = "median" the estimate is the median of the eigenvalues of the covariance matrix; with noise = "last" it is the last (smallest) eigenvalue of the covariance matrix; and with noise = "quantile" it is the mean of the eigenvalues smaller than or equal to the alpha-quantile of the eigenvalues of the covariance matrix.
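The three data-driven noise variance estimates described above can be sketched in base R as follows. This is only an illustration of the definitions, not the package's internal code: the variable names, the value of alpha, and the use of the default quantile type are assumptions.

```r
set.seed(1)  # arbitrary seed, for reproducibility
n <- 1000
# Simulated data: two components with sd = 2, four noise components with sd = 1
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 2), rnorm(n), rnorm(n), rnorm(n), rnorm(n))

# Eigenvalues of the sample covariance matrix, in decreasing order
lambda <- eigen(cov(X), symmetric = TRUE)$values

# noise = "median": median of the eigenvalues
sigma2_median <- median(lambda)

# noise = "last": the last (smallest) eigenvalue
sigma2_last <- lambda[length(lambda)]

# noise = "quantile": mean of the eigenvalues not exceeding the alpha-quantile
alpha <- 0.5  # assumed value, chosen only for this illustration
q_alpha <- quantile(lambda, probs = alpha, names = FALSE)  # default quantile type assumed
sigma2_quantile <- mean(lambda[lambda <= q_alpha])

c(median = sigma2_median, last = sigma2_last, quantile = sigma2_quantile)
```

With alpha = 0.5 the quantile-based estimate lies between the "last" and "median" estimates, since it averages only the eigenvalues at or below the median.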

The augmentation estimator then uses the augmented components to measure the variation of the eigenvectors. For a more stable result it is recommended to repeat the augmentation process several times (nrep), and Luo and Li (2021) recommend using approximately p/5 or p/10 for naug, where p is the number of columns of X.
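A call that follows these recommendations might look like the sketch below. The choice nrep = 10 and the rounding of p/5 are assumptions made for illustration, and the fit is only attempted if the ICtest package, which provides PCAaug(), is installed.

```r
set.seed(1)  # arbitrary seed, for reproducibility
n <- 1000
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 2), rnorm(n), rnorm(n), rnorm(n), rnorm(n))
p <- ncol(X)

# Rule of thumb from the text: naug of about p/5 (or p/10), but at least 1
naug <- max(1, ceiling(p / 5))
nrep <- 10  # assumed value; the text only recommends "several" repetitions

# Only run the fit if the package providing PCAaug() is available
if (requireNamespace("ICtest", quietly = TRUE)) {
  fit <- ICtest::PCAaug(X, noise = "median", naug = naug, nrep = nrep)
  print(fit$k)  # estimated value of k
}
```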

For this purpose, the augmented estimator combines the scaled eigenvalues with the variation measured via augmentation. The main idea is that for distinct eigenvalues the variation of the eigenvectors is small, whereas for equal eigenvalues the corresponding eigenvectors have large variation.

The augmented estimate for k is the value at which this combined measure takes its minimum; the measure can also be visualized as a ladle.
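The idea can be illustrated with a simplified base R sketch along the lines of Luo and Li (2021). It is not the package's implementation: the normalizations of fn and phin, the choices naug = 2 and nrep = 20, and all variable names are assumptions made for this illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility
n <- 1000
p <- 6
# Two signal components (sd = 2) followed by four noise components (sd = 1)
X <- cbind(matrix(rnorm(n * 2, sd = 2), n, 2), matrix(rnorm(n * 4), n, 4))

lambda <- eigen(cov(X), symmetric = TRUE)$values
sigma2 <- median(lambda)  # noise variance estimate, as with noise = "median"
naug <- 2                 # assumed number of augmented components
nrep <- 20                # assumed number of repetitions

# For each repetition: append naug Gaussian noise columns, then record the
# squared norm of the augmented part of each of the first p eigenvectors.
aug_norms <- rowMeans(replicate(nrep, {
  Z <- matrix(rnorm(n * naug, sd = sqrt(sigma2)), n, naug)
  V <- eigen(cov(cbind(X, Z)), symmetric = TRUE)$vectors
  colSums(V[(p + 1):(p + naug), 1:p, drop = FALSE]^2)
}))

# fn(j), j = 0, ..., p: accumulated eigenvector variation of the first j components
fn <- c(0, cumsum(aug_norms))
# phin(j), j = 0, ..., p: scaled (j + 1)-th eigenvalue, set to 0 for j = p
phin <- c(lambda, 0) / (1 + sum(lambda))
# Combined criterion; the estimate of k is the index of its minimum
gn <- fn + phin
k_hat <- which.min(gn) - 1
k_hat
```

For this simulated data the minimum should typically fall at k = 2: the first two eigenvectors barely load on the augmented noise, while eigenvectors belonging to the equal noise eigenvalues mix freely with it.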

For further details see Luo and Li (2021) and Radojicic et al. (2021).

References

Luo, W. and Li, B. (2021), On Order Determination by Predictor Augmentation, Biometrika, 108, 557--574. <doi:10.1093/biomet/asaa077>

Radojicic, U., Lietzen, N., Nordhausen, K. and Virta, J. (2021), Dimension Estimation in Two-Dimensional PCA. In S. Loncaric, T. Petkovic and D. Petrinovic (editors) "Proceedings of the 12th International Symposium on Image and Signal Processing and Analysis (ISPA 2021)", 16--22. <doi:10.1109/ISPA52656.2021.9552114>

Examples

# NOT RUN {
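library(ICtest)  # PCAaug() and ladleplot() are provided by the ICtest package

set.seed(1)  # arbitrary seed, for reproducibility
n <- 1000
# Two components with sd = 2 and four with sd = 1, so the true k is 2
Y <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 2), rnorm(n), rnorm(n), rnorm(n), rnorm(n))

testPCA <- PCAaug(Y)
testPCA
summary(testPCA)
plot(testPCA)
# Ladle plots of the main criterion and of its individual ingredients
ladleplot(testPCA)
ladleplot(testPCA, crit = "fn")
ladleplot(testPCA, crit = "lambda")
ladleplot(testPCA, crit = "phin")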
# }
