Assuming an elliptical model, the function tests that the last p-k eigenvalues of a scatter matrix are equal and that the k interesting components are those with larger variance.
Two bootstrapping strategies are available to obtain p-values, and the user can supply the scatter matrix to be used as a function.
PCAboot(X, k, n.boot = 200, s.boot = "B1", S = MeanCov, Sargs = NULL)
X: a numeric data matrix with p>1 columns.
k: the number of eigenvalues larger than the equal ones. Can be between 0 and p-2.
n.boot: number of bootstrapping samples.
s.boot: bootstrapping strategy to be used. Possible values are "B1" and "B2". See details for further information.
S: a function which returns a list that has as its first element a location vector and as its second element the scatter matrix.
Sargs: list of further arguments passed on to the function specified in S.
A list of class ictest inheriting from class htest containing:
the value of the test statistic.
the p-value of the test.
the degrees of freedom of the test.
character string indicating which test was performed.
character string giving the name of the data.
character string specifying the alternative hypothesis.
the number of larger eigenvalues used in the testing problem.
the transformation matrix to the principal components.
data matrix with the centered principal components.
the underlying eigenvalues.
the location of the data which was subtracted before calculating the principal components.
the computed scatter matrix.
character string denoting which scatter function was used.
character string denoting which bootstrapping test version was used.
Here the function S needs to return a list where the first element is a location vector and the second one a scatter matrix.
The location is used to center the data and the scatter matrix is used to perform PCA.
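As an illustration of the contract described above, the following is a minimal sketch of a user-defined scatter function. The name `my_scatter` and its `wt` argument are hypothetical and chosen only for this example; the sketch relies on `cov.wt` from base R and returns the location first and the scatter matrix second.

```r
# Hypothetical user-defined scatter function (name and 'wt' argument are
# illustrative assumptions, not part of the package).
# It returns a list: first element = location vector, second = scatter matrix.
my_scatter <- function(X, wt = rep(1 / nrow(X), nrow(X))) {
  cw <- cov.wt(X, wt = wt)              # weighted mean and covariance (stats)
  list(location = cw$center, scatter = cw$cov)
}

set.seed(1)
X <- cbind(rnorm(50, sd = 2), rnorm(50), rnorm(50))
res <- my_scatter(X)
res[[1]]  # location vector of length p
res[[2]]  # p x p scatter matrix

# Assumed usage with the function documented here (not run in this sketch):
# PCAboot(X, k = 1, S = my_scatter, Sargs = list(wt = rep(1 / 50, 50)))
```

Only the order of the list elements matters for the contract stated above; the element names used here are arbitrary.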
Consider X as the centered data and denote by W the transformation matrix to the principal components. The corresponding eigenvalues
from PCA are \(d_1,...,d_p\). Under the null, \(d_k > d_{k+1} = ... = d_{p}\).
Denote further by \(\bar{d}\) the mean of the last p-k eigenvalues and by \(D^* = diag(d_1,...,d_k,\bar{d},...,\bar{d})\) a \(p \times p\) diagonal matrix. Assume that \(S\) (here the matrix of principal components, not the function argument S) can be decomposed into \(S_1\) and \(S_2\), where
\(S_1\) contains the k interesting principal components and \(S_2\) the last \(p-k\) principal components.
For a sample of size \(n\), the test statistic used for the bootstrapping tests is $$T = \frac{n}{\bar{d}^2} \sum_{i=k+1}^p (d_i - \bar{d})^2.$$
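The statistic can be computed directly from a vector of eigenvalues. The sketch below uses a hypothetical helper `pca_stat` (not a function of the package) and, as an assumption for the example, takes the eigenvalues from the regular covariance matrix.

```r
# Hypothetical helper: T = n / dbar^2 * sum_{i = k+1}^p (d_i - dbar)^2
pca_stat <- function(d, k, n) {
  tail_d <- d[(k + 1):length(d)]   # the last p - k eigenvalues
  d_bar <- mean(tail_d)            # their mean
  n / d_bar^2 * sum((tail_d - d_bar)^2)
}

set.seed(1)
n <- 200
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1.5), rnorm(n), rnorm(n), rnorm(n))
d <- eigen(cov(X), symmetric = TRUE)$values  # d_1 >= ... >= d_p
T_obs <- pca_stat(d, k = 2, n = n)
T_obs
```

When the last p-k eigenvalues are exactly equal, the sum vanishes and the statistic is zero; larger values indicate stronger departures from equality.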
The function then offers two bootstrapping strategies:
s.boot="B1":
The first strategy has the following steps:
Take a bootstrap sample \(S^*\) of size \(n\) from \(S\) and decompose it into \(S_1^*\) and \(S_2^*\).
Every observation in \(S_2^*\) is transformed with a different random orthogonal matrix.
Recombine \(S^*=(S_1^*, S_2^*)\) and create \(X^*= S^* W\).
Compute the test statistic based on \(X^*\).
Repeat the previous steps n.boot times.
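The steps of strategy "B1" can be sketched in base R as follows. This is an illustrative reading of the steps, not the package's implementation: the helpers `rand_orth` (a QR-based stand-in for rorth) and `pca_stat` are hypothetical, the regular covariance matrix is assumed as the scatter, and the p-value convention (count + 1) / (n.boot + 1) is an assumption.

```r
# Hypothetical stand-in for rorth: random orthogonal matrix via QR
rand_orth <- function(m) {
  qrm <- qr(matrix(rnorm(m * m), m, m))
  qr.Q(qrm) %*% diag(sign(diag(qr.R(qrm))), m)  # sign fix for uniformity
}
# Hypothetical helper for the test statistic T
pca_stat <- function(d, k, n) {
  tail_d <- d[(k + 1):length(d)]
  n / mean(tail_d)^2 * sum((tail_d - mean(tail_d))^2)
}

set.seed(1)
n <- 200; p <- 5; k <- 2; n_boot <- 30
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1.5), rnorm(n), rnorm(n), rnorm(n))
Xc <- sweep(X, 2, colMeans(X))           # centered data
eig <- eigen(cov(X), symmetric = TRUE)
W <- t(eig$vectors)                      # transformation matrix
S <- Xc %*% t(W)                         # principal components, so Xc = S W
T_obs <- pca_stat(eig$values, k, n)

b1_once <- function() {
  S_star <- S[sample.int(n, n, replace = TRUE), , drop = FALSE]  # step 1
  for (i in seq_len(n)) {                                        # step 2
    S_star[i, (k + 1):p] <- S_star[i, (k + 1):p] %*% rand_orth(p - k)
  }
  X_star <- S_star %*% W                                         # step 3
  pca_stat(eigen(cov(X_star), symmetric = TRUE)$values, k, n)    # step 4
}
T_boot <- replicate(n_boot, b1_once())                           # step 5

# Assumed p-value convention
p_value <- (sum(T_boot >= T_obs) + 1) / (n_boot + 1)
p_value
```

Rotating only the last p-k coordinates of each resampled observation leaves the k interesting components untouched while making the remaining ones spherical, which mimics the null hypothesis.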
s.boot="B2":
The second strategy has the following steps:
Scale each principal component using the matrix \(D\), i.e. \(Z = S D\).
Take a bootstrap sample \(Z^*\) of size \(n\) from \(Z\).
Every observation in \(Z^*\) is transformed with a different random orthogonal matrix.
Recreate \(X^*= Z^* {D^*}^{-1} W\).
Compute the test statistic based on \(X^*\).
Repeat the previous steps n.boot times.
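A rough base R sketch of strategy "B2" is given below. It rests on several assumptions that are not confirmed by the text above: the scaling step is read as dividing each principal component by the square root of the corresponding diagonal entry of \(D^*\) (undone again before transforming back), the regular covariance matrix serves as the scatter, and the helpers `rand_orth` and `pca_stat` are hypothetical. The exact scaling used by the package may differ.

```r
# Hypothetical stand-in for rorth: random orthogonal matrix via QR
rand_orth <- function(m) {
  qrm <- qr(matrix(rnorm(m * m), m, m))
  qr.Q(qrm) %*% diag(sign(diag(qr.R(qrm))), m)
}
# Hypothetical helper for the test statistic T
pca_stat <- function(d, k, n) {
  tail_d <- d[(k + 1):length(d)]
  n / mean(tail_d)^2 * sum((tail_d - mean(tail_d))^2)
}

set.seed(2)
n <- 200; p <- 5; k <- 1; n_boot <- 30
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1.5), rnorm(n), rnorm(n), rnorm(n))
Xc <- sweep(X, 2, colMeans(X))
eig <- eigen(cov(X), symmetric = TRUE)
d <- eig$values
W <- t(eig$vectors)
S <- Xc %*% t(W)

# Diagonal of D^*: first k eigenvalues, then the mean of the last p - k
d_star <- c(d[seq_len(k)], rep(mean(d[(k + 1):p]), p - k))
Z <- S %*% diag(1 / sqrt(d_star), p)     # assumed scaling step

b2_once <- function() {
  Z_star <- Z[sample.int(n, n, replace = TRUE), , drop = FALSE]
  for (i in seq_len(n)) {
    Z_star[i, ] <- Z_star[i, ] %*% rand_orth(p)  # rotate each observation
  }
  X_star <- Z_star %*% diag(sqrt(d_star), p) %*% W  # undo scaling, go back
  pca_stat(eigen(cov(X_star), symmetric = TRUE)$values, k, n)
}
T_boot <- replicate(n_boot, b2_once())
summary(T_boot)
```

In contrast to "B1", every resampled observation is rotated in all p coordinates here, so the scaling step is what preserves the variance structure implied by \(D^*\).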
The random orthogonal matrices are created with the function rorth.
Nordhausen, K., Oja, H. and Tyler, D.E. (2022), Asymptotic and Bootstrap Tests for Subspace Dimension, Journal of Multivariate Analysis, 188, 104830. <doi:10.1016/j.jmva.2021.104830>.
library(ICtest)

n <- 200
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1.5), rnorm(n), rnorm(n), rnorm(n))

# n.boot is kept small for demonstration purposes; use a larger value in real applications
# Default scatter (MeanCov) with strategy "B1", testing k = 2
TestCov <- PCAboot(X, k = 2, n.boot = 30)
TestCov

# Scatter function tM with df = 2 and strategy "B2", testing k = 1
TestTM <- PCAboot(X, k = 1, n.boot = 30, s.boot = "B2", S = "tM", Sargs = list(df = 2))
TestTM