
ICtest (version 0.3-5)

PCAboot: Bootstrap-Based Testing for Subsphericity

Description

Assuming an elliptical model, the function tests whether the last p-k eigenvalues of a scatter matrix are equal, so that the k interesting components are those with larger variance. Two bootstrapping strategies are available for obtaining p-values, and the user can supply the scatter matrix to be used as a function.

Usage

PCAboot(X, k, n.boot = 200, s.boot = "B1", S = MeanCov, Sargs = NULL)

Arguments

X

a numeric data matrix with p>1 columns.

k

the number of eigenvalues that are larger than the p-k equal ones. Must be between 0 and p-2.

n.boot

number of bootstrapping samples.

s.boot

bootstrapping strategy to be used. Possible values are "B1", "B2". See details for further information.

S

a function that returns a list whose first element is a location vector and whose second element is the scatter matrix.

Sargs

list of further arguments passed on to the function specified in S.

Value

A list of class ictest inheriting from class htest containing:

statistic

the value of the test statistic.

p.value

the p-value of the test.

parameter

the degrees of freedom of the test.

method

character string indicating which test was performed.

data.name

character string giving the name of the data.

alternative

character string specifying the alternative hypothesis.

k

the number of larger eigenvalues used in the testing problem.

W

the transformation matrix to the principal components.

S

data matrix with the centered principal components.

D

the underlying eigenvalues.

MU

the location of the data which was subtracted before calculating the principal components.

SCATTER

The computed scatter matrix.

scatter

character string denoting which scatter function was used.

s.boot

character string denoting which bootstrapping test version was used.

Details

The function S needs to return a list whose first element is a location vector and whose second element is a scatter matrix.

The location is used to center the data and the scatter matrix is used to perform PCA.
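As an illustration of the list format just described, here is a minimal sketch of a user-written location and scatter function. The name `MyMeanCov` is hypothetical and not part of ICtest; it simply pairs the column means with the sample covariance matrix, and the element names are arbitrary since only the order of the two elements is described above.

```r
# Hypothetical helper (not part of ICtest): returns a list whose first
# element is a location vector and whose second element is a scatter matrix.
MyMeanCov <- function(X, ...) {
  X <- as.matrix(X)
  list(
    locations = colMeans(X),  # first element: location vector of length p
    scatter   = cov(X)        # second element: p x p scatter matrix
  )
}

set.seed(1)
X <- cbind(rnorm(100, sd = 2), rnorm(100), rnorm(100))
MC <- MyMeanCov(X)
str(MC)
```

A function of this shape could presumably be passed as `S = MyMeanCov`, with any extra arguments supplied through `Sargs`.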

Consider X as the centered data and denote by W the transformation matrix to the principal components. The corresponding eigenvalues from PCA are \(d_1,...,d_p\). Under the null, \(d_k > d_{k+1} = ... = d_{p}\). Denote further by \(\bar{d}\) the mean of the last p-k eigenvalues and by \(D^* = diag(d_1,...,d_k,\bar{d},...,\bar{d})\) a \(p \times p\) diagonal matrix. Assume that \(S\) is the matrix with principal components which can be decomposed into \(S_1\) and \(S_2\) where \(S_1\) contains the k interesting principal components and \(S_2\) the last \(p-k\) principal components.
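The quantities defined above can be computed with base R as follows. This is only a sketch under assumptions: the sample covariance matrix stands in for the scatter, and `W` is taken to be the transposed eigenvector matrix so that the principal components are `Xc %*% t(W)`. The variable names are illustrative.

```r
set.seed(1)
n <- 200
k <- 2
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1.5), rnorm(n), rnorm(n), rnorm(n))
p <- ncol(X)

Xc  <- sweep(X, 2, colMeans(X))          # centered data
EVD <- eigen(cov(Xc), symmetric = TRUE)  # PCA via eigendecomposition
d   <- EVD$values                        # eigenvalues d_1 >= ... >= d_p
W   <- t(EVD$vectors)                    # assumed convention: S = Xc %*% t(W)
S   <- Xc %*% t(W)                       # matrix of principal components

d_bar  <- mean(d[(k + 1):p])             # mean of the last p-k eigenvalues
D_star <- diag(c(d[seq_len(k)], rep(d_bar, p - k)))  # p x p diagonal matrix D^*

S1 <- S[, seq_len(k), drop = FALSE]      # the k interesting components
S2 <- S[, (k + 1):p, drop = FALSE]       # the last p-k components
```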

For a sample of size \(n\), the test statistic used for the bootstrapping tests is $$T = \frac{n}{\bar{d}^2} \sum_{i=k+1}^p (d_i - \bar{d})^2.$$
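The statistic translates directly into a few lines of R. The helper name `subsphericity_stat` is hypothetical, and the eigenvalues in the usage part come from the sample covariance matrix as an assumed choice of scatter.

```r
# Hypothetical helper: T = n / d_bar^2 * sum_{i = k+1}^p (d_i - d_bar)^2
subsphericity_stat <- function(d, k, n) {
  p      <- length(d)
  d_tail <- d[(k + 1):p]   # the last p-k eigenvalues
  d_bar  <- mean(d_tail)   # their mean
  n * sum((d_tail - d_bar)^2) / d_bar^2
}

set.seed(1)
n <- 200
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1.5), rnorm(n), rnorm(n), rnorm(n))
d <- eigen(cov(X), symmetric = TRUE)$values
T_obs <- subsphericity_stat(d, k = 2, n = n)
T_obs
```

When the last p-k eigenvalues are exactly equal the statistic is zero, and it grows as they spread out around their mean.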

The function then offers two bootstrapping strategies:

  1. s.boot="B1": The first strategy has the following steps:

    1. Take a bootstrap sample \(S^*\) of size \(n\) from \(S\) and decompose it into \(S_1^*\) and \(S_2^*\).

    2. Every observation in \(S_2^*\) is transformed with a different random orthogonal matrix.

    3. Recombine \(S^*=(S_1^*, S_2^*)\) and create \(X^*= S^* W\).

    4. Compute the test statistic based on \(X^*\).

    5. Repeat the previous steps n.boot times.

  2. s.boot="B2": The second strategy has the following steps:

    1. Scale each principal component using the matrix \(D\), i.e. \(Z = S D\).

    2. Take a bootstrap sample \(Z^*\) of size \(n\) from \(Z\).

    3. Every observation in \(Z^*\) is transformed with a different random orthogonal matrix.

    4. Recreate \(X^*= Z^* {D^*}^{-1} W\).

    5. Compute the test statistic based on \(X^*\).

    6. Repeat the previous steps n.boot times.

To create the random orthogonal matrices, the function rorth is used.
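The steps of the s.boot="B1" strategy can be sketched in base R as below. This is an illustration under stated assumptions, not the package's implementation: `random_orth` is a hypothetical stand-in for rorth (QR decomposition of a Gaussian matrix with a sign correction), the sample covariance matrix plays the role of the scatter, and the p-value convention `(#{T* >= T} + 1) / (n.boot + 1)` is an assumed common choice.

```r
# Hypothetical stand-in for rorth(): random m x m orthogonal matrix
random_orth <- function(m) {
  qrM <- qr(matrix(rnorm(m * m), m, m))
  # sign correction so the distribution does not depend on QR conventions
  sweep(qr.Q(qrM), 2, sign(diag(qr.R(qrM))), "*")
}

# Test statistic T based on the eigenvalues of the sample covariance matrix
pca_stat <- function(X, k) {
  d      <- eigen(cov(X), symmetric = TRUE)$values
  d_tail <- d[(k + 1):length(d)]
  nrow(X) * sum((d_tail - mean(d_tail))^2) / mean(d_tail)^2
}

# Hypothetical sketch of strategy B1
boot_B1 <- function(X, k, n.boot = 200) {
  n  <- nrow(X)
  p  <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))
  V  <- eigen(cov(Xc), symmetric = TRUE)$vectors
  W  <- t(V)                    # assumed convention: Xc = S %*% W
  S  <- Xc %*% V                # principal components
  T_obs <- pca_stat(Xc, k)

  T_boot <- replicate(n.boot, {
    # Step 1: bootstrap sample S* of size n, split into S1* and S2*
    S_star <- S[sample.int(n, n, replace = TRUE), , drop = FALSE]
    S2     <- S_star[, (k + 1):p, drop = FALSE]
    # Step 2: rotate every row of S2* with its own random orthogonal matrix
    S_star[, (k + 1):p] <- t(apply(S2, 1, function(s) random_orth(p - k) %*% s))
    # Steps 3 and 4: recombine, map back with W, recompute the statistic
    pca_stat(S_star %*% W, k)
  })

  list(statistic = T_obs,
       p.value   = (sum(T_boot >= T_obs) + 1) / (n.boot + 1))
}

set.seed(1)
n <- 200
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1.5), rnorm(n), rnorm(n), rnorm(n))
res <- boot_B1(X, k = 2, n.boot = 30)
res
```

Rotating each observation of the last p-k components separately imposes sphericity on that block in the bootstrap world, which is what the null hypothesis asserts, while the first k components are left untouched.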

References

Nordhausen, K., Oja, H. and Tyler, D.E. (2022), Asymptotic and Bootstrap Tests for Subspace Dimension, Journal of Multivariate Analysis, 188, 104830. <doi:10.1016/j.jmva.2021.104830>.

See Also

cov, MeanCov, PCAasymp

Examples

library(ICtest)

n <- 200
X <- cbind(rnorm(n, sd = 2), rnorm(n, sd = 1.5), rnorm(n), rnorm(n), rnorm(n))

# For demonstration purposes n.boot is kept small; it should be larger in real applications.

# Default scatter (MeanCov) with bootstrapping strategy "B1"
TestCov <- PCAboot(X, k = 2, n.boot = 30)
TestCov

# Scatter function tM with bootstrapping strategy "B2"; df is passed on via Sargs
TestTM <- PCAboot(X, k = 1, n.boot = 30, s.boot = "B2", S = "tM", Sargs = list(df = 2))
TestTM
