ladle: Ladle estimate for an arbitrary matrix

Description

The ladle estimates the rank of a symmetric matrix \(S\) by combining the classical screeplot with an estimate of the rank from the bootstrap eigenvector variability of \(S\).

Usage

ladle(x, S, n.boots = 200, ...)

Arguments

x

n x p data matrix.

S

Function for computing a q x q symmetric matrix from the data x.

n.boots

The number of bootstrap samples.

...

Further parameters passed to S.
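The ladle only requires S to map a data matrix, plus any extra arguments, to a square symmetric matrix. The sketch below shows that contract and how arguments given in `...` reach S. The names `S_scaled_cov` and `call_S` are hypothetical. `call_S` only mimics the forwarding and is not the package's internal code.

```r
# Hypothetical S function: a scaled sample covariance matrix.
# The extra argument `scale` would arrive through ladle's `...`.
S_scaled_cov <- function(x, scale = 1) {
  scale * cov(as.matrix(x))
}

# Hypothetical helper mimicking the forwarding of `...` to S,
# followed by a check that the result is square and symmetric.
call_S <- function(x, S, ...) {
  S_x <- S(x, ...)
  stopifnot(is.matrix(S_x), nrow(S_x) == ncol(S_x), isSymmetric(S_x))
  S_x
}

set.seed(1)
x <- matrix(rnorm(100 * 4), 100, 4)
S_x <- call_S(x, S_scaled_cov, scale = 2)
dim(S_x)  # 4 4
```

In the same way, `dim = 5` in the example below is passed on to `S_CCA`.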

Value

A list of class ladle containing:

method

The string "general".

k

The estimated value of k.

fn

A vector giving the bootstrap measures of variation of the eigenvectors for the different numbers of components.

phin

The normalized eigenvalues of the S matrix.

gn

The main criterion for the ladle estimate: the sum of fn and phin. The estimate k is the value at which gn attains its minimum.

lambda

The eigenvalues of the matrix S.

data.name

The name of the data for which the ladle estimate was computed.
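The following base-R sketch shows roughly how phin, fn, gn and k fit together, based on our reading of Luo and Li (2016). Several details are assumptions of this sketch and not necessarily choices made by the package: the function `ladle_sketch`, the default kmax (p - 1 when p <= 10, otherwise floor(p / log(p))), and the simple nonparametric bootstrap.

```r
# Sketch of the ladle criterion; not the package's implementation.
ladle_sketch <- function(x, S, n.boots = 200, kmax = NULL, ...) {
  x <- as.matrix(x)
  n <- nrow(x)
  S_x <- S(x, ...)
  p <- ncol(S_x)
  if (is.null(kmax)) kmax <- if (p <= 10) p - 1 else floor(p / log(p))
  eig <- eigen(S_x, symmetric = TRUE)
  lambda <- eig$values

  # Scree part: normalized eigenvalues, phin[k + 1] for k = 0, ..., kmax
  phin <- lambda[1:(kmax + 1)] / (1 + sum(lambda[1:(kmax + 1)]))

  # Bootstrap part: average of 1 - |det(t(B_k) %*% B_k*)| over resamples,
  # where B_k holds the first k eigenvectors; fn0[1] (k = 0) stays 0
  fn0 <- numeric(kmax + 1)
  for (b in seq_len(n.boots)) {
    idx <- sample.int(n, n, replace = TRUE)
    vec_b <- eigen(S(x[idx, , drop = FALSE], ...), symmetric = TRUE)$vectors
    for (k in seq_len(kmax)) {
      B <- eig$vectors[, 1:k, drop = FALSE]
      B_b <- vec_b[, 1:k, drop = FALSE]
      fn0[k + 1] <- fn0[k + 1] + (1 - abs(det(crossprod(B, B_b)))) / n.boots
    }
  }
  fn <- fn0 / (1 + sum(fn0))

  # Ladle criterion and its minimizer (subtract 1 because k starts at 0)
  gn <- fn + phin
  list(k = which.min(gn) - 1, fn = fn, phin = phin, gn = gn, lambda = lambda)
}

# Toy data: population covariance diag(100, 64, 1, 1, 1, 1), so k = 2
set.seed(123)
n <- 500
z <- matrix(rnorm(n * 6), n, 6)
x <- z %*% diag(c(10, 8, 1, 1, 1, 1))
res <- ladle_sketch(x, cov, n.boots = 100)
res$k
```

In this design, phin is already small at k = 2. fn stays near zero up to k = 2 and then grows, because the eigenvectors of the tied eigenvalues change from one bootstrap sample to the next. Their sum, gn, therefore has its minimum at the true k.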

Details

Assume that the eigenvalues of the population version of S are \(\lambda_1 \geq \dots \geq \lambda_k > \lambda_{k+1} = \dots = \lambda_q\). The ladle estimates the true value of \(k\) (for example, the rank of S) by combining the classical screeplot with an estimate of \(k\) from the bootstrap eigenvector variability of S.
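The sketch below makes this assumption concrete with a hypothetical population matrix where q = 5, k = 2 and \(\lambda_3 = \lambda_4 = \lambda_5 = 0\), so that k is the rank of S. It also shows why the eigenvectors of the tied eigenvalues are not identifiable: rotating them within their eigenspace leaves S unchanged. This holds for any common value, not only 0. The names `S_pop`, `B` and `R` are illustrative.

```r
set.seed(42)
q <- 5
# Random orthogonal matrix of eigenvectors
B <- qr.Q(qr(matrix(rnorm(q * q), q, q)))
lambda <- c(3, 2, 0, 0, 0)
S_pop <- B %*% diag(lambda) %*% t(B)
S_pop <- (S_pop + t(S_pop)) / 2  # remove rounding asymmetry

eigen(S_pop, symmetric = TRUE)$values  # 3, 2 and three (numerical) zeros
qr(S_pop)$rank                         # k = rank of S_pop

# Rotate the last three eigenvectors by a random 3 x 3 orthogonal matrix:
# the matrix is unchanged, so these eigenvectors are not unique.
R <- qr.Q(qr(matrix(rnorm(9), 3, 3)))
B_rot <- cbind(B[, 1:2], B[, 3:5] %*% R)
S_rot <- B_rot %*% diag(lambda) %*% t(B_rot)
all.equal(S_pop, S_rot)
```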

To apply the ladle to PCA, FOBI or SIR, see the dedicated functions PCAladle, FOBIladle and SIRladle.

References

Luo, W. and Li, B. (2016), Combining Eigenvalues and Variation of Eigenvectors for Order Determination, Biometrika, 103, 875-887. <doi:10.1093/biomet/asw051>

Examples

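library(ICtest)

# Function for computing the left CCA matrix. The first `dim` columns of x
# form the first set of variables, the remaining columns the second set.
S_CCA <- function(x, dim){
  x1 <- x[, 1:dim]
  x2 <- x[, -(1:dim)]
  # Center the columns and whiten with the symmetric inverse square root
  # of the covariance matrix
  stand <- function(x){
    x <- as.matrix(x)
    x <- sweep(x, 2, colMeans(x), "-")
    eigcov <- eigen(cov(x), symmetric = TRUE)
    # nrow = ... keeps diag() correct when a set has a single column
    w <- diag((eigcov$values)^(-1/2), nrow = length(eigcov$values))
    x %*% (eigcov$vectors %*% w %*% t(eigcov$vectors))
  }
  
  x1stand <- stand(x1)
  x2stand <- stand(x2)
  
  # Cross-covariance between the two whitened sets
  crosscov <- cov(x1stand, x2stand)
  
  # C %*% t(C): a dim x dim symmetric matrix whose eigenvalues are the
  # squared sample canonical correlations
  tcrossprod(crosscov)
}

# Toy data with two canonical components
n <- 200
x1 <- matrix(rnorm(n*5), n, 5)
x2 <- cbind(x1[, 1] + rnorm(n, sd = sqrt(0.5)),
            -1*x1[, 1] + x1[, 2] + rnorm(n, sd = sqrt(0.5)),
            matrix(rnorm(n*3), n, 3))
x <- cbind(x1, x2)

# The ladle estimate; dim = 5 is passed on to S_CCA through ...
ladle_1 <- ladle(x, S_CCA, dim = 5)
ladleplot(ladle_1)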