
jackstraw (version 1.3.17)

jackstraw_pca: Non-Parametric Jackstraw for Principal Component Analysis (PCA)

Description

Test association between the observed variables and their latent variables captured by principal components (PCs).

Usage

jackstraw_pca(
  dat,
  r = NULL,
  r1 = NULL,
  s = NULL,
  B = NULL,
  covariate = NULL,
  verbose = TRUE
)

Value

jackstraw_pca returns a list consisting of

p.value

m p-values of association tests between variables and their principal components

obs.stat

m observed F-test statistics

null.stat

s*B null F-test statistics

Arguments

dat

a data matrix with m rows as variables and n columns as observations.

r

the number of significant principal components (a positive integer). See permutationPA and other methods for choosing it.

r1

a numeric vector of the principal components that are of interest. Choose a subset of r significant PCs to be used.

s

the number of "synthetic" null variables (a positive integer). Out of m variables, s variables are independently permuted.

B

a number (a positive integer) of resampling iterations. There will be a total of s*B null statistics.

covariate

a data matrix of covariates with corresponding n observations (do not include an intercept term).

verbose

a logical specifying whether to print the computational progress.

Author

Neo Christopher Chung nchchung@gmail.com

Details

This function computes m p-values of linear association between m variables and their PCs. Its resampling strategy accounts for the over-fitting characteristics due to direct computation of PCs from the observed data and protects against an anti-conservative bias.
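To make the resampling idea concrete, here is a minimal base-R sketch for the r = 1 case. It is an illustration under simplifying assumptions, not the package's implementation: the helper names f_stat and pc1, the simulated data, the values of s and B, and the plain empirical p-value (the fraction of pooled null statistics at least as large as each observed statistic) are all choices made for this example.

```r
set.seed(1)

# F statistic for each row of `dat`: model (intercept + lv) vs. intercept only
f_stat <- function(dat, lv) {
  n <- ncol(dat)
  X <- cbind(1, lv)
  H <- X %*% solve(crossprod(X)) %*% t(X)      # hat matrix of the full model
  rss1 <- rowSums((dat - dat %*% H)^2)          # residual SS, full model
  rss0 <- rowSums((dat - rowMeans(dat))^2)      # residual SS, intercept only
  (rss0 - rss1) / (rss1 / (n - 2))
}

# first PC (right singular vector) of the row-centered data
pc1 <- function(x) svd(x - rowMeans(x))$v[, 1]

# simulated data: m variables (rows), n observations (columns);
# only the first 50 variables depend on the latent variable
m <- 200; n <- 20
loadings <- c(rep(1, 50), rep(0, m - 50))
dat <- loadings %*% t(rnorm(n)) + matrix(rnorm(m * n), nrow = m)

obs_stat <- f_stat(dat, pc1(dat))

s <- 20; B <- 50                                # illustrative values only
null_stat <- numeric(0)
for (b in seq_len(B)) {
  idx <- sample(m, s)                           # rows to turn into synthetic nulls
  jack <- dat
  jack[idx, ] <- t(apply(dat[idx, , drop = FALSE], 1, sample))  # permute each row
  # recompute the PC from the perturbed data, keep only the synthetic nulls' statistics
  null_stat <- c(null_stat, f_stat(jack, pc1(jack))[idx])
}

# empirical p-values against the pooled s*B null statistics
p_value <- sapply(obs_stat, function(o) mean(null_stat >= o))
```

Because each synthetic null variable is scored against a PC recomputed from data that includes it, the null statistics carry the same over-fitting as the observed ones, which is what keeps the resulting p-values from being anti-conservative.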

Provide the data matrix, with m variables as rows and n observations as columns. Given that there are r significant PCs, this function tests for linear association between m variables and their r PCs.

You can specify a subset of the significant PCs that you are interested in (r1). If r1 is given, this function computes the statistical significance of association between the m variables and the PCs in r1, while adjusting for the other PCs (i.e., significant PCs that are not of interest). For example, if you want to identify variables associated with the first and second PCs when your data contains three significant PCs, set r=3 and r1=c(1,2).
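One way to read "adjusting for other PCs" is as a nested-model F-test: the full model contains all r PCs, and the null model contains only the PCs outside r1. The base-R sketch below computes such statistics for r=3 and r1=c(1,2). The helper rss, the random data, and the variable names are assumptions made for illustration; this is not the package's code and it omits the resampling step.

```r
set.seed(2)

# random data: m variables (rows), n observations (columns)
m <- 100; n <- 30
dat <- matrix(rnorm(m * n), nrow = m)

# PCs as right singular vectors of the row-centered data
v <- svd(dat - rowMeans(dat))$v

r  <- 3
r1 <- c(1, 2)                       # PCs of interest
r0 <- setdiff(seq_len(r), r1)       # remaining significant PCs, adjusted for

# residual sum of squares of every row regressed on design matrix X
rss <- function(dat, X) {
  H <- X %*% solve(crossprod(X)) %*% t(X)
  rowSums((dat - dat %*% H)^2)
}

X_full <- cbind(1, v[, c(r1, r0)])  # intercept + all r PCs
X_null <- cbind(1, v[, r0])         # intercept + adjustment PCs only

rss_full <- rss(dat, X_full)
rss_null <- rss(dat, X_null)

df1 <- length(r1)                   # number of PCs being tested
df2 <- n - ncol(X_full)             # residual degrees of freedom
f_adj <- ((rss_null - rss_full) / df1) / (rss_full / df2)
```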

Please take a careful look at your data and use appropriate graphical and statistical criteria to determine the number of significant PCs, r. The number of significant PCs depends on the data structure and the context. If r is not specified, it is estimated by a permutation test (Buja and Eyuboglu, 1992) using the function permutationPA.
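As a hedged illustration of estimating r explicitly before testing, the sketch below calls permutationPA from the same package and passes its estimate to jackstraw_pca. It assumes that the jackstraw package is installed, that permutationPA accepts B, threshold, and verbose arguments, and that its result has an element named r. The simulated data, the seed, and the small B and s values are illustrative choices, not recommendations.

```r
# Sketch only: the body is skipped when the jackstraw package is not installed.
if (requireNamespace("jackstraw", quietly = TRUE)) {
  set.seed(1)

  # simulated data: 1000 variables (rows), 20 observations (columns)
  loadings <- c(rep(1, 50), rep(-1, 50), rep(0, 900))
  dat <- loadings %*% t(rnorm(20)) + matrix(rnorm(1000 * 20), nrow = 1000)

  # estimate the number of significant PCs by permutation (assumed signature)
  pa <- jackstraw::permutationPA(dat, B = 10, threshold = 0.05, verbose = FALSE)
  r_hat <- pa$r

  # test the variables against the estimated number of PCs
  if (r_hat >= 1) {
    out <- jackstraw::jackstraw_pca(dat, r = r_hat, s = 100, B = 10, verbose = FALSE)
    head(out$p.value)
  }
}
```

Inspecting the estimate (and a scree plot) before passing it on is in line with the advice above to choose r with care rather than rely on the default alone.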

If s is not supplied, s is set to about 10% of m variables. If B is not supplied, B is set to m*10/s.
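A quick arithmetic check of these defaults for m = 1000, assuming that "about 10%" means rounding to the nearest integer (the rounding and the variable names here are assumptions, not taken from the package source):

```r
m <- 1000                              # number of variables (rows of dat)

s_default <- round(m / 10)             # about 10% of m (rounding assumed)
B_default <- round(m * 10 / s_default) # m*10/s

n_null <- s_default * B_default        # total number of null statistics, s*B
c(s = s_default, B = B_default, null = n_null)
```

Under this reading the defaults yield roughly 10*m null statistics in total, whatever the value of m.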

References

Chung and Storey (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics, 31(4): 545-554. doi:10.1093/bioinformatics/btu674

See Also

jackstraw, jackstraw_subspace, permutationPA

Examples

if (FALSE) {
library(jackstraw)

## simulate data from a latent variable model: Y = BL + E
## (m = 1000 variables, n = 20 observations; the first 100 variables load on L)
B = c(rep(1,50),rep(-1,50), rep(0,900))
L = rnorm(20)
E = matrix(rnorm(1000*20), nrow=1000)
dat = B %*% t(L) + E
## center and scale each variable (row)
dat = t(scale(t(dat), center=TRUE, scale=TRUE))

## apply the jackstraw
out = jackstraw_pca(dat, r=1)

## Use optional arguments
## For example, set s and B for a balance between speed of the algorithm and accuracy of p-values
## out = jackstraw_pca(dat, r=1, s=10, B=1000)
}
