
jackstraw (version 1.3.17)

jackstraw_lfa: Non-Parametric Jackstraw for Logistic Factor Analysis

Description

Test association between the observed variables and their latent variables captured by logistic factors (LFs).

Usage

jackstraw_lfa(
  dat,
  r,
  FUN,
  r1 = NULL,
  s = NULL,
  B = NULL,
  covariate = NULL,
  permute_alleles = TRUE,
  verbose = TRUE
)

Value

jackstraw_lfa returns a list consisting of

p.value

m p-values of association tests between variables and their LFs

obs.stat

m observed deviances

null.stat

s*B null deviances
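
As a rough illustration of how the three returned components relate, the sketch below computes pooled empirical p-values from simulated observed and null statistics in base R. The chi-squared draws and the names obs_stat, null_stat and emp_p are assumptions made for illustration only; this is not necessarily how the package computes p.value internally.

```r
set.seed(1)
m <- 1000; s <- 100; B <- 10   # hypothetical sizes, chosen for illustration

# stand-ins for obs.stat (length m) and null.stat (length s*B)
obs_stat  <- rchisq(m, df = 1)
null_stat <- rchisq(s * B, df = 1)

# pooled empirical p-value: fraction of null statistics
# that are at least as large as each observed statistic
emp_p <- vapply(obs_stat, function(x) mean(null_stat >= x), numeric(1))
```

Under this sketch, a larger observed deviance can only yield a smaller or equal p-value, and the resolution of the p-values is limited by the s*B null statistics available.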

Arguments

dat

either a genotype matrix with m rows as variables and n columns as observations, or a BEDMatrix object (see package BEDMatrix). BEDMatrix objects are transposed relative to a regular matrix, but the function handles this as-is, so a BEDMatrix input needs no modification (see example). A BEDMatrix input triggers a low-memory mode in which permuted data is also written to and processed from disk, whereas a regular matrix input stores permutations in memory. The tradeoff is that the BEDMatrix version typically runs considerably slower, but it enables analysis of very large data that would otherwise be impossible.

r

the number of significant LFs.

FUN

a function to use for LFA.

r1

a numeric vector of LFs of interest (implying you are not interested in all r LFs).

s

the number of "synthetic" null variables. Out of m variables, s variables are independently permuted.

B

the number of resampling iterations. There will be a total of s*B null statistics.

covariate

a data matrix of covariates with corresponding n observations (do not include an intercept term).

permute_alleles

If TRUE (default), alleles (rather than genotypes) are permuted, which results in a synthetic null that is closer to Binomial when the data is highly structured. Changing to FALSE is not recommended, except for research purposes to confirm that it performs worse than the default.

verbose

a logical specifying whether to print the computational progress.
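
The difference between permuting genotypes and permuting alleles (the permute_alleles argument) can be sketched for a single variable in base R. This is a minimal illustration that assumes genotypes coded 0/1/2 with no missing values; the names x_geno and x_allele are hypothetical, and the package's internal implementation may differ.

```r
set.seed(1)
n <- 10
x <- rbinom(n, 2, 0.3)   # genotypes at one locus, coded 0/1/2

# genotype permutation: shuffle the individuals' genotypes directly
x_geno <- sample(x)

# allele permutation: expand to 2n alleles, shuffle them,
# then re-pair consecutive alleles into n new genotypes
alleles  <- c(rep(1L, sum(x)), rep(0L, 2L * n - sum(x)))
alleles  <- sample(alleles)
x_allele <- colSums(matrix(alleles, nrow = 2))
```

Both versions preserve the total allele count at the locus, but only the genotype version is guaranteed to preserve the original genotype counts.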

Author

Neo Christopher Chung nchchung@gmail.com

Alejandro Ochoa alejandro.ochoa@duke.edu

Details

This function uses logistic factor analysis (LFA) from Hao et al. (2016). In particular, the deviance in logistic regression (the full model with r LFs vs. the intercept-only model) is used to assess significance. This function requires the gcatest package, and in practice also the lfa package, to be installed from Bioconductor.
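
The deviance comparison described above can be sketched for a single variable with base R's glm. This is only an illustration under stated assumptions: the factor lf is simulated rather than estimated by lfa, the model has an intercept plus one factor, and the package itself relies on gcatest rather than on this code.

```r
set.seed(1)
n  <- 100
lf <- rnorm(n)            # one hypothetical (non-intercept) logistic factor
p  <- plogis(0.8 * lf)    # individual-specific allele frequencies
x  <- rbinom(n, 2, p)     # genotypes coded 0/1/2

# binomial logistic regression with 2 trials (alleles) per individual
fit <- glm(cbind(x, 2 - x) ~ lf, family = binomial())

# deviance statistic: intercept-only model minus full model
dev_stat <- fit$null.deviance - fit$deviance
```

A large dev_stat indicates that the factor explains the genotypes better than the intercept alone.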

The random outputs of the regular matrix versus the BEDMatrix versions are equal in distribution. However, fixing a seed and providing the same data to both versions does not result in the same exact outputs. This is because the BEDMatrix version permutes loci in a different order by necessity.

References

Chung and Storey (2015) Statistical significance of variables driving systematic variation in high-dimensional data. Bioinformatics, 31(4): 545-554. doi:10.1093/bioinformatics/btu674

See Also

jackstraw_pca, jackstraw, jackstraw_subspace

Examples

if (FALSE) {
## simulate genotype data from a logistic factor model: drawing rbinom from logit(BL)
m <- 5000; n <- 100; pi0 <- .9
m0 <- round(m*pi0)
m1 <- m - m0
# effect sizes: nonzero for the first m1 variables, zero for the m0 null variables
B <- matrix(0, nrow=m, ncol=1)
B[1:m1,] <- matrix(runif(m1, min=-.5, max=.5), nrow=m1, ncol=1)
L <- matrix(rnorm(n), nrow=1, ncol=n)
BL <- B %*% L
prob <- exp(BL)/(1+exp(BL))

dat <- matrix(rbinom(m*n, 2, as.numeric(prob)), m, n)

# load lfa package (install from Bioconductor)
library(lfa)
# choose the number of logistic factors, including the intercept
r <- 2
# define the function this way, a function of the genotype matrix only
FUN <- function(x) lfa::lfa( x, r )

## apply the jackstraw_lfa
out <- jackstraw_lfa( dat, r, FUN )

# if you had very large genotype data in plink BED/BIM/FAM files,
# use BEDMatrix and save memory by reading from disk (at the expense of speed)
library(BEDMatrix)
dat_BM <- BEDMatrix( 'filepath' ) # assumes filepath.bed, .bim and .fam exist
# run jackstraw!
out <- jackstraw_lfa( dat_BM, r, FUN )
}
