corr.null estimates the correlation matrix of the vector influence curve for such parameters and returns samples from the corresponding normal distribution. Arguments to the function allow for refinements in calculating the resulting null distribution estimate.
corr.null(X, W = NULL, Y = NULL, Z = NULL, test = "t.twosamp.unequalvar", alternative = "two-sided", use = "pairwise", B = 1000, MVN.method = "mvrnorm", penalty = 1e-06, ic.quant.trans = FALSE, marg.null = NULL, marg.par = NULL, perm.mat = NULL)
X: exprs(X) is the data of interest and pData(X) may contain outcomes and covariates of interest. For most currently implemented tests (exception: tests involving correlation parameters), one hypothesis is tested for each row of the data.
Y: ... Surv object containing the outcome of interest.
Z: ... nrow(Z)=ncol(X). By the time the function is called, this argument contains a 'design matrix' with the variable to be tested in the first column, additional covariates in the remaining columns, and no intercept column.
use: ... cor, a character string giving a method for computing covariances in the presence of missing values. Default is 'pairwise', which allows the covariance/correlation matrix to be calculated using the most information possible when NAs are present.
MVN.method: ... MASS library, whereas 'Cholesky' relies on a Cholesky decomposition. Default is 'mvrnorm'.
penalty: If MVN.method='Cholesky', the value in penalty is added to all diagonal elements of the estimated test statistics correlation matrix to ensure that the matrix is positive definite and that internal calls to chol do not return an error. Default is 1e-6.
ic.quant.trans: ... perm.mat) should be applied to the multivariate normal null distribution. Defaults for marg.null and marg.par exist, but can also be specified by the user (see below). Default is FALSE.
marg.null: If ic.quant.trans=TRUE, a character string naming the marginal null distribution to use for quantile transformation. Can be one of 't' or 'perm'. Default is NULL, in which case the marginal null distribution is selected based on the choice of test statistics. Defaults are explained below. If 'perm', the user must supply a vector or matrix of test statistics corresponding to another marginal null distribution, perhaps one created externally by the user, and possibly referring to empirically derived marginal permutation distributions, although the statistics could represent any suitable choice of marginal null distribution.
marg.par: If ic.quant.trans=TRUE, the parameters defining the marginal null distribution in marg.null to be used for quantile transformation. Default is NULL, in which case the values are selected based on the choice of test statistics and other available parameters (e.g., sample size, number of groups, etc.). Defaults are explained below. The user can override the defaults, in which case a matrix of marginal null distribution parameters must be provided. Providing a matrix allows the user to perform multiple testing using parameters which may vary with each hypothesis, as may be desired in common-quantile minP procedures.
perm.mat: If ic.quant.trans=TRUE, a matrix of user-supplied test statistics from a particular distribution to be used during marginal quantile transformation. Supplying a vector of test statistics will apply the same vector to each hypothesis. The statistics may represent empirically derived marginal permutation values, may be theoretical values, or may represent a sample from some other suitable choice of marginal null distribution.
... nrow(X)) by the number of desired samples (B).
nulldist='ic' is evaluated in the main user-level functions MTP or EBMTP. Formatting of the data objects X, W, Y, and especially Z occurs at the start of execution of the main user-level functions.
Based on the value of test, the appropriate correlation matrix of the vector influence curve is calculated. Once the correlation matrix is obtained, one may sample vectors of null test statistics directly from a multivariate normal distribution rather than relying on permutation-based or bootstrap-based resampling. Because the Gaussian distribution is continuous, we expect this choice of null distribution to suffer less from discreteness than either the permutation or the bootstrap distribution. Additionally, in large-scale settings, use of null distributions derived from the vector influence function typically reduces the computational bottlenecks associated with resampling methods.
Because the influence curve null distributions have been implemented for parametric, standardized t-statistics, the options robust and standardize are not allowed. Influence curve null distributions are available for the following values of test: 't.onesamp', 't.pair', 't.twosamp.equalvar', 't.twosamp.unequalvar', 'lm.XvsZ', 'lm.YvsXZ', 't.cor', and 'z.cor'.
In the simpler cases involving one-sample and two-sample tests of means, the correlation matrices are obtained via calls to cor. For two-sample tests, the correlation matrix corresponds to the following transformation of the group-specific covariance matrices: cov(X(group1))/n1 + cov(X(group2))/n2, where n1 and n2 are the sample sizes of each group. When weights are present, the internal function IC.CorXW.NA is called to calculate weighted estimates of the (group) covariance matrices from each subject's estimated vector influence curve. The calculations are similar in spirit to those in cov.wt, but they are done in a way that allows for handling NA elements in the estimated vector influence curve IC_n. The correlation matrix corresponding to IC_n * (IC_n)^t is calculated.
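The two-sample transformation described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's internal code: the toy matrix X, the group vector, and the names sigma and sigma_corr are all hypothetical, and the data are laid out with hypotheses in rows and samples in columns.

```r
set.seed(1)

# Toy data (hypothetical): 4 hypotheses (rows) by 30 samples (columns)
X <- matrix(rnorm(4 * 30), nrow = 4, ncol = 30)
group <- rep(c(1, 2), times = c(12, 18))

X1 <- X[, group == 1, drop = FALSE]
X2 <- X[, group == 2, drop = FALSE]
n1 <- ncol(X1)
n2 <- ncol(X2)

# cov() treats columns as variables, so transpose to obtain a
# hypotheses-by-hypotheses matrix. Pairwise deletion is used for NAs,
# in the spirit of use = "pairwise".
sigma <- cov(t(X1), use = "pairwise.complete.obs") / n1 +
  cov(t(X2), use = "pairwise.complete.obs") / n2

# Rescale the combined covariance matrix to a correlation matrix
sigma_corr <- cov2cor(sigma)
```

The result is a 4 x 4 correlation matrix with unit diagonal, which plays the role of the matrix from which null test statistics are later sampled.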
For linear regression models, corr.null calculates the vector influence curve associated with each subject/sample. The vector has length equal to the number of hypotheses. The internal function IC.Cor.NA is used to calculate IC_n * (IC_n)^t in a manner which allows for NA-handling when the influence curve may contain missing elements. For linear regression models of the form E[Y|X], IC_n takes the form (E[((X^t)X)^(-1)] (X^t)_i Y_i) - Y_i-hat. Influence curves for correlation parameters are more complicated, and the user is referred to the references below.
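One way to form IC_n * (IC_n)^t when some influence curve elements are missing is to sum products only over the subjects observed for both hypotheses. The sketch below illustrates that general idea and is not the code of IC.Cor.NA; the helper ic_crossprod_na and the toy matrix IC (hypotheses in rows, subjects in columns) are hypothetical.

```r
# Hypothetical helper (not part of multtest): NA-tolerant analogue of
# IC %*% t(IC), averaged over the subjects observed for both hypotheses.
ic_crossprod_na <- function(IC) {
  observed <- matrix(as.numeric(!is.na(IC)), nrow = nrow(IC))
  IC0 <- IC
  IC0[is.na(IC)] <- 0                  # missing entries add nothing to the sums
  sums <- IC0 %*% t(IC0)               # pairwise sums of products
  counts <- observed %*% t(observed)   # number of jointly observed subjects
  sums / counts
}

set.seed(2)
IC <- matrix(rnorm(3 * 20), nrow = 3, ncol = 20)
IC <- IC - rowMeans(IC)   # center each row, since influence curves have mean zero
IC[1, 5] <- NA            # introduce a missing element

sigma <- ic_crossprod_na(IC)
sigma_corr <- cov2cor(sigma)
```

With pairwise deletion the resulting matrix is not guaranteed to be positive definite, which is one reason a small diagonal penalty can be useful before a Cholesky decomposition.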
Once the correlation matrix sigma' corresponding to the variance-covariance matrix of the vector influence curve, sigma = IC_n * (IC_n)^t, is obtained, one may sample from N(0, sigma') to obtain null test statistics.
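The sampling step can be illustrated with both approaches named by MVN.method. This is a minimal sketch under assumptions: the toy correlation matrix sigma_corr stands in for the estimated sigma', the variable names are hypothetical, and the Cholesky variant only mirrors the documented idea of adding penalty to the diagonal before calling chol.

```r
library(MASS)  # provides mvrnorm()

set.seed(99)
m <- 5      # number of hypotheses
B <- 1000   # number of null samples

# Toy correlation matrix standing in for the estimated sigma'
sigma_corr <- 0.5^abs(outer(1:m, 1:m, "-"))

# Option 1: mvrnorm() returns a B x m matrix, so transpose it to get
# hypotheses in rows and null samples in columns.
null_mvrnorm <- t(mvrnorm(n = B, mu = rep(0, m), Sigma = sigma_corr))

# Option 2: Cholesky factor of the penalized matrix. chol() returns an
# upper triangular U with t(U) %*% U equal to its input.
penalty <- 1e-6
U <- chol(sigma_corr + diag(penalty, m))
null_chol <- t(U) %*% matrix(rnorm(m * B), nrow = m, ncol = B)

dim(null_mvrnorm)
dim(null_chol)
```

Both matrices have one row per hypothesis and one column per sample, matching the layout of the value returned by corr.null.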
If ic.quant.trans=TRUE, the matrix of null test statistics can be quantile transformed to produce a matrix which accounts for the joint dependencies between test statistics (down columns), but which has marginal t-distributions (across rows). If marg.null and marg.par are not specified (=NULL), the following default t-distributions are applied:
S. Dudoit and M.J. van der Laan. Multiple Testing Procedures and Applications to Genomics. Springer Series in Statistics. Springer, New York, 2008.
H.N. Gilbert, M.J. van der Laan, and S. Dudoit, "Joint Multiple Testing Procedures for Inferring Genetic Networks from Lower-Order Conditional Independence Graphs" (2009). In preparation.
boot.null, MTP, MTP-class, EBMTP, EBMTP-class, get.Tn, ss.maxT, mt.sample.teststat, wapply, boot.resample
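library(multtest)
set.seed(99)
# Simulated data: 10 hypotheses (rows) by 50 samples (columns)
data <- matrix(rnorm(10*50), nrow=10, ncol=50)
# Sample with mvrnorm from MASS (the default MVN.method)
nulldistn.mvrnorm <- corr.null(data, test="t.onesamp", alternative="greater", B=5000)
# Sample via a Cholesky decomposition with a smaller diagonal penalty
nulldistn.chol <- corr.null(data, test="t.onesamp", MVN.method="Cholesky", penalty=1e-9)
# Quantile-transform the margins using the default marginal null distribution
nulldistn.t <- corr.null(data, test="t.onesamp", ic.quant.trans=TRUE)
# Rows are hypotheses, columns are null samples
dim(nulldistn.mvrnorm)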
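To illustrate conceptually what ic.quant.trans=TRUE does, the marginal quantile transformation can be sketched without multtest. Everything here is an assumption made for illustration: the helper quant_trans_t is hypothetical and is not the package's implementation, the stand-in matrix null_mvn is simulated, and the choice df = n - 1 is only an assumed example of marginal t parameters.

```r
set.seed(99)
m <- 4      # number of hypotheses
B <- 2000   # number of null samples
n <- 10     # assumed sample size, used only to choose the degrees of freedom

# Stand-in for a multivariate normal null matrix (hypotheses x samples)
sigma_corr <- 0.5^abs(outer(1:m, 1:m, "-"))
null_mvn <- t(chol(sigma_corr)) %*% matrix(rnorm(m * B), nrow = m, ncol = B)

# Hypothetical helper: replace each value by the t quantile at its
# within-row rank, so every row gets the same marginal t-distribution.
quant_trans_t <- function(Z, df) {
  B <- ncol(Z)
  t(apply(Z, 1, function(z) {
    p <- (rank(z, ties.method = "first") - 0.5) / B  # ranks mapped into (0, 1)
    qt(p, df = df)
  }))
}

null_t <- quant_trans_t(null_mvn, df = n - 1)

# The transformation is monotone within rows, so rank-based dependence
# between hypotheses is carried over from the normal draws.
all.equal(cor(t(null_mvn), method = "spearman"),
          cor(t(null_t), method = "spearman"))
```

Because only the within-row ordering is used, the joint dependence structure comes from the multivariate normal draws while the margins follow the chosen t-distribution.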