MTP(X, W = NULL, Y = NULL, Z = NULL, Z.incl = NULL, Z.test = NULL, na.rm = TRUE, test = "t.twosamp.unequalvar", robust = FALSE, standardize = TRUE, alternative = "two.sided", psi0 = 0, typeone = "fwer", k = 0, q = 0.1, fdr.method = "conservative", alpha = 0.05, smooth.null = FALSE, nulldist = "boot.cs", B = 1000, ic.quant.trans = FALSE, MVN.method = "mvrnorm", penalty = 1e-06, method = "ss.maxT", get.cr = FALSE, get.cutoff = FALSE, get.adjp = TRUE, keep.nulldist = TRUE, keep.rawdist = FALSE, seed = NULL, cluster = 1, type = NULL, dispatch = NULL, marg.null = NULL, marg.par = NULL, keep.margpar = TRUE, ncp = NULL, perm.mat = NULL, keep.index = FALSE, keep.label = FALSE)
exprs(X)
is the data of interest and pData(X)
may contain outcomes and covariates of interest. For most currently implemented tests (exception: tests involving correlation parameters), one hypothesis is tested for each row of the data.W
must be the same dimension as X
with one weight for each value in X
. If a vector, W
may contain one weight for each observation (i.e. column) of X
or one weight for each variable (i.e. row) of X
. In either case, the weights are duplicated appropriately. Weighted F-tests are not available. Default is 'NULL'.Surv
object containing the outcome of interest. This may be class labels (F-tests and two sample t-tests) or a continuous or polycotomous dependent variable (linear regression based t-tests), or survival data (Cox proportional hazards based t-tests). For block.f
and f.twoway
tests, class labels must be ordered by block and within each block ordered by group. If X
is an ExpressionSet, Y
can be a character string referring to the column of pData(X)
to use as outcome. Default is 'NULL'.nrow(Z)=ncol(X)
. If X
is an ExpressionSet, Z
can be a character string referring to the column of pData(X)
to use as covariates. The variables Z.incl
and Z.adj
allow one to specify which covariates to use in a particular test without modifying the input Z
. Default is 'NULL'.Z
(i.e. which variables) to include in the model. These can be numbers or column names (if the columns are names). Default is 'NULL'.Z
(i.e. which variable) to use to test for association with each row of X
in a linear model. Only used for test="lm.XvsZ"
, where it is necessary to specify which covariate's regression parameter is of interest. Default is 'NULL'.rlm
instead of lm
in linear models. Default is 'FALSE'.k
giving the allowed number of false positives, and tail probability of the proportion of false positives ('tppfp'), with parameter q
giving the allowed proportion of false positives. The false discovery rate ('fdr') can also be controlled.typeone="fdr"
. The options are "conservative" (default) for the more conservative, general FDR controlling procedure and "restricted" for the method which requires more assumptions.nulldist
is 'perm'), or the number of samples from the multivariate normal distribution (if nulldist
is 'ic') Can be reduced to increase the speed of computation, at a cost to precision. Default is 1000.nulldist='ic'
, a logical indicating whether or not a marginal quantile transformation using a t-distribution or user-supplied marginal distribution (stored in perm.mat
) should be applied to the multivariate normal null distribution. Defaults for marg.null
and marg.par
exist, but can also be specified by the user (see below). Default is 'FALSE'.nulldist='ic'
, one of 'mvrnorm' or 'Cholesky' designating how correlated normal test statistics are to be generated. Selecting 'mvrnorm' uses the function of the same name found in the MASS
library, whereas 'Cholesky' relies on a Cholesky decomposition. Default is 'mvrnorm'.nulldist='ic'
and MVN.method='Cholesky'
, the value in penalty
is added to all diagonal elements of the estimated test statistics correlation matrix to ensure that the matrix is positive definite and that internal calls to 'chol'
do not return an error. Default is 1e-6.nulldist
='perm'. Note that this matrix can be quite large.nulldist
='perm' or 'ic'. Note that this matrix can become quite large. If one wishes to use subsequent calls to update
or EBupdate
in which one updates choice of bootstrap null distribution, keep.rawdist
must be TRUE. To save on memory, update
only requires that one of keep.nulldist
or keep.rawdist
be 'TRUE'.set.seed
to set the seed for the random number generator for bootstrap resampling. This argument can be used to repeat exactly a test performed with a given seed. If the seed is specified via this argument, the same seed will be returned in the seed slot of the MTP object created. Else a random seed(s) will be generated, used and returned. Vector of integers used to specify seeds for each node in a cluster used to to generate a bootstrap null distribution.cluster=1
, bootstrap is implemented on single node. Supplying a cluster object results in the bootstrap being implemented in parallel on the provided nodes. This option is only available for the bootstrap procedure. With default value of 1, bootstrap is executed on single CPU.snow
package for details.B*dispatch
must be an integer. If dispatch is an integer, then B/dispatch
must be an integer. Default is 5 percent.nulldist='boot.qt'
, the marginal null distribution to use for quantile transformation. Can be one of 'normal', 't', 'f' or 'perm'. Default is 'NULL', in which case the marginal null distribution is selected based on choice of test statistics. Defaults explained below. If 'perm', the user must supply a vector or matrix of test statistics corresponding to another marginal null distribution, perhaps one created externally by the user, and possibly referring to empirically derived marginal permutation distributions, although the statistics could represent any suitable choice of marginal null distribution.nulldist='boot.qt'
, the parameters defining the marginal null distribution in marg.null
to be used for quantile transformation. Default is 'NULL', in which case the values are selected based on choice of test statistics and other available parameters (e.g., sample size, number of groups, etc.). Defaults explained below. User can override defaults, in which case a matrix of marginal null distribution parameters can be accepted. Providing numeric (vector) values will apply the same null distribution defined by the parameter to all hypotheses, while providing a matrix of values allows the user to perform multiple testing using parameters which may vary with each hypothesis, as may be desired in common-quantile minP procedures. In this way, theoretical factors or factors affecting sample size or missingness may be assessed.nulldist='boot.qt'
, a logical indicating whether the (internally created) matrix of marginal null distribution parameters should be returned. Default is 'TRUE'.nulldist='boot.qt'
, a value for a possible noncentrality parameter to be used during marginal quantile transformation. Default is 'NULL'.nulldist='boot.qt'
and marg.null='perm'
, a matrix of user-supplied test statistics from a particular distribution to be used during marginal quantile transformation. The statistics may represent empirically derived marginal permutation values, may be theoretical values, or may represent a sample from some other suitable choice of marginal null distribution.nulldist='ic'
and test='t.cor'
or test='z.cor'
, the index returned is a matrix with the indices of the first and second variables considered for pairwise correlations. If there are p hypotheses, this arguments returns t(combn(p,2))
. For all other choices of test statistic, the index is not returned, as they correspond to the original order of the hypotheses in X
.Y
should be returned as a slot in the resulting MTP object. Typically not necessary, although useful if one is using update
and wants to use marginal null distribution defaults with nulldist='boot.qt'
(e.g., with F-tests).MTP
, with the following slots:statistic
numeric
, observed test statistics for each hypothesis, specified by the values of the MTP
arguments test
, robust
, standardize
, and psi0
.estimate
sampsize
numeric
, number of columns (i.e. observations) in the input data set.rawp
numeric
, unadjusted, marginal p-values for each hypothesis.adjp
numeric
, adjusted (for multiple testing) p-values for each hypothesis (computed only if the get.adjp
argument is TRUE).conf.reg
alpha
(computed only if the get.cr
argument is TRUE).cutoff
alpha
(computed only if the get.cutoff
argument is TRUE).reject
'matrix'
, rejection indicators (TRUE for a rejected null hypothesis), for each value of the nominal Type I error rate alpha
.rawdist
keep.rawdist=TRUE
and if nulldist
is one of 'boot.ctr', 'boot.cs', or 'boot.qt'). This slot must not be empty if one wishes to call update
to change choice of bootstrap-based null distribution.nulldist
keep.nulldist=TRUE
); option not currently available for permutation null distribution, i.e., nulldist='perm'
). By default (i.e., for nulldist='boot.cs'
), the entries of nulldist
are the null value shifted and scaled bootstrap test statistics, with one null test statistic value for each hypothesis (rows) and bootstrap iteration (columns).nulldist.type
nulldist
argument in the call to MTP, i.e., 'boot.cs', 'boot.ctr', 'boot.qt', 'ic', or 'perm'.marg.null
nulldist='boot.qt'
, a character value returning which choice of marginal null distribution was used by the MTP. Can be used to check default values or to ensure manual settings were correctly applied.marg.par
nulldist='boot.qt'
, a numeric matrix returning the parameters of the marginal null distribution(s) used by the MTP. Can be used to check default values or to ensure manual settings were correctly applied.call
call
, the call to the MTP function.seed
MTP
. This argument is currently used only for the bootstrap null distribution (i.e., for nulldist="boot.xx"
). See ?set.seed
for details.Y
is coded as "0,0,0,1,1,1", "1,1,1,0,0,0" "0,1,0,1,0,1" or "1,0,1,0,1,0", the paired differences between groups will be calculated as "group 2 - group 1". See references for more detail. Test statistics are determined by the values of test
:
f.block
in requiring multiple observations per group*block combintation. This test uses the means of each group*block combination as response variable and test for group main effects assuming a randomized block design;Z.test
in linear models, each with a row of X as outcome, possibly adjusted by covariates Z.incl
from the matrix Z
(in the case of no covariates, one recovers the one-sample t-statistic, t.onesamp
);Z.incl
from the matrix Z
;Z.incl
from the matrix Z
.
When robust=TRUE
, non-parametric versions of each test are performed. For the linear models, this means rlm
is used instead of lm
. There is not currently a robust version of test=coxph.YvsXZ
. For the t- and F-tests, data values are simply replaced by their ranks. This is equivalent to performing the following familiar named rank-based tests. The conversion after each test is the formula to convert from the MTP test to the statistic reported by the listed R function (where num is the numerator of the MTP test statistics, n is total sample size, nk is group k sample size, K is total number of groups or treatments, and rk are the ranks in group k).
wilcox.test
with y=NULL
or paired=TRUE
,
conversion: num/nwilcox.test
,
conversion: n2*(num+mean(r1)) - n2*(n2+1)/2kruskal.test
,
conversion: num*12/(n*(n-1))friedman.test
,
conversion: num*12/(K*(K+1))friedman.test
,
conversion: num*12/(K*(K+1))The implemented MTPs are based on control of the family-wise error rate, defined as the probability of any false positives. Let Vn denote the (unobserved) number of false positives. Then, control of FWER at level alpha means that Pr(Vn>0)<=alpha. the="" set="" of="" rejected="" hypotheses="" under="" a="" fwer="" controlling="" procedure="" can="" be="" augmented="" to="" increase="" number="" rejections,="" while="" other="" error="" rates.="" generalized="" family-wise="" rate="" is="" defined="" as="" pr(vn="">k)<=alpha, and="" it="" is="" clear="" that="" one="" can="" simply="" take="" an="" fwer="" controlling="" procedure,="" reject="" k="" more="" hypotheses="" have="" control="" of="" gfwer="" at="" level="" alpha.="" the="" tail="" probability="" proportion="" false="" positives="" depends="" on="" both="" number="" postives="" (vn)="" rejections="" (rn).="" tppfp="" alpha="" means="" pr(vn="" rn="">q)<=alpha, for="" some="" proportion="" q.="" control="" of="" the="" false="" discovery="" rate="" refers="" to="" expected="" positives="" (rather="" than="" a="" tail="" probability).="" fdr="" at="" level="" alpha="" means="" e(vn="" rn)<="alpha.
In practice, one must choose a method for estimating the test statistics null distribution. We have implemented several versions of an ordinary non-parametric bootstrap estimator and a permutation estimator (which makes sense in certain settings, see references). The non-parametric bootstrap estimator (default) provides asymptotic control of the type I error rate for any data generating distribution, whereas the permutation estimator requires the subset pivotality assumption. One draw back of both methods is the discreteness of the estimated null distribution when the sample size is small. Furthermore, when the sample size is small enough, it is possible that ties will lead to a very small variance estimate. Using standardize=FALSE
allows one to avoid these unusually small test statistic denominators. Parametric bootstrap estimators are another option (not yet implemented). For asymptotically linear estimators, such as those commonly probed using t-statistics, another choice of null distribution is provided when sampling from a multivariate normal distribution with mean zero and correlation matrix derived from the vector influence function. Sampling from a multivariate normal may alleviate the discreteness of the bootstrap and permutation distributions, although accuracy in estimation of the test statistics correlation matrix will be of course also affected by sample size.
For the nonparametric bootstrap distribution with marginal null quantile transformation, the following defaults for marg.null
and marg.par
are available based on choice of test statistics, sample size 'n', and various other parameters:
The above defaults, however, can be overridden by manually setting values of marg.null
and marg.par
. In the case of nulldist='ic'
, and ic.quant.trans=TRUE
, the defaults are the same as above except that 'lm.XvsZ' and 'lm.YvsXZ' are replaced with t-distributions with df=n-p.
Given observed test statistics, a type I error rate (with nominal level), and a test statistics null distribution, MTPs provide adjusted p-values, cutoffs for test statistics, and possibly confidence regions for estimates. Four methods are implemented, based on minima of p-values and maxima of test statistics. Only the step down methods are currently available with the permutation null distribution.
Computation times using a bootstrap null distribution are slower when weights are used for one and two-sample tests. Computation times when using a bootstrap null distribution also are slower for the tests lmXvsZ
, lmYvsXZ
, coxph.YvsXZ
.
To execute the bootstrap on a computer cluster, a cluster object generated with makeCluster
in the package snow
may be used as the argument for cluster. Alternatively, the number of nodes to use in the computer cluster can be used as the argument to cluster. In this case, type
must be specified and a cluster will be created. In both cases, Biobase
and multtest
will be loaded onto each cluster node if these libraries are located in a directory in the standard search path. If these libraries are in a non-standard location, it is necessary to first create the cluster, load Biobase
and multtest
on each node and then to use the cluster object as the argument to cluster. See documentation for snow
package for additional information on creating and using a cluster.
Finally, note that the old argument csnull
is now DEPRECATED as of multtest
v. 2.0.0 given the expanded null distribution options described above. Previously, this argument was an indicator of whether the bootstrap estimated test statistics distribution should be centered and scaled (to produce a null distribution) or not. If csnull=FALSE
, the (raw) non-null bootstrap estimated test statistics distribution was returned. If the non-null bootstrap distribution should be returned, this object is now stored in the 'rawdist' slot when keep.rawdist=TRUE
in the original MTP
function call.
M.J. van der Laan, S. Dudoit, K.S. Pollard (2004), Multiple Testing. Part II. Step-Down Procedures for Control of the Family-Wise Error Rate, Statistical Applications in Genetics and Molecular Biology, 3(1). http://www.bepress.com/sagmb/vol3/iss1/art14/
S. Dudoit, M.J. van der Laan, K.S. Pollard (2004), Multiple Testing. Part I. Single-Step Procedures for Control of General Type I Error Rates, Statistical Applications in Genetics and Molecular Biology, 3(1). http://www.bepress.com/sagmb/vol3/iss1/art13/
K.S. Pollard and Mark J. van der Laan, "Resampling-based Multiple Testing: Asymptotic Control of Type I Error and Applications to Gene Expression Data" (June 24, 2003). U.C. Berkeley Division of Biostatistics Working Paper Series. Working Paper 121. http://www.bepress.com/ucbbiostat/paper121
M.J. van der Laan and A.E. Hubbard (2006), Quantile-function Based Null Distributions in Resampling Based Multiple Testing, Statistical Applications in Genetics and Molecular Biology, 5(1). http://www.bepress.com/sagmb/vol5/iss1/art14/
S. Dudoit and M.J. van der Laan. Multiple Testing Procedures and Applications to Genomics. Springer Series in Statistics. Springer, New York, 2008.
EBMTP
, MTP-class
, MTP-methods
, mt.minP
, mt.maxT
, ss.maxT
, fwer2gfwer
#data
set.seed(99)
data<-matrix(rnorm(90),nr=9)
group<-c(rep(1,5),rep(0,5))
#fwer control with centered and scaled bootstrap null distribution
#(B=100 for speed)
m1<-MTP(X=data,Y=group,alternative="less",B=100,method="sd.minP")
print(m1)
summary(m1)
par(mfrow=c(2,2))
plot(m1,top=9)
#fwer control with quantile transformed bootstrap null distribution
#default settings = N(0,1) marginal null distribution
m2<-MTP(X=data,Y=group,alternative="less",B=100,method="sd.minP",
nulldist="boot.qt",keep.rawdist=TRUE)
#fwer control with quantile transformed bootstrap null distribution
#marginal null distribution and df parameters manually set,
#first all equal, then varying with hypothesis
m3<-update(m2,marg.null="t",marg.par=10)
mps<-matrix(c(rep(9,5),rep(10,5)),nr=10,nc=1)
m4<-update(m2,marg.null="t",marg.par=mps)
m1@nulldist.type
m2@nulldist.type
m2@marg.null
m2@marg.par
m3@nulldist.type
m3@marg.null
m3@marg.par
m4@nulldist.type
m4@marg.null
m4@marg.par
Run the code above in your browser using DataLab