This function subsamples the columns (arrays) of a microarray data set and performs two-sample t-tests. Subsamples from each treatment group are drawn and combined. A t-test is conducted for each row (gene) of the combined subsample, and the p-value density at one is estimated for each combined subsample.
subt(dat, n1 = round(ncol(dat)/2), n2 = ncol(dat) - n1,
     f1method = c("lastbin", "qvalue"),
     max.reps = if(balanced)20 else 5, balanced = FALSE, ...)
dat: a numeric matrix, the microarray data set with each row being a gene and each column being a subject. The first n1 columns correspond to treatment group 1 and the remaining n2 columns correspond to treatment group 2.
n1: a positive integer, the original sample size in treatment group 1.
n2: a positive integer, the original sample size in treatment group 2.
f1method: character, the name of the function used to estimate the p-value density at 1. The first argument of that function must be a vector of p-values.
max.reps: a positive integer, the maximum number of subsamples to obtain per subsample size configuration. If this is set to Inf, then all possible subsamples will be tried. However, see Notes and the R argument of combn2R.
balanced: logical, indicating whether only balanced subsamples are obtained. This is computationally faster and is suitable for initial exploration.
...: additional arguments used by f1method.
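To illustrate what "estimating the p-value density at 1" from a vector of p-values can look like, here is a minimal sketch of a last-bin style estimator. The function name `lastbin_sketch` and its `bw` argument are hypothetical and introduced only for this illustration. It is not the package's `lastbin`, whose exact signature and defaults are not stated here.

```r
# Hypothetical last-bin style estimator of the p-value density at 1.
# Assumption: the density near 1 is approximated by the proportion of
# p-values falling in the last histogram bin (1 - bw, 1], divided by bw.
lastbin_sketch <- function(p, bw = 0.2) {
  stopifnot(is.numeric(p), length(p) > 0, bw > 0, bw <= 1)
  mean(p > 1 - bw) / bw
}

# Under the global null, p-values are uniform on [0, 1], so the estimate
# should be close to 1.
set.seed(42)
p_null <- runif(10000)
f1_null <- lastbin_sketch(p_null)
```

The first argument is the vector of p-values, matching the requirement stated for f1method. Whether a user-defined function can actually be passed by name depends on how the package resolves f1method, which this sketch does not assume.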
an object of class c("subt","matrix"), which is a G-by-3 numeric matrix, where G is nrow(dat), with column names 'f1', 'n1', and 'n2', corresponding to the estimated p-value density at 1 and the subsample size in each treatment group. This object also has the following attributes:
the same as the argument n1.
the same as the argument n2.
the same as the argument f1method.
the same as the argument max.reps.
the same as the argument balanced.
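Because the returned value is a numeric matrix with columns 'f1', 'n1', and 'n2', ordinary matrix and data-frame tools can be used to summarize it. The sketch below uses a small hand-made matrix with the same column layout (the values in `mock` are made up for illustration and are not output of subt) and averages f1 within each subsample size configuration.

```r
# A mock matrix with the same column layout as the returned object.
# The numbers are invented for illustration only.
mock <- cbind(f1 = c(0.90, 0.80, 0.85, 0.70, 0.75),
              n1 = c(1, 1, 2, 2, 2),
              n2 = c(2, 2, 2, 3, 3))

# Average the density estimates within each (n1, n2) configuration.
avg <- aggregate(f1 ~ n1 + n2, data = as.data.frame(mock), FUN = mean)
```

The same call should apply to a real result after `as.data.frame(unclass(x))`, assuming the object behaves like a plain matrix once its class is removed.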
This function tries to get possible subsamples through combn2R.
For each total subsample size M=3,4,...,N, where N=n1+n2, do the following,
1. For each treatment 1 subsample size m1=1,2,...,n1, let m2=M-m1. If 1<=m2<=n2 and either balanced is FALSE or m1=m2, then do the following,
1.1 Randomly choose max.reps subsamples among all possible subsamples by choosing m1 subjects from treatment group 1 and m2 subjects from treatment group 2, using the function combn2R with sample.method="diff2" and try.rest=TRUE. Note that this may not always be possible due to practical computational limitations. See combn2R for details.
1.2 For each subsample obtained in 1.1, (1) do a t-test for each gene (i.e., each row of the subsample), and (2) estimate the p-value density at one.
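The steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: `subsample_sizes` and `row_t_pvalues` are hypothetical helpers, the subsample is drawn with `sample` rather than combn2R, the t-test is assumed to be the pooled (equal-variance) two-sample test, and the density at one is approximated by a simple last-bin proportion. The condition `!balanced || m1 == m2` reflects the reading that balanced = TRUE restricts the search to m1 = m2, as the description of the balanced argument suggests.

```r
# Enumerate the (m1, m2) subsample size configurations (hypothetical helper).
subsample_sizes <- function(n1, n2, balanced = FALSE) {
  stopifnot(n1 >= 1, n2 >= 1, n1 + n2 >= 3)
  out <- NULL
  for (M in 3:(n1 + n2)) {            # total subsample size
    for (m1 in seq_len(n1)) {         # treatment 1 subsample size
      m2 <- M - m1
      if (m2 >= 1 && m2 <= n2 && (!balanced || m1 == m2)) {
        out <- rbind(out, c(n1 = m1, n2 = m2))
      }
    }
  }
  out
}

# Row-wise pooled-variance two-sample t-test p-values (hypothetical helper).
# x1 and x2 are matrices with genes in rows and subjects in columns.
row_t_pvalues <- function(x1, x2) {
  m1 <- ncol(x1); m2 <- ncol(x2)
  df <- m1 + m2 - 2
  ss1 <- rowSums((x1 - rowMeans(x1))^2)
  ss2 <- rowSums((x2 - rowMeans(x2))^2)
  pooled_var <- (ss1 + ss2) / df
  t_stat <- (rowMeans(x1) - rowMeans(x2)) / sqrt(pooled_var * (1 / m1 + 1 / m2))
  2 * pt(-abs(t_stat), df)
}

# One subsample for the first configuration of a toy 200-gene, 3 + 3 design.
set.seed(1)
n1 <- 3; n2 <- 3
dat <- matrix(rnorm(200 * (n1 + n2)), nrow = 200)
sizes <- subsample_sizes(n1, n2)
cfg <- sizes[1, ]
cols1 <- sample(seq_len(n1), cfg[["n1"]])        # subjects from group 1
cols2 <- n1 + sample(seq_len(n2), cfg[["n2"]])   # subjects from group 2
p <- row_t_pvalues(dat[, cols1, drop = FALSE], dat[, cols2, drop = FALSE])
f1 <- mean(p > 0.8) / 0.2   # simple stand-in for the density estimate at 1
```

Repeating the last four lines up to max.reps times for every row of `sizes` and stacking the resulting (f1, n1, n2) triples would give a matrix with the same column layout as the returned object.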
Qu, L., Nettleton, D., Dekkers, J.C.M. Subsampling Based Bias Reduction in Estimating the Proportion of Differentially Expressed Genes from Microarray Data. Unpublished manuscript.
print.subt, plot.subt, extrp.pi0, matrix.t.test, combn2R, subex, lastbin, qvalue
# NOT RUN {
set.seed(9992722)
## this is how the 'simulatedDat' data set in this package was generated
simulatedDat <- sim.dat(G=5000)
## this is how the 'simulatedSubt' object in this package was generated
simulatedSubt <- subt(simulatedDat, balanced=FALSE, max.reps=Inf)
# }
# NOT RUN {
data(simulatedSubt)
print(simulatedSubt)
# }