This function subsamples the columns (arrays) of a microarray data set and performs two-sample t-tests. Subsamples from each treatment group are obtained and combined. A t-test is conducted for each row (gene) of the subsampled data set, and the p-value density at one is estimated for each combined subsample.
subt(dat, n1 = round(ncol(dat)/2), n2 = ncol(dat) - n1,
f1method = c("lastbin", "qvalue"),
max.reps = if(balanced)20 else 5, balanced = FALSE, ...)
dat: a numeric matrix, the microarray data set, with each row being a gene and each column being a subject. The first n1 columns correspond to treatment group 1 and the remaining n2 columns correspond to treatment group 2.
n1: a positive integer, the original sample size in treatment group 1.
n2: a positive integer, the original sample size in treatment group 2.
f1method: character, the name of the function to be used to estimate the p-value density at 1. The first argument of the function needs to be a vector of values.
max.reps: a positive integer, the maximum number of subsamples to obtain per subsample size configuration. If this is set to Inf, then all possible subsamples will be tried. However, see Notes and the R argument of combn2R.
balanced: logical, indicating whether only balanced subsamples are obtained. Restricting to balanced subsamples is computationally faster and is suitable for initial exploration.
...: additional arguments used by f1method.
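As a rough illustration of the kind of function f1method refers to, the sketch below estimates the p-value density at 1 from the height of the right-most histogram bin. The name lastbin_sketch and its bw parameter are hypothetical and introduced only for this example; this is not the package's own lastbin implementation, whose interface may differ.

```r
# Hypothetical helper (not part of the package): estimate the p-value
# density at 1 as the height of the right-most histogram bin.
lastbin_sketch <- function(p, bw = 0.2) {
  stopifnot(is.numeric(p), bw > 0, bw <= 1)
  p <- p[!is.na(p)]
  # proportion of p-values falling in (1 - bw, 1], divided by the bin width
  mean(p > 1 - bw) / bw
}

set.seed(1)
p_null <- runif(10000)         # p-values are uniform when no gene is differentially expressed
est <- lastbin_sketch(p_null)  # expected to be near 1 for uniform p-values
```

Because the first argument is a vector of values, a function of this shape matches the requirement stated for f1method; whether a user-defined name is accepted in place of the two listed choices is not confirmed here.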
an object of class c("subt","matrix"), which is a G-by-3 numeric matrix, where G is nrow(dat), with column names 'f1', 'n1', and 'n2', corresponding to the p-value density at 1 and the subsample size in each treatment group. This object also has the following attributes:
n1: the same as the argument n1.
n2: the same as the argument n2.
f1method: the same as the argument f1method.
max.reps: the same as the argument max.reps.
balanced: the same as the argument balanced.
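To make the documented three-column layout concrete, the following sketch builds a small mock matrix with the same column names and averages the density estimate within each subsample size configuration. The numbers are made up purely for illustration and the object is a plain matrix, not an actual subt result.

```r
# Mock matrix mimicking the documented columns; the values are invented
# for illustration and do not come from any real analysis.
mock <- cbind(f1 = c(0.90, 0.80, 0.85, 0.70),
              n1 = c(1, 1, 2, 2),
              n2 = c(2, 2, 2, 2))

# Average the estimated density at 1 within each (n1, n2) configuration
avg <- aggregate(f1 ~ n1 + n2, data = as.data.frame(mock), FUN = mean)
print(avg)
```

For a real result it may be necessary to drop the extra class first (for example with unclass()) before converting to a data frame; that step is an assumption and is not shown in the original documentation.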
This function tries to get possible subsamples through combn2R.
For each total subsample size M=3,4,...,N, where N=n1+n2, do the following:
1. For each treatment 1 subsample size m1=1,2,...,n1, let m2=M-m1. If 1<=m2<=n2 and either balanced is FALSE or m1=m2, then do the following:
1.1 Randomly choose max.reps subsamples among all possible subsamples obtained by choosing m1 subjects from treatment group 1 and m2 subjects from treatment group 2, using the function combn2R with sample.method="diff2" and try.rest=TRUE. Note that this may not always be possible due to practical computational limitations. See combn2R for details.
1.2 For each subsample obtained in 1.1, (1) do a t-test for each gene (i.e., each row of the subsample), and (2) estimate the p-value density at one.
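The steps above can be sketched in base R as follows. This is an illustrative re-implementation under stated assumptions, not the package's code: random draws with sample.int() stand in for combn2R (so repeated subsamples are possible), the row-wise t-test is assumed to be the equal-variance version, and the density at 1 is taken from a last histogram bin of width 0.2. The name subt_sketch and its internal helper are hypothetical.

```r
# Illustrative sketch only; NOT the package implementation.
subt_sketch <- function(dat, n1, n2 = ncol(dat) - n1,
                        max.reps = 5, balanced = FALSE) {
  # Row-wise pooled-variance two-sample t-test p-values (an assumption)
  row_t_pvalues <- function(x, y) {
    m1 <- ncol(x); m2 <- ncol(y)
    ss <- rowSums((x - rowMeans(x))^2) + rowSums((y - rowMeans(y))^2)
    se <- sqrt(ss / (m1 + m2 - 2) * (1 / m1 + 1 / m2))
    2 * pt(-abs((rowMeans(x) - rowMeans(y)) / se), df = m1 + m2 - 2)
  }
  out <- NULL
  for (M in 3:(n1 + n2)) {            # total subsample size
    for (m1 in 1:n1) {                # group 1 subsample size
      m2 <- M - m1
      if (m2 < 1 || m2 > n2) next
      if (balanced && m1 != m2) next  # keep only balanced configurations
      # never ask for more draws than there are distinct subsamples
      reps <- min(max.reps, choose(n1, m1) * choose(n2, m2))
      for (r in seq_len(reps)) {
        i1 <- sample.int(n1, m1)       # columns from treatment group 1
        i2 <- n1 + sample.int(n2, m2)  # columns from treatment group 2
        p <- row_t_pvalues(dat[, i1, drop = FALSE], dat[, i2, drop = FALSE])
        f1 <- mean(p > 0.8, na.rm = TRUE) / 0.2  # last-bin density estimate
        out <- rbind(out, c(f1 = f1, n1 = m1, n2 = m2))
      }
    }
  }
  out
}

set.seed(1)
dat <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)  # 200 genes, 3 + 3 arrays
res <- subt_sketch(dat, n1 = 3, max.reps = 2, balanced = TRUE)
```

With three arrays per group and balanced = TRUE, only the configurations (2, 2) and (3, 3) qualify, which shows why balanced subsampling is cheaper than enumerating every (m1, m2) pair.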
Qu, L., Nettleton, D., Dekkers, J.C.M. Subsampling Based Bias Reduction in Estimating the Proportion of Differentially Expressed Genes from Microarray Data. Unpublished manuscript.
print.subt, plot.subt, extrp.pi0, matrix.t.test, combn2R, subex, lastbin, qvalue
# NOT RUN {
set.seed(9992722)
## this is how the 'simulatedDat' data set in this package was generated
simulatedDat <- sim.dat(G = 5000)
## this is how the 'simulatedSubt' object in this package was generated
simulatedSubt <- subt(simulatedDat, balanced = FALSE, max.reps = Inf)
# }
# NOT RUN {
data(simulatedSubt)
print(simulatedSubt)
# }