subt: Subsampling a Microarray Data Set for Estimating Proportion of True Null Hypotheses

Description

This function subsamples the columns (arrays) of a microarray data set and performs two-sample t-tests. Subsamples from each treatment group are obtained and combined. A t-test is conducted for each row (gene) of the combined subsample, and the p-value density at one is estimated for each combined subsample.
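The row-wise testing described above can be sketched in base R. This is only an illustration on simulated data, not the package's code: the object names (`dat`, `sub`, `pvals`), the sizes, and the choice of a pooled-variance test (`var.equal = TRUE`) are all assumptions.

```r
set.seed(1)
G <- 100; n1 <- 5; n2 <- 5
# simulated stand-in for a microarray matrix: rows are genes, columns are subjects
dat <- matrix(rnorm(G * (n1 + n2)), nrow = G)

# draw m1 columns from group 1 and m2 columns from group 2, then combine them
m1 <- 3; m2 <- 4
cols1 <- sample(seq_len(n1), m1)
cols2 <- n1 + sample(seq_len(n2), m2)
sub <- dat[, c(cols1, cols2), drop = FALSE]

# one two-sample t-test per row (gene) of the combined subsample;
# var.equal = TRUE is an assumption made for this sketch
pvals <- apply(sub, 1, function(x) {
  t.test(x[seq_len(m1)], x[m1 + seq_len(m2)], var.equal = TRUE)$p.value
})
```

The resulting vector `pvals` has one p-value per gene and is the kind of input from which a p-value density at one would then be estimated.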

Usage

subt(dat, n1 = round(ncol(dat)/2), n2 = ncol(dat) - n1, 
      f1method = c("lastbin", "qvalue"), 
        max.reps = if(balanced)20 else 5, balanced = FALSE,  ...)

Arguments

dat

a numeric matrix, the microarray data set with each row being a gene and each column being a subject. The first n1 columns correspond to treatment group 1 and the remaining n2 columns correspond to treatment group 2.

n1

a positive integer, the original sample size in treatment group 1.

n2

a positive integer, the original sample size in treatment group 2.

f1method

character, the name of the function to be used to estimate the p-value density at 1. The first argument of the function needs to be a vector of values.

max.reps

a positive integer, the maximum number of subsamples to obtain per subsample size configuration. If this is set to Inf, then all possible subsamples will be tried. However, see Notes and the R argument of combn2R.

balanced

logical, indicating whether only balanced subsamples are obtained. This is computationally faster and is good for initial exploration purposes.

...

additional arguments used by f1method.
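To make the f1method requirement concrete, here is a minimal sketch of a function whose first argument is a vector of p-values and which returns an estimate of their density at 1. The name `lastbin_sketch` and its `bw` argument are hypothetical. This is not the package's "lastbin" implementation, and whether subt accepts function names other than the two listed in Usage is an assumption.

```r
# Hypothetical histogram-style estimator of the p-value density at 1.
# It is NOT the package's "lastbin" method; 'bw' is an illustrative bin width.
lastbin_sketch <- function(p, bw = 0.2) {
  stopifnot(is.numeric(p), bw > 0, bw <= 1)
  # proportion of p-values falling in (1 - bw, 1], rescaled to a density
  mean(p > 1 - bw) / bw
}

set.seed(1)
p_null <- runif(10000)           # p-values when every null hypothesis is true
est_null <- lastbin_sketch(p_null)   # should be close to 1
```

With uniformly distributed p-values the estimate is near 1; with a mixture of null and non-null genes it would typically fall below 1.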

Value

an object of class c("subt","matrix"), which is a G-by-3 numeric matrix, where G is nrow(dat), with column names 'f1', 'n1', and 'n2', corresponding to the estimated p-value density at 1 and the subsample size in each treatment group. This object also has the following attributes:

n1

the same as the argument n1.

n2

the same as the argument n2.

f1method

the same as the argument f1method.

max.reps

the same as the argument max.reps.

balanced

the same as the argument balanced.
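The layout described above can be mimicked with a hand-built stand-in, which shows how the columns and attributes might be read. The object `res` below is not produced by subt, and its numbers are made up purely for illustration.

```r
# Hand-built stand-in with the documented column names and attributes.
# It is NOT output from subt(); all values are invented for illustration.
res <- cbind(f1 = c(0.90, 0.80, 0.85, 0.70),
             n1 = c(1, 2, 2, 3),
             n2 = c(2, 1, 2, 3))
attr(res, "n1") <- 3
attr(res, "n2") <- 3
attr(res, "f1method") <- "lastbin"
attr(res, "max.reps") <- 5
attr(res, "balanced") <- FALSE
class(res) <- c("subt", "matrix")

# the attributes echo the arguments of the call
reps_used <- attr(res, "max.reps")

# average the density estimates by total subsample size n1 + n2
f1_by_total <- tapply(res[, "f1"], res[, "n1"] + res[, "n2"], mean)
```

Grouping by total subsample size is just one possible summary; the columns can be combined in any other way that suits the analysis.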

Details

This function obtains subsamples through combn2R. For each total subsample size M=3,4,...,N, where N=n1+n2, it does the following:

1. For each treatment group 1 subsample size m1=1,2,...,n1, let m2=M-m1. If 1<=m2<=n2 and either balanced is FALSE or m1=m2, then do the following:
- 1.1 Randomly choose up to max.reps subsamples among all possible subsamples, each formed by choosing m1 subjects from treatment group 1 and m2 subjects from treatment group 2, using the function combn2R with sample.method="diff2" and try.rest=TRUE. Note that this may not always be possible because of practical computational limitations. See combn2R for details.
- 1.2 For each subsample obtained in 1.1, (1) do a t-test for each gene (i.e., each row of the subsample), and (2) estimate the p-value density at one.
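The loop over subsample size configurations can be sketched in base R as follows. The helper `size_configs` is hypothetical and does not call combn2R. It assumes that the balance condition means "balanced is FALSE or m1 equals m2", and it only lists the (m1, m2) pairs that would be visited, without drawing any subsamples.

```r
# Hypothetical helper: enumerate the (M, m1, m2) configurations visited by the
# algorithm. Assumes a configuration is kept when 1 <= m2 <= n2 and
# (balanced is FALSE or m1 == m2).
size_configs <- function(n1, n2, balanced = FALSE) {
  N <- n1 + n2
  stopifnot(n1 >= 1, n2 >= 1, N >= 3)
  out <- list()
  for (M in 3:N) {
    for (m1 in seq_len(n1)) {
      m2 <- M - m1
      if (m2 >= 1 && m2 <= n2 && (!balanced || m1 == m2)) {
        out[[length(out) + 1L]] <- c(M = M, m1 = m1, m2 = m2)
      }
    }
  }
  # one row per configuration; NULL if no configuration qualifies
  do.call(rbind, out)
}

all_cfg <- size_configs(2, 2)                   # unbalanced sizes allowed
bal_cfg <- size_configs(3, 3, balanced = TRUE)  # only m1 == m2
```

Restricting to balanced configurations shrinks the list considerably, which is consistent with the note that balanced = TRUE is computationally faster.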

References

Qu, L., Nettleton, D., Dekkers, J.C.M. Subsampling Based Bias Reduction in Estimating the Proportion of Differentially Expressed Genes from Microarray Data. Unpublished manuscript.

Examples

# NOT RUN {
set.seed(9992722)
## this is how the 'simulatedDat' data set in this package was generated
simulatedDat=sim.dat(G=5000)        
## this is how the 'simulatedSubt' object in this package was generated
simulatedSubt=subt(simulatedDat,balanced=FALSE,max.reps=Inf) 
# }
# NOT RUN {
data(simulatedSubt)
print(simulatedSubt)
# }