d.stat(data, cl, var.equal = FALSE, B = 100, med = FALSE, s0 = NA, s.alpha = seq(0, 1, 0.05), include.zero = TRUE, n.subset = 10, mat.samp = NULL, B.more = 0.1, B.max = 30000, gene.names = NULL, R.fold = 1, use.dm = TRUE, R.unlog = TRUE, na.replace = TRUE, na.method = "mean", rand = NA)
data: a matrix, a data frame, or an ExpressionSet object. Each row of data (or exprs(data), respectively) must correspond to a variable (e.g., a gene), and each column to a sample (i.e., an observation).

cl: a vector of length ncol(data)
containing the class labels of the samples. In the two class paired case, cl can also be a matrix with ncol(data) rows and 2 columns. If data is an ExpressionSet object, cl can also be a character string. For details on how cl should be specified, see ?sam.

var.equal: if FALSE
(default), Welch's t-statistic will be computed. If TRUE, the pooled variance will be used in the computation of the t-statistic.

B: the number of permutations used in the estimation of the null distribution.

med: if FALSE
(default), the mean number of falsely called genes will be computed. Otherwise, the median number is calculated.

s0: a numeric value specifying the fudge factor. If NA
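(default), s0 will be computed automatically.

s.alpha: a numeric vector or value specifying the quantiles of the standard deviations of the genes used in the computation of s0. If s.alpha is a vector, the fudge factor is computed as proposed by Tusher et al. (2001). Otherwise, the quantile of the standard deviations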
specified by s.alpha is used as the fudge factor.

include.zero: if TRUE, s0 = 0 will also be a possible choice for the fudge factor. Hence, the usual t-statistic or F-statistic, respectively, can also be a possible choice for the expression score $d$. If FALSE, s0 = 0 will not be a possible choice for the fudge factor. The latter follows the definition of the fudge factor in Tusher et al. (2001), in which only strictly positive values are considered.

n.subset: a numeric value indicating how many permutations are considered simultaneously when computing the p-value and the number of falsely called genes. If med = TRUE, n.subset will be set to 1.

mat.samp: a matrix having ncol(data)
columns except for the two class paired case, in which mat.samp has ncol(data)/2 columns. Each row specifies one permutation of the group labels used in the computation of the expected expression scores $\bar{d}$. If not specified (mat.samp = NULL), a matrix having B rows and ncol(data) columns is generated automatically and used in the computation of $\bar{d}$. In the two class unpaired case and the multiclass case, each row of mat.samp must contain the same group labels as cl. In the one class and the two class paired case, each row must contain -1's and 1's. In the one class case, the expression values are multiplied by these -1's and 1's. In the two class paired case, each column corresponds to one observation pair whose difference is multiplied by either -1 or 1. For more details and examples, see the manual of siggenes.

B.more: a numeric value. If the number of all possible permutations is smaller than or equal to (1 + B.more)*B, full permutation will be done.
Otherwise, B permutations are used. This prevents B permutations from being used instead of all permutations when the number of all possible permutations is only slightly larger than B.

gene.names: a character vector of length nrow(data)
containing the names of the genes.

B.max: a numeric value. If the number of all possible permutations is smaller than or equal to B.max, B randomly selected permutations will be used in the computation of the null distribution. Otherwise, B random draws of the group labels are used. With random draws, it is possible that
some of the permutations are used more than once.

R.fold: a numeric value. If the fold change of a gene is smaller than or equal to R.fold, or larger than or equal to 1/R.fold, respectively, then this gene will be excluded from the SAM analysis. The expression score $d$ of excluded genes is set to NA. By default, R.fold is set to 1 such that all genes are included in the SAM analysis. Setting R.fold to 0 or a negative value will avoid the computation of the fold change. The fold change is only computed in the two class unpaired case.

use.dm: if TRUE, the fold change is computed as 2 to the power of the difference between the mean log2 intensities of the two groups, i.e., 2 to the power of the numerator of the test statistic.
If FALSE, the fold change is determined by computing 2 to the power of data (if R.unlog = TRUE) and then calculating the ratio of the mean intensity in the group coded by 1 to the mean intensity in the group coded by 0. The latter is the definition of the fold change used in Tusher et al. (2001).

R.unlog: if TRUE, the anti-log of data will be used in the computation of the fold change. Otherwise, data is used. This transformation should be done
when data is log2-transformed (in a SAM analysis it is highly recommended to use log2-transformed expression data). Ignored if use.dm = TRUE.

na.replace: if TRUE, missing values will be replaced by the genewise/rowwise statistic specified by na.method. If a gene has fewer than 2 non-missing values, this gene will be excluded from further analysis. If na.replace = FALSE, all genes with one or more missing values will be excluded from further analysis.
The expression score $d$ of excluded genes is set to NA.

na.method: a character string naming the statistic with which missing values will be replaced if na.replace = TRUE. Must be either "mean" (default) or "median".

rand: a numeric value. If specified, i.e., not NA, the random number generator will be set into a reproducible state.

References: Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.
See also: SAM-class, sam, z.ebam
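
The two fold change definitions (use.dm, R.unlog) and the B.more rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the siggenes implementation: the helper names fold_change and use_full_permutation are hypothetical, the difference of means is assumed to be group 1 minus group 0, and the number of possible permutations in the two class unpaired case is assumed to be choose(n, n1).

```r
# Illustrative base-R sketch; the helpers below are hypothetical and are
# not functions of siggenes.

# Fold change of one gene in the two class unpaired case.
# x: log2 expression values of the gene; cl: class labels coded 0 and 1.
fold_change <- function(x, cl, use.dm = TRUE, R.unlog = TRUE) {
  if (use.dm) {
    # 2 to the power of the difference of the mean log2 intensities
    # (assumed direction: group 1 minus group 0)
    return(2^(mean(x[cl == 1]) - mean(x[cl == 0])))
  }
  # Otherwise: ratio of the mean intensities, optionally after the anti-log
  if (R.unlog) x <- 2^x
  mean(x[cl == 1]) / mean(x[cl == 0])
}

# Rule from the B.more argument: full permutation is done if the number of
# all possible permutations is <= (1 + B.more) * B.
# Assumption: the number of possible permutations is choose(n, n1).
use_full_permutation <- function(cl, B = 100, B.more = 0.1) {
  n_perm <- choose(length(cl), sum(cl == 1))
  n_perm <= (1 + B.more) * B
}

x  <- c(1, 3, 4, 8)   # log2 intensities of one gene
cl <- c(0, 0, 1, 1)   # two samples per group

fc_dm    <- fold_change(x, cl, use.dm = TRUE)
fc_ratio <- fold_change(x, cl, use.dm = FALSE, R.unlog = TRUE)

full_small <- use_full_permutation(cl)                       # 6 labelings
full_large <- use_full_permutation(rep(c(0, 1), each = 10))  # 184756 labelings
```

For this gene, fc_dm is 16 and fc_ratio is 27.2, so the two definitions can differ noticeably on the same data. With the defaults B = 100 and B.more = 0.1, the small design falls under the threshold of 110 and the large one does not.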