siggenes (version 1.46.0)

find.a0: Computation of the Fudge Factor

Description

Suggests an optimal value for the fudge factor in an EBAM analysis as proposed by Efron et al. (2001).

Usage

find.a0(data, cl, method = z.find, B = 100, delta = 0.9, quan.a0 = (0:5)/5, include.zero = TRUE, control = find.a0Control(), gene.names = dimnames(data)[[1]], rand = NA, ...)
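A minimal sketch of a call, assuming siggenes is installed. The simulated matrix X, the shift added to 50 genes and B = 50 are illustrative choices, not package defaults.

```r
library(siggenes)

## Hypothetical simulated data: 1000 genes (rows) and 10 samples (columns)
set.seed(1)
X <- matrix(rnorm(1000 * 10), nrow = 1000,
            dimnames = list(paste0("Gene", 1:1000), NULL))
X[1:50, 6:10] <- X[1:50, 6:10] + 2   # shift 50 genes in the case group

## Two-class unpaired design: 5 control samples (0), 5 case samples (1)
cl <- rep(c(0, 1), each = 5)

## Search for the fudge factor; rand makes the permutations reproducible
find.out <- find.a0(X, cl, B = 50, rand = 123)
find.out
```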

Arguments

data
a matrix, data frame or an ExpressionSet object. Each row of data (or of exprs(data), respectively) must correspond to a variable (e.g., a gene), and each column to a sample (i.e., an observation).
cl
a numeric vector of length ncol(data) containing the class labels of the samples. If data is an ExpressionSet object, cl can also be a character string naming the column of pData(data) that contains the class labels of the samples. In the one-class case, cl should be a vector of 1's. In the two-class unpaired case, cl should be a vector containing 0's (specifying the samples of, e.g., the control group) and 1's (specifying, e.g., the case group). In the two-class paired case, cl can be either a numeric vector or a numeric matrix with ncol(data) rows and 2 columns. If it is a vector, cl has to consist of the integers between -1 and $-n/2$ (e.g., the before-treatment group) and between 1 and $n/2$ (e.g., the after-treatment group), where $n$ is the length of cl and $k$ is paired with $-k$, $k = 1, \dots, n/2$. If cl is a matrix, one column should contain -1's and 1's specifying, e.g., the before- and the after-treatment samples, respectively, and the other column should contain integers between 1 and $n/2$ specifying the $n/2$ pairs of observations. In the multiclass case and if method = cat.stat, cl should be a vector containing integers between 1 and $g$, where $g$ is the number of groups. For examples of how cl can be specified, see the manual of siggenes.
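The following base-R lines sketch valid cl specifications for a hypothetical experiment with six samples ($n = 6$, so $n/2 = 3$ pairs in the paired case). The column order of the paired matrix is an assumption made for illustration; the description above only requires one column of -1/1 labels and one column of pair indices.

```r
## One-class case: every sample labelled 1
cl.one <- rep(1, 6)

## Two-class unpaired case: 0 = control group, 1 = case group
cl.unpaired <- c(0, 0, 0, 1, 1, 1)

## Two-class paired case, vector form: sample -k (before treatment)
## is paired with sample k (after treatment), k = 1, ..., n/2
cl.paired.vec <- c(-1, -2, -3, 1, 2, 3)

## Two-class paired case, matrix form (column order assumed here):
## column 1 = before (-1) / after (1), column 2 = pair index 1, ..., n/2
cl.paired.mat <- cbind(c(-1, -1, -1, 1, 1, 1),
                       c(1, 2, 3, 1, 2, 3))

## Multiclass case with g = 3 groups: integers 1, ..., g
cl.multi <- c(1, 1, 2, 2, 3, 3)
```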
method
the name of a function for computing the numerator and the denominator of the test statistic of interest, and for specifying other objects required for the identification of the fudge factor. The default function z.find provides these objects for t- and F-statistics. It is, however, also possible to employ a user-written function. For how to write such a function, see the vignette of siggenes.
B
the number of permutations used in the estimation of the null distribution.
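For the two-class unpaired case, the role of B can be illustrated with a base-R sketch. It assumes that the null distribution is estimated by recomputing a Welch-type t-statistic after randomly permuting the class labels B times; the helper tstat and the toy data are hypothetical, and the internal procedure of siggenes may differ in its details.

```r
## Hypothetical toy data: 200 genes, 10 samples in two unpaired groups
set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200)
cl <- rep(c(0, 1), each = 5)
B <- 100

## Hypothetical helper: Welch-type t-statistic for every gene
tstat <- function(X, cl) {
  x1 <- X[, cl == 1, drop = FALSE]
  x0 <- X[, cl == 0, drop = FALSE]
  (rowMeans(x1) - rowMeans(x0)) /
    sqrt(apply(x1, 1, var) / ncol(x1) + apply(x0, 1, var) / ncol(x0))
}

## B random permutations of the class labels; column b holds the
## statistics of all genes under the b-th permutation
null.stats <- replicate(B, tstat(X, sample(cl)))
dim(null.stats)   # 200 genes x B = 100 permutations
```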
delta
a probability. All genes showing a posterior probability that is larger than or equal to delta are called differentially expressed, subject to the restriction described in Details.
quan.a0
a numeric vector indicating over which quantiles of the standard deviations of the genes the fudge factor $a0$ should be optimized.
include.zero
should $a0 = 0$, i.e. the unmodified test statistic, also be considered as a possible choice for the fudge factor?
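A base-R sketch of how quan.a0 and include.zero could translate into candidate values of $a0$. For illustration only, it assumes the modified statistic $d = r/(s + a0)$ of Efron et al. (2001), with $r$ the difference of the group means of a gene and $s$ its Welch-type standard error; the toy data and variable names are hypothetical.

```r
## Hypothetical toy data: 200 genes, two unpaired groups of 4 samples
set.seed(42)
X <- matrix(rnorm(200 * 8), nrow = 200)
cl <- rep(c(0, 1), each = 4)
x1 <- X[, cl == 1]
x0 <- X[, cl == 0]

## Numerator r and denominator s of a Welch-type t-statistic per gene
r <- rowMeans(x1) - rowMeans(x0)
s <- sqrt(apply(x1, 1, var) / ncol(x1) + apply(x0, 1, var) / ncol(x0))

## Candidate fudge factors: quantiles of s at the default quan.a0,
## plus a0 = 0 (the unmodified statistic) if include.zero = TRUE
quan.a0 <- (0:5) / 5
include.zero <- TRUE
a0.cand <- unname(quantile(s, quan.a0))
if (include.zero) a0.cand <- c(0, a0.cand)

## Modified statistics d = r / (s + a0): one column per candidate a0
d <- sapply(a0.cand, function(a0) r / (s + a0))
dim(d)   # 200 genes x 7 candidates
```

Adding a positive $a0$ to every denominator damps the statistics of genes whose standard error is very small, which is the purpose of the fudge factor.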
control
further arguments for controlling the EBAM analysis with find.a0. For these arguments, see find.a0Control.
gene.names
a character vector of length nrow(data) containing the names of the genes. By default, the row names of data are used.
rand
integer. If specified (i.e., not NA), the random number generator is set to a reproducible state.
...
further arguments for the function specified by method. For further arguments of method = z.find, see z.find.

Value

An object of class FindA0.

Details

The suggested choice for the fudge factor is the value of $a0$ that leads to the largest number of genes showing a posterior probability larger than or equal to delta. However, not every such gene is called differentially expressed: a gene with a posterior probability of at least delta is called only if no gene with a posterior probability less than delta has a more extreme test score in the same direction. For example, suppose that we have done an EBAM analysis with a t-test and ordered the genes by their t-statistics. Suppose further that Genes 1 to 5 (i.e., the five genes with the lowest t-statistics), Genes 7 and 8, Genes 3012 to 3020, and Genes 3040 to 3051 (the twelve genes with the highest t-statistics) are the only genes that show a posterior probability of at least delta. Then Genes 1 to 5 and Genes 3040 to 3051 are called differentially expressed, but Genes 7 and 8 and Genes 3012 to 3020 are not, since Gene 6 and Genes 3021 to 3039, whose posterior probabilities are less than delta, have more extreme t-statistics.
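The calling rule can be sketched in base R. The helper call.genes is hypothetical (it is not part of siggenes) and applies the rule literally: after sorting the genes by their test scores, a gene is called only if it lies in the uninterrupted run of genes with a posterior probability of at least delta that starts at the lowest or at the highest score. The example reproduces the scenario above with made-up posterior probabilities.

```r
## Hypothetical helper (not part of siggenes): which genes are called
## differentially expressed, given test scores and posterior probabilities
call.genes <- function(score, post, delta = 0.9) {
  ord <- order(score)                 # gene indices from lowest to highest score
  ok <- post[ord] >= delta            # posterior at least delta, in score order
  low <- cumprod(ok) == 1             # unbroken run starting at the lowest score
  up <- rev(cumprod(rev(ok))) == 1    # unbroken run ending at the highest score
  called <- logical(length(score))
  called[ord] <- low | up             # map back to the original gene order
  called
}

## Scenario from the Details: Gene i has the i-th lowest t-statistic
score <- 1:3051
post <- rep(0.5, 3051)
post[c(1:5, 7:8, 3012:3020, 3040:3051)] <- 0.95
which(call.genes(score, post, delta = 0.9))   # Genes 1 to 5 and 3040 to 3051
```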

References

Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.

See Also

ebam, FindA0-class, find.a0Control