"roast"(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL, set.statistic = "mean", gene.weights = NULL, var.prior = NULL, df.prior = NULL, nrot = 999, approx.zscore = TRUE, ...)
"mroast"(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL, set.statistic = "mean", gene.weights = NULL, var.prior = NULL, df.prior = NULL, nrot = 999, approx.zscore = TRUE, adjust.method = "BH", midp = TRUE, sort = "directional", ...)
"fry"(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL, standardize = "posterior.sd", sort = "directional", ...)
ExpressionSet, MAList, EList or PLMSet objects. Rows correspond to probes and columns to samples. If either var.prior or df.prior is NULL, then y should contain values for all genes on the arrays. If both prior parameters are given, then only y values for the test set are required.
y are in the test set. Can be a vector of integer indices, or a logical vector of length nrow(y), or a vector of gene IDs corresponding to entries in geneid. Alternatively, it can be a data.frame with the first column containing the index vector and the second column containing gene weights. For mroast or fry, index is a list of index vectors or a list of data.frames.
design, or the name of a column of design, or a numeric contrast vector of length equal to the number of columns of design.
y. Can be either a vector of length nrow(y) or the name of the column of y$genes containing the gene identifiers. Defaults to rownames(y).
"mean", "floormean", "mean50" or "msq".
For mroast or fry, this vector must have length equal to nrow(y). For roast, it can be of length nrow(y) or of length equal to the number of genes in the test set.
squeezeVar.
squeezeVar.
If TRUE, then a fast approximation is used to convert t-statistics into z-scores prior to computing set statistics. If FALSE, z-scores will be exact.
See p.adjust for possible values.
"directional"), non-directional p-value ("mixed"), or not at all ("none").
"residual.sd", "posterior.sd" or "none".
roast
produces an object of class "Roast". This consists of a list with the following components:
Active.Prop and P.Value, giving the proportion of genes in the set contributing materially to significance and estimated p-values, respectively. Rows correspond to the alternative hypotheses Down, Up, UpOrDown (two-sided) and Mixed.
mroast produces a data.frame with a row for each set and the following columns:
z < -sqrt(2)
z > sqrt(2)
"Up" or "Down"
fry produces the same output format as mroast but without the columns PropDown and PropUp.
camera.
For a gene set enrichment style analysis using a database of gene sets, see romer.
roast and mroast test whether any of the genes in the set are differentially expressed. They can be used for any microarray experiment that can be represented by a linear model. The design matrix for the experiment is specified as for the lmFit function, and the contrast of interest is specified as for the contrasts.fit function. This allows users to focus on differential expression for any coefficient or contrast in a linear model. If contrast is not specified, then the last coefficient in the linear model will be tested.
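The ways of specifying contrast can be illustrated with a small two-group design. This is a minimal sketch using base R only; the variable names contrast.by.index, contrast.by.name and contrast.by.vector are hypothetical and chosen for illustration.

```r
# Two-group design with an intercept and a group effect
design <- cbind(Intercept = 1, Group = c(0, 0, 1, 1))

# Three ways to refer to the Group coefficient
contrast.by.index  <- 2        # integer column of design
contrast.by.name   <- "Group"  # name of a column of design
contrast.by.vector <- c(0, 1)  # numeric vector, one entry per column of design

# The default, contrast = ncol(design), also selects the last coefficient
default.contrast <- ncol(design)
```

Any one of these values could be passed as the contrast argument; here all of them select the same coefficient.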
The argument index is often made using ids2indices.
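As a rough illustration of how such an index list relates gene sets to rows of y, the sketch below first builds the list in base R and then, if limma is installed, with ids2indices. The identifiers and gene sets are hypothetical.

```r
# Hypothetical identifiers for the rows of a 6-row expression matrix
identifiers <- c("g1", "g2", "g3", "g4", "g5", "g6")

# Hypothetical gene sets defined by identifier ("g9" is not on the array)
gene.sets <- list(set1 = c("g1", "g3", "g9"), set2 = c("g2", "g6"))

# Base R sketch: row positions of each set's genes
index <- lapply(gene.sets, function(ids) which(identifiers %in% ids))

# With limma available, ids2indices() builds the index list directly
if (requireNamespace("limma", quietly = TRUE)) {
  index <- limma::ids2indices(gene.sets, identifiers)
}
```

The resulting list can be passed as the index argument of mroast or fry.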
The argument gene.weights allows directional weights to be set for individual genes in the set. This is often useful, because it allows each gene to be flagged as to its direction and magnitude of change based on prior experimentation. A typical use is to set gene.weights to 1 or -1 depending on whether the gene is up- or down-regulated in the pathway under consideration.
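A minimal sketch of the typical use follows, assuming a hypothetical pathway in which the first three genes are expected to go up and the last two to go down. The data are simulated and the call to roast is only made if limma is installed.

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible
y <- matrix(rnorm(100 * 4), 100, 4)
design <- cbind(Intercept = 1, Group = c(0, 0, 1, 1))

# Hypothetical pathway: rows 1-3 expected up, rows 4-5 expected down
index <- 1:5
gene.weights <- c(1, 1, 1, -1, -1)

# For roast, gene.weights may have one entry per gene in the test set
if (requireNamespace("limma", quietly = TRUE)) {
  res <- limma::roast(y, index, design, contrast = 2,
                      gene.weights = gene.weights)
}
```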
The arguments array.weights, block and correlation have the same meaning as for the lmFit function. The arguments df.prior and var.prior have the same meaning as in the output of the eBayes function. If these arguments are not supplied, they are estimated exactly as is done by eBayes.
The gene set statistics "mean", "floormean", "mean50" and "msq" are defined by Wu et al (2010). The different gene set statistics have different sensitivities to small numbers of differentially expressed genes. If set.statistic="mean" then the set will be statistically significant only when the majority of the genes are differentially expressed. "floormean" and "mean50" will detect as few as 25% differentially expressed genes. "msq" is sensitive to even smaller proportions of differentially expressed genes, if the effects are reasonably large.
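One way to explore these differences is to run roast on the same set with each statistic in turn. The sketch below uses simulated data in which only two of the five genes in the set are shifted; the seed and effect size are arbitrary choices, and the roast calls are only made if limma is installed.

```r
set.seed(2024)  # arbitrary seed so the simulated data are reproducible
y <- matrix(rnorm(100 * 4), 100, 4)
design <- cbind(Intercept = 1, Group = c(0, 0, 1, 1))

# Test set of 5 genes, of which only the first 2 are shifted
index <- 1:5
y[index[1:2], 3:4] <- y[index[1:2], 3:4] + 3

# Run roast once per set statistic and keep the results by name
statistics <- c("mean", "floormean", "mean50", "msq")
if (requireNamespace("limma", quietly = TRUE)) {
  results <- lapply(statistics, function(s)
    limma::roast(y, index, design, contrast = 2, set.statistic = s))
  names(results) <- statistics
}
```

Because the p-values are estimated by rotation, the comparison will vary somewhat between runs and data sets.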
The output gives p-values for three possible alternative hypotheses: "Up" to test whether the genes in the set tend to be up-regulated, with positive t-statistics, "Down" to test whether the genes in the set tend to be down-regulated, with negative t-statistics, and "Mixed" to test whether the genes in the set tend to be differentially expressed, without regard for direction.
roast estimates p-values by simulation, specifically by random rotations of the orthogonalized residuals (Langsrud, 2005), so p-values will vary slightly from run to run. To get more precise p-values, increase the number of rotations nrot. The p-value is computed as (b+1)/(nrot+1) where b is the number of rotations giving a more extreme statistic than that observed (Phipson and Smyth, 2010). This means that the smallest possible p-value is 1/(nrot+1).
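The formula can be checked with a few lines of arithmetic. The helper name rotation.pvalue below is hypothetical and is not part of limma; it simply evaluates (b+1)/(nrot+1).

```r
# Hypothetical helper: p-value from b more-extreme rotations out of nrot
rotation.pvalue <- function(b, nrot) (b + 1) / (nrot + 1)

# Smallest attainable p-value with the default nrot = 999
p.min <- rotation.pvalue(0, 999)

# Increasing nrot lowers the smallest attainable p-value
p.min.more <- rotation.pvalue(0, 9999)
```

With the default nrot = 999 the p-values are therefore multiples of 1/1000.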
mroast does roast tests for multiple sets, including adjustment for multiple testing. By default, mroast reports ordinary p-values but uses mid-p-values (Routledge, 1994) at the multiple testing stage. Mid-p-values are probably a good choice when using false discovery rates (adjust.method="BH") but not when controlling the family-wise type I error rate (adjust.method="holm").
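The distinction can be sketched in base R with p.adjust. The counts b below are hypothetical, and the mid-p form (b+0.5)/(nrot+1), which counts the observed statistic with half weight, is an assumption made for illustration and is not taken from the mroast source.

```r
nrot <- 999
b <- c(0, 4, 20, 300)  # hypothetical more-extreme counts for four sets

p.ordinary <- (b + 1) / (nrot + 1)    # ordinary rotation p-values
p.mid      <- (b + 0.5) / (nrot + 1)  # assumed mid-p form (illustrative)

# False discovery rate control applied to the mid-p-values
fdr <- p.adjust(p.mid, method = "BH")

# Family-wise error control applied to the ordinary p-values
fwer <- p.adjust(p.ordinary, method = "holm")
```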
fry is a fast approximation to mroast. In the special case that df.prior is large and set.statistic="mean", fry gives the same result as mroast with an infinite number of rotations. In other circumstances, when genes have different variances, fry uses a standardization strategy to approximate the mroast results. Using fry may be advisable when performing tests for a large number of sets, because it is fast and because the fry p-values are not limited by the number of rotations performed.
Langsrud, O (2005). Rotation tests. Statistics and Computing 15, 53-60.
Phipson B, and Smyth GK (2010). Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Statistical Applications in Genetics and Molecular Biology, Volume 9, Article 39. http://www.statsci.org/smyth/pubs/PermPValuesPreprint.pdf
Routledge, RD (1994). Practicing safe statistics with the mid-p. Canadian Journal of Statistics 22, 103-110.
Wu, D, Lim, E, Vaillant, F, Asselin-Labat, M-L, Visvader, JE, and Smyth, GK (2010). ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics 26, 2176-2182. http://bioinformatics.oxfordjournals.org/content/26/17/2176
y <- matrix(rnorm(100*4),100,4)
design <- cbind(Intercept=1,Group=c(0,0,1,1))
# First set of 5 genes: all are genuinely differentially expressed
index1 <- 1:5
y[index1,3:4] <- y[index1,3:4]+3
# Second set of 5 genes contains none that are DE
index2 <- 6:10
roast(y,index1,design,contrast=2)
fry(y,list(set1=index1,set2=index2),design,contrast=2)