Usage
safe(X.mat, y.vec, C.mat = NULL, Z.mat = NULL, method = "permutation",
     platform = NULL, annotate = NULL, min.size = 2, max.size = Inf,
     by.gene = FALSE, local = "default", global = "default",
     args.local = NULL, args.global = list(one.sided = FALSE),
     Pi.mat = NULL, error = "FDR.BH", parallel = FALSE, alpha = NA,
     epsilon = 10^(-10), print.it = TRUE, ...)
Arguments
X.mat
A matrix or data.frame of expression data of size m by n where each row corresponds to a gene feature and each column to a sample. Data should be properly normalized and cannot contain missing values.
y.vec
A numeric, integer or character vector of length n containing the response of interest. For examples of the acceptable forms y.vec can take, see the vignette.
C.mat
A matrix containing the gene category assignments. Each column represents a category and should be named accordingly. For each column, values of 1 (TRUE) and 0 (FALSE) indicate whether the features in the corresponding rows of X.mat are contained in the category. This can also be a list containing a sparse matrix and names as created by getCmatrix.
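As a minimal sketch of the dense matrix form described above, the following builds a small 0/1 category matrix in base R. The feature names, category names and memberships are hypothetical and chosen only for illustration; the sparse-list form produced by getCmatrix is not shown.

```r
# Hypothetical feature and category names, for illustration only.
features <- paste0("gene", 1:6)
categories <- list(
  catA = c("gene1", "gene2", "gene3"),
  catB = c("gene3", "gene4"),
  catC = c("gene5", "gene6")
)

# One row per feature (in the same order as the rows of X.mat),
# one named column per category, with 1 = member and 0 = non-member.
C.mat <- sapply(categories, function(members) as.integer(features %in% members))
rownames(C.mat) <- features

C.mat
colSums(C.mat)  # number of features in each category
```

The row order must match the rows of X.mat, so it is usually easiest to derive the row names of C.mat from the expression matrix itself.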
Z.mat
An optional data.frame of size n by p containing p covariates, each either numeric or a factor.
method
The type of hypothesis test: "permutation" (default), "bootstrap.t", or "bootstrap.q". "express" calls the dependent package safeExpress. See the vignette for details.
platform
If C.mat is unspecified, a character string naming a Bioconductor annotation package can be used to build gene categories. See the vignette for details and examples.
annotate
If C.mat is unspecified, a character string to specify the type of gene categories to build from annotation packages. "GO.MF", "GO.BP", "GO.CC", and "GO.ALL" (default) specify one or all Gene Ontologies. "KEGG" specifies pathways, and "PFAM" homologous families from the respective sources.
min.size
Optional minimum category size in building C.mat.
max.size
Optional maximum category size in building C.mat.
by.gene
Logical argument (default = FALSE) specifying whether multiple features mapping to a single gene should be down-weighted.
local
Specifies the gene-specific statistic from the following options: "t.Student", "t.Welch", and "t.paired" for 2-sample designs, "f.ANOVA" for 1-way ANOVAs, and "t.LM" for simple linear regressions. "default" will choose between "t.Student", "f.ANOVA", and "t.LM" based on the form of y.vec. User-defined local statistics can also be used; details are provided in the vignette.
global
Specifies the global statistic for a gene category. By default, the Wilcoxon rank sum ("Wilcoxon") is used. Alternatively, a Fisher's Exact test statistic ("Fisher"), a Pearson's chi-squared type statistic ("Pearson"), or a t-statistic for average difference ("AveDiff") is available. User-defined global statistics can also be implemented.
args.local
An optional list to be passed to user-defined local statistics that require additional arguments. By default args.local = NULL.
args.global
An optional list to be passed to global statistics that require additional arguments. For two-sided local statistics, args.global = list(one.sided = FALSE) allows bi-directional differential expression to be considered.
Pi.mat
Either an integer, or a matrix or data.frame containing the permutations. See getPImatrix for the acceptable form of a matrix or data.frame. If Pi.mat is an integer, B, then safe will generate B resamples of X.mat.
error
Specifies the method for computing error rate estimates. By default, Benjamini-Hochberg step-up ("FDR.BH") FDR estimates are computed. Bonferroni ("FWER.Bonf") and Holm's step-down ("FWER.Holm") adjustments can also be specified. Under permutation, "FDR.YB" computes the Yekutieli-Benjamini FDR estimate, and "FWER.WY" computes the Westfall-Young FWER estimate. The user can also specify "none" if no error rates are desired.
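To give a feel for how the non-resampling adjustments differ, the sketch below applies base R's p.adjust to a made-up vector of nominal p-values. The column labels borrow the option names above for readability only; this is an analogy, not a statement about how safe computes these quantities internally, and the resampling-based "FDR.YB" and "FWER.WY" estimates have no p.adjust equivalent.

```r
# Hypothetical nominal p-values for five categories.
p <- c(0.001, 0.012, 0.020, 0.041, 0.300)

adj <- cbind(
  none      = p,                                   # unadjusted
  FWER.Bonf = p.adjust(p, method = "bonferroni"),  # Bonferroni
  FWER.Holm = p.adjust(p, method = "holm"),        # Holm
  FDR.BH    = p.adjust(p, method = "BH")           # Benjamini-Hochberg
)

round(adj, 3)
```

For any input, the Benjamini-Hochberg values are no larger than the Holm values, which in turn are no larger than the Bonferroni values, reflecting the usual ordering from least to most conservative.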
parallel
Logical argument (default = FALSE) specifying whether the hypothesis test selected by method should be conducted with parallel processing. Only compatible with error = "none", "FWER.Bonf", or "FDR.BH". See the vignette for details.
alpha
The threshold for significant results to return. By default, alpha will be 0.05 for nominal p-values (error = "none"), and 0.1 for adjusted p-values.
epsilon
Numeric argument setting the minimum difference used when ranking local and global statistics. This corrects a numerical precision issue that can arise when computing empirical p-values in small data sets (n < 15). The default value is 10^(-10).
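The kind of precision issue a tolerance like this guards against can be sketched in base R. The example below is illustrative only and is not taken from the safe source: the statistics are made up, and the comparison rule is an assumption about how such a tolerance might be applied.

```r
# Two statistics that are equal in exact arithmetic but not in floating point.
obs  <- 0.1 + 0.2                  # observed statistic
perm <- c(0.3, 0.25, 0.1, 0.35)    # hypothetical resampled statistics

epsilon <- 10^(-10)

# A strict comparison misses the tie between obs and the first resample.
p.naive <- mean(perm >= obs)

# Treating differences smaller than epsilon as ties counts it.
p.tol <- mean(perm >= obs - epsilon)

c(naive = p.naive, tolerant = p.tol)
```

With few samples there are few distinct resamples, so exact ties between observed and resampled statistics are common and a single miscounted tie shifts the empirical p-value noticeably.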
print.it
Logical argument (default = TRUE) specifying whether to print progress updates to the log for permutation and bootstrap calculations.
...
Allows arguments from version 2.0 to be ignored.
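Putting the main arguments together, the following sketch simulates a small 2-sample data set and passes it to safe with 100 permutations. The simulated values, dimension names and effect size are arbitrary choices for illustration, and the call is wrapped in a check so the data-building steps still run when the safe package is not installed.

```r
set.seed(1)

g.alt  <- 100   # genes with a simulated treatment effect
g.null <- 900   # genes without an effect
n      <- 20    # samples

# Expression matrix: genes in rows, samples in columns, no missing values.
X.mat <- matrix(rnorm(n * (g.alt + g.null)), g.alt + g.null, n)
# Shift the first g.alt genes upward in the first n/2 samples.
X.mat[1:g.alt, 1:(n / 2)] <- X.mat[1:g.alt, 1:(n / 2)] + 1
dimnames(X.mat) <- list(paste0("gene", 1:(g.alt + g.null)),
                        paste0("array", 1:n))

# Two-group response of length n.
y.vec <- rep(c("Trt", "Ctr"), each = n / 2)

# 20 non-overlapping categories of 50 genes each; the first two
# consist entirely of the shifted genes.
C.mat <- kronecker(diag(20), rep(1, 50))
dimnames(C.mat) <- list(rownames(X.mat), paste0("cat", 1:20))

# Run only if the package is available.
if (requireNamespace("safe", quietly = TRUE)) {
  library(safe)
  results <- safe(X.mat, y.vec, C.mat = C.mat, Pi.mat = 100, print.it = FALSE)
}
```

Here Pi.mat = 100 requests 100 resamples; all other arguments are left at the defaults shown in the usage above.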