sva (version 3.20.0)

sva: a package for removing artifacts from microarray and sequencing data

Description

sva has functionality to estimate and remove artifacts from high-dimensional data. The sva function can be used to estimate artifacts from microarray data, and the svaseq function can be used to estimate artifacts from count-based RNA-sequencing (and other sequencing) data. The ComBat function can be used to remove known batch effects from microarray data. The fsva function can be used to remove batch effects for prediction problems.

The sva function implements the iteratively re-weighted least squares approach for estimating surrogate variables. As a by-product, it produces estimates of the probability that each gene is an empirical control. See the function empirical.controls for a direct estimate of the empirical controls.

Usage

sva(dat, mod, mod0 = NULL, n.sv = NULL, controls = NULL, method = c("irw", "two-step", "supervised"), vfilter = NULL, B = 5, numSVmethod = "be")
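As a minimal sketch of how the call above might be set up, the following simulates a small expression matrix with one known group variable and one hidden batch effect, builds the full and null model matrices with base R's model.matrix, and calls sva only if the package is installed. The simulated data, the variable names (group, batch), and the choice n.sv = 1 are illustrative assumptions, not part of the package documentation.

```r
# Simulated data: 200 genes (rows) by 20 samples (columns); all values are made up.
set.seed(1)
n_genes <- 200
n_samples <- 20
group <- factor(rep(c("control", "treated"), each = n_samples / 2))
batch <- rep(c(0, 1), times = n_samples / 2)  # hidden artifact, not given to the model

dat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples)
# The hidden batch shifts genes 1-50; the group effect shifts genes 151-200.
dat[1:50, ] <- dat[1:50, ] + outer(rep(2, 50), batch)
dat[151:200, ] <- dat[151:200, ] + outer(rep(1.5, 50), as.numeric(group == "treated"))

# Full model: intercept plus the variable of interest.
mod <- model.matrix(~ group)
# Null model: intercept only (the variable of interest is left out).
mod0 <- model.matrix(~ 1, data = data.frame(group))

# Call sva only when the package is available; n.sv = 1 is an assumption for this toy data.
if (requireNamespace("sva", quietly = TRUE)) {
  svobj <- sva::sva(dat, mod, mod0, n.sv = 1)
  str(svobj$sv)
}
```

Here dat follows the layout the Arguments section asks for (variables in rows, samples in columns), and mod0 contains every column of mod except the variable of interest.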

Arguments

dat
The transformed data matrix with the variables in rows and samples in columns
mod
The model matrix being used to fit the data
mod0
The null model being compared when fitting the data
n.sv
The number of surrogate variables to estimate
controls
A vector of probabilities (between 0 and 1, inclusive) that each gene is a control. A value of 1 means the gene is certainly a control and a value of 0 means the gene is certainly not a control.
method
For empirical estimation of control probes, use "irw". If control probes are known, use "supervised".
vfilter
You may choose to filter to the vfilter most variable rows before performing the analysis. vfilter must be NULL if method is "supervised"
B
The number of iterations of the irwsva algorithm to perform
numSVmethod
If n.sv is NULL, sva will attempt to estimate the number of needed surrogate variables using this method. This setting should not be changed by the user unless they are an expert.
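The row filtering that vfilter describes can be illustrated in base R. This sketch ranks rows by sample variance and keeps the top k. The helper name filter_most_variable is hypothetical, and the package's internal implementation may differ in details such as tie handling.

```r
# Hypothetical helper: keep the k rows of a matrix with the largest variance.
filter_most_variable <- function(dat, k) {
  stopifnot(is.matrix(dat), k >= 1, k <= nrow(dat))
  row_var <- apply(dat, 1, var)                       # per-row variance across samples
  keep <- order(row_var, decreasing = TRUE)[seq_len(k)]
  dat[sort(keep), , drop = FALSE]                     # keep rows in their original order
}

set.seed(42)
toy <- matrix(rnorm(10 * 6), nrow = 10)  # 10 rows (variables) by 6 samples
filtered <- filter_most_variable(toy, k = 4)
dim(filtered)
```

Passing vfilter to sva performs the filtering inside the call, so a helper like this is only useful for inspecting which rows would survive.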

Value

sv
The estimated surrogate variables, one in each column
pprob.gam
A vector of the posterior probabilities each gene is affected by heterogeneity
pprob.b
A vector of the posterior probabilities each gene is affected by mod
n.sv
The number of significant surrogate variables
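One common way to use the returned sv component is to append it to the full and null model matrices before downstream testing. The sketch below shows only the matrix bookkeeping, using a random stand-in for the sv matrix so that it runs without the package. The stand-in values, the sample size, and the names modSv and mod0Sv are illustrative assumptions.

```r
set.seed(7)
n_samples <- 12
group <- factor(rep(c("a", "b"), each = n_samples / 2))

mod <- model.matrix(~ group)                           # full model
mod0 <- model.matrix(~ 1, data = data.frame(group))    # null model

# Stand-in for the sv component: in practice this comes from sva(); here it is random.
sv <- matrix(rnorm(n_samples * 2), nrow = n_samples, ncol = 2)

# Append the surrogate variables as extra covariates to both models.
modSv <- cbind(mod, sv)
mod0Sv <- cbind(mod0, sv)
dim(modSv)
dim(mod0Sv)
```

Adding the same columns to both matrices keeps the comparison between full and null models focused on the variable of interest while adjusting for the estimated artifacts.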

Details

A vignette is available by typing browseVignettes("sva") in the R prompt.

References

For the package: Leek JT, Johnson WE, Parker HS, Jaffe AE, and Storey JD. (2012) The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics DOI:10.1093/bioinformatics/bts034

For sva: Leek JT and Storey JD. (2008) A general framework for multiple testing dependence. Proceedings of the National Academy of Sciences, 105: 18718-18723.

For sva: Leek JT and Storey JD. (2007) Capturing heterogeneity in gene expression studies by 'Surrogate Variable Analysis'. PLoS Genetics, 3: e161.

For ComBat: Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8 (1), 118-127.

For svaseq: Leek JT (2014) svaseq: removing batch and other artifacts from count-based sequencing data. bioRxiv doi: TBD

For fsva: Parker HS, Bravo HC, Leek JT (2013) Removing batch effects for prediction problems with frozen surrogate variable analysis arXiv:1301.3947

For psva: Parker HS, Leek JT, Favorov AV, Considine M, Xia X, Chavan S, Chung CH, Fertig EJ (2014) Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction Bioinformatics doi: 10.1093/bioinformatics/btu375