sva (version 3.20.0)

svaseq: A function for estimating surrogate variables for count-based RNA-seq data.

Description

This function implements the iteratively re-weighted least squares approach for estimating surrogate variables. As a by-product, it produces estimates of the probability that each gene is an empirical control. It first applies a moderated log transform, as described in Leek 2014, before calculating the surrogate variables. See the function empirical.controls for a direct estimate of the empirical controls.

Usage

svaseq(dat, mod, mod0 = NULL, n.sv = NULL, controls = NULL, method = c("irw", "two-step", "supervised"), vfilter = NULL, B = 5, numSVmethod = "be", constant = 1)

Arguments

dat
The count data matrix, with the variables in rows and samples in columns. The log transform is applied inside the function (see constant).
mod
The model matrix being used to fit the data
mod0
The null model being compared when fitting the data
n.sv
The number of surrogate variables to estimate
controls
A vector of probabilities (between 0 and 1, inclusive) that each gene is a control. A value of 1 means the gene is certainly a control and a value of 0 means the gene is certainly not a control.
method
For empirical estimation of control probes, use "irw". If control probes are known, use "supervised".
vfilter
You may choose to filter to the vfilter most variable rows before performing the analysis. vfilter must be NULL if method is "supervised"
B
The number of iterations of the irwsva algorithm to perform
numSVmethod
If n.sv is NULL, sva will attempt to estimate the number of needed surrogate variables using this method. This should not be changed by the user unless they are an expert.
constant
The function takes log(dat + constant) before performing sva. By default constant = 1. All values of dat + constant should be positive.
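
The mod, mod0, constant and vfilter arguments can be illustrated with a small base-R sketch. This is not the package's own code: the simulated counts, the variable names (counts, group, log_counts, filtered) and the choice of row variance on the log scale as the measure of "most variable" are assumptions made for illustration only.

```r
# Simulated count matrix: genes in rows, samples in columns (hypothetical data)
set.seed(1)
n_genes <- 200
n_samples <- 8
counts <- matrix(rnbinom(n_genes * n_samples, mu = 50, size = 2),
                 nrow = n_genes)

# Full model (intercept + group effect) and null model (intercept only)
group <- factor(rep(c("control", "treated"), each = 4))
mod <- model.matrix(~ group)
mod0 <- model.matrix(~ 1, data = data.frame(group))

# The transform described for the constant argument: log(dat + constant)
constant <- 1
stopifnot(all(counts + constant > 0))
log_counts <- log(counts + constant)

# Assumed reading of vfilter: keep the vfilter rows with the largest variance
vfilter <- 100
row_var <- apply(log_counts, 1, var)
keep <- order(row_var, decreasing = TRUE)[seq_len(vfilter)]
filtered <- log_counts[keep, , drop = FALSE]
```

Here mod has one row per sample and two columns, mod0 has a single intercept column, and filtered keeps 100 of the 200 rows.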

Value

sv
The estimated surrogate variables, one in each column
pprob.gam
A vector of the posterior probabilities each gene is affected by heterogeneity
pprob.b
A vector of the posterior probabilities each gene is affected by mod
n.sv
The number of significant surrogate variables
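
A minimal usage sketch follows, assuming the sva package is installed from Bioconductor. The simulated counts, the hidden batch variable and the choice n.sv = 2 are hypothetical and serve only to show the call and the components of the returned list.

```r
library(sva)

# Hypothetical simulated counts: 1000 genes, 12 samples
set.seed(42)
n_genes <- 1000
n_samples <- 12
group <- factor(rep(c("control", "treated"), each = 6))
batch <- rep(c(0, 1), times = 6)  # unmodelled source of heterogeneity

# Mean matrix: batch shifts the first 300 genes, group shifts genes 251 to 350
mu <- matrix(100, nrow = n_genes, ncol = n_samples)
mu[1:300, batch == 1] <- 300
mu[251:350, group == "treated"] <- 200
counts <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 5),
                 nrow = n_genes)

# Full and null model matrices
mod <- model.matrix(~ group)
mod0 <- model.matrix(~ 1, data = data.frame(group))

# Estimate two surrogate variables with the default "irw" method
fit <- svaseq(counts, mod, mod0, n.sv = 2)

sv <- as.matrix(fit$sv)   # one column per surrogate variable
dim(sv)                   # rows correspond to samples
length(fit$pprob.gam)     # one posterior probability per gene
length(fit$pprob.b)       # one posterior probability per gene
fit$n.sv
```

A common follow-up is to append the estimated surrogate variables to the model matrix, for example cbind(mod, sv), and use the result as the design in a downstream analysis.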