edgeR (version 3.14.0)

dispBinTrend: Estimate Dispersion Trend by Binning for NB GLMs

Description

Estimate the abundance-dispersion trend by computing the common dispersion for bins of genes of similar AveLogCPM and then fitting a smooth curve.

Usage

dispBinTrend(y, design=NULL, offset=NULL, df = 5, span=0.3, min.n=400, method.bin="CoxReid", method.trend="spline", AveLogCPM=NULL, weights=NULL, ...)

Arguments

y
numeric matrix of counts
design
numeric matrix giving the design matrix for the GLM that is to be fit.
offset
numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the genes. If a scalar, then this value will be used as an offset for all genes and libraries. If a vector, it should have length equal to the number of libraries, and the same vector of offsets will be used for each gene. If a matrix, then each library for each gene can have a unique offset, if desired. In adjustedProfileLik the offset must be a matrix with the same dimensions as the table of counts.
df
degrees of freedom for spline curve.
span
span used for loess curve.
min.n
minimum number of genes in a bin.
method.bin
method used to estimate the dispersion in each bin. Possible values are "CoxReid", "Pearson" or "deviance".
method.trend
type of curve used to smooth the bins. Possible values are "spline" for a natural cubic regression spline or "loess" for a linear loess curve.
AveLogCPM
numeric vector giving average log2 counts per million for each gene
weights
optional numeric matrix giving observation weights
...
other arguments are passed to estimateGLMCommonDisp
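
The three accepted shapes of the offset argument can be illustrated with a small base R sketch. The helper expand_offset below is hypothetical and is not part of edgeR; it only shows how a scalar, a per-library vector, or a full matrix could each be mapped to a genes-by-libraries matrix, following the description of offset above.

```r
# Hypothetical helper (not an edgeR function): expand an offset given as a
# scalar, a per-library vector, or a matrix to a genes x libraries matrix.
expand_offset <- function(offset, ngenes, nlibs) {
  if (is.matrix(offset)) {
    # A matrix must already match the dimensions of the count table
    stopifnot(nrow(offset) == ngenes, ncol(offset) == nlibs)
    return(offset)
  }
  if (length(offset) == 1L) {
    # A scalar is used for every gene and every library
    return(matrix(offset, nrow = ngenes, ncol = nlibs))
  }
  # A vector holds one value per library and is repeated for each gene
  stopifnot(length(offset) == nlibs)
  matrix(offset, nrow = ngenes, ncol = nlibs, byrow = TRUE)
}

lib.offset <- c(0.1, -0.2, 0.3)   # made-up per-library offsets
off.vec <- expand_offset(lib.offset, ngenes = 4, nlibs = 3)
off.scalar <- expand_offset(0, ngenes = 4, nlibs = 3)
```

Each row of off.vec repeats the same per-library values, which matches the statement that a vector offset is shared by all genes.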

Value

list with the following components:
AveLogCPM
numeric vector containing the overall AveLogCPM for each gene
dispersion
numeric vector giving the trended dispersion estimate for each gene
bin.AveLogCPM
numeric vector of length equal to nbins giving the average (mean) AveLogCPM for each bin
bin.dispersion
numeric vector of length equal to nbins giving the estimated common dispersion for each bin

Details

Estimate a dispersion parameter for each of many negative binomial generalized linear models by computing the common dispersion for genes sorted into bins based on overall AveLogCPM. A natural cubic regression spline or a linear loess curve is then used to smooth the bin-level estimates into a trend and to interpolate or extrapolate a value for each gene.

If there are fewer than min.n rows of y with at least one positive count, then one bin is used. The number of bins is limited to 1000.
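
The bin-then-smooth idea described above can be sketched in base R. This is a simplified illustration under stated assumptions, not edgeR's implementation: the log2 mean count stands in for AveLogCPM, a method-of-moments estimate stands in for the per-bin common dispersion that dispBinTrend obtains with the method chosen by method.bin, and the equal-sized binning rule is an assumption. All variable names are illustrative.

```r
library(splines)  # ships with R; provides ns()

set.seed(1)
ngenes <- 1000
nlibs <- 4
min.n <- 100
means <- seq(5, 10000, length.out = ngenes)
# Simulated counts; the true dispersion is 1/size = 0.1 for every gene
y <- matrix(rnbinom(ngenes * nlibs, mu = rep(means, nlibs), size = 10),
            nrow = ngenes, ncol = nlibs)
y <- y[rowSums(y) > 0, , drop = FALSE]
ngenes <- nrow(y)

# Assumption: log2 mean count as a simple stand-in for AveLogCPM
A <- log2(rowMeans(y) + 0.5)

# Sort genes into equal-sized abundance bins of at least min.n genes;
# a single bin is used when there are fewer than min.n genes
nbins <- max(1, floor(ngenes / min.n))
bin <- ceiling(rank(A, ties.method = "first") * nbins / ngenes)

# Assumption: method-of-moments dispersion from var = mu + phi * mu^2,
# averaged within each bin (a stand-in for the common dispersion)
m <- rowMeans(y)
s2 <- apply(y, 1, var)
phi <- (s2 - m) / m^2
bin.A <- as.vector(tapply(A, bin, mean))
bin.disp <- as.vector(tapply(phi, bin, mean))

# Smooth the bin-level estimates with a natural cubic regression spline
# and evaluate the curve at every gene's abundance (needs several bins)
fit <- lm(bin.disp ~ ns(bin.A, df = 5))
trend <- unname(predict(fit, newdata = data.frame(bin.A = A)))
```

Here trend plays the role of the per-gene dispersion component of the returned list, while bin.A and bin.disp correspond to bin.AveLogCPM and bin.dispersion.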

References

McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297. http://nar.oxfordjournals.org/content/40/10/4288

See Also

estimateGLMTrendedDisp

Examples

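library(edgeR)

# Simulate negative binomial counts for 1000 genes in 4 libraries
ngenes <- 1000
nlibs <- 4
means <- seq(5,10000,length.out=ngenes)
y <- matrix(rnbinom(ngenes*nlibs,mu=rep(means,nlibs),size=0.1*means),nrow=ngenes,ncol=nlibs)

# Drop genes with zero counts in every library
keep <- rowSums(y) > 0
y <- y[keep,]

group <- factor(c(1,1,2,2))
design <- model.matrix(~group) # Define the design matrix for the full model

# Bin genes by abundance (at least 100 genes per bin) and fit the trend;
# span only matters when method.trend="loess"
out <- dispBinTrend(y, design, min.n=100, span=0.3)

# Plot the square root of the trended dispersion against abundance
with(out, plot(AveLogCPM, sqrt(dispersion)))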
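
As a further sketch, the bin-level components of the returned list can be overlaid on the fitted trend to see how closely the curve follows the per-bin estimates. The simulation settings, the seed, and the guard on edgeR being installed are assumptions made for illustration; only dispBinTrend and the documented components of its return value are used.

```r
if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)
  set.seed(2016)
  ngenes <- 1000
  nlibs <- 4
  means <- seq(5, 10000, length.out = ngenes)
  # Simulated counts with a constant true dispersion of 1/size = 0.1
  y <- matrix(rnbinom(ngenes * nlibs, mu = rep(means, nlibs), size = 10),
              nrow = ngenes, ncol = nlibs)
  y <- y[rowSums(y) > 0, ]
  design <- model.matrix(~factor(c(1, 1, 2, 2)))

  out <- dispBinTrend(y, design, min.n = 100)

  # Trend as a line (genes ordered by abundance), bin estimates as points
  o <- order(out$AveLogCPM)
  plot(out$AveLogCPM[o], sqrt(out$dispersion[o]), type = "l",
       ylim = range(sqrt(c(out$dispersion, out$bin.dispersion))),
       xlab = "AveLogCPM", ylab = "sqrt(dispersion)")
  points(out$bin.AveLogCPM, sqrt(out$bin.dispersion), pch = 16)
}
```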
