dispBinTrend: Estimate Dispersion Trend by Binning for NB GLMs

Description

Estimate the abundance-dispersion trend by computing the common dispersion for bins of genes of similar AveLogCPM and then fitting a smooth curve.

Usage

dispBinTrend(y, design=NULL, offset=NULL, df=5, span=0.3, min.n=400, method.bin="CoxReid", method.trend="spline", AveLogCPM=NULL, weights=NULL, ...)

Arguments

y

numeric matrix of counts.

design

numeric matrix giving the design matrix for the GLM that is to be fit.

offset

numeric scalar, vector or matrix giving the offset (in addition to the log of the effective library size) that is to be included in the NB GLM for the genes. If a scalar, then this value will be used as an offset for all genes and libraries. If a vector, it should have length equal to the number of libraries, and the same vector of offsets will be used for each gene. If a matrix, then each library for each gene can have a unique offset, if desired. In adjustedProfileLik the offset must be a matrix with the same dimensions as the table of counts.

df

degrees of freedom for spline curve.

span

span used for loess curve.

min.n

minimum number of genes in a bin.

method.bin

method used to estimate the dispersion in each bin. Possible values are "CoxReid", "Pearson" or "deviance".

method.trend

type of curve to smooth the bins. Possible values are "spline" for a natural cubic regression spline or "loess" for a linear lowess curve.

AveLogCPM

numeric vector giving average log2 counts per million for each gene

weights

optional numeric matrix giving observation weights

...

other arguments are passed to estimateGLMCommonDisp

Value

AveLogCPM: numeric vector containing the overall AveLogCPM for each gene
dispersion: numeric vector giving the trended dispersion estimate for each gene
bin.AveLogCPM: numeric vector of length equal to nbins giving the average (mean) AveLogCPM for each bin
bin.dispersion: numeric vector of length equal to nbins giving the estimated common dispersion for each bin

Details

Estimate a dispersion parameter for each of many negative binomial generalized linear models by computing the common dispersion for genes sorted into bins based on overall AveLogCPM. A natural cubic regression spline or a linear loess curve is then used to smooth the binned estimates and to interpolate or extrapolate a trended value for each gene.

If there are fewer than min.n rows of y with at least one positive count, then one bin is used. The number of bins is limited to 1000.
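The bin-then-smooth idea described above can be sketched in base R. This is an illustration under stated assumptions, not edgeR's implementation: the rule used to choose the number of bins, the rank-based bin assignment, the `log2(rowMeans(y) + 0.5)` stand-in for AveLogCPM, and the pooled moment estimator (used here in place of the Cox-Reid common dispersion) are all hypothetical simplifications chosen so the example runs without edgeR.

```r
library(splines)  # provides ns(); part of the standard R distribution

set.seed(42)
ngenes <- 1000
nlibs <- 4
min.n <- 50

# Simulated counts (assumed setup): dispersion falls as abundance rises.
mu <- exp(seq(log(5), log(5000), length.out = ngenes))
phi <- 0.1 + 2 / mu
y <- matrix(rnbinom(ngenes * nlibs, mu = rep(mu, nlibs), size = rep(1 / phi, nlibs)),
            nrow = ngenes, ncol = nlibs)

# Keep rows with at least one positive count.
y <- y[rowSums(y) > 0, , drop = FALSE]
npos <- nrow(y)

# Stand-in for AveLogCPM (hypothetical; edgeR computes this differently).
abundance <- log2(rowMeans(y) + 0.5)

# Assumed bin-count rule: one bin if too few rows, never more than 1000 bins.
nbins <- if (npos < min.n) 1 else min(1000, floor(npos / min.n))

# Assign genes to near-equal-sized bins by abundance rank.
bin <- ceiling(rank(abundance, ties.method = "first") * nbins / npos)

# Crude pooled moment estimate per bin, based on var = mu + phi * mu^2.
m <- rowMeans(y)
v <- apply(y, 1, var)
bin.dispersion <- as.numeric(tapply(seq_len(npos), bin, function(i)
  max(sum(v[i] - m[i]) / sum(m[i]^2), 1e-4)))
bin.abundance <- as.numeric(tapply(abundance, bin, mean))

# Smooth the binned estimates with a natural cubic regression spline,
# then evaluate the fitted curve at every gene's abundance.
bins <- data.frame(abund = bin.abundance, disp = bin.dispersion)
fit <- lm(disp ~ ns(abund, df = 5), data = bins)
dispersion <- as.numeric(predict(fit, newdata = data.frame(abund = abundance)))
```

Here `bin.abundance` and `bin.dispersion` play the roles of the bin.AveLogCPM and bin.dispersion components, and `dispersion` plays the role of the per-gene trended estimate. The spline step needs several bins to be meaningful; with a single bin the trend would reduce to one common value.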

References

McCarthy, DJ, Chen, Y, Smyth, GK (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40, 4288-4297. http://nar.oxfordjournals.org/content/40/10/4288

Examples


library(edgeR)
ngenes <- 1000
nlibs <- 4
means <- seq(5,10000,length.out=ngenes)
y <- matrix(rnbinom(ngenes*nlibs,mu=rep(means,nlibs),size=0.1*means),nrow=ngenes,ncol=nlibs)
keep <- rowSums(y) > 0
y <- y[keep,]
group <- factor(c(1,1,2,2))
design <- model.matrix(~group) # Define the design matrix for the full model
out <- dispBinTrend(y, design, min.n=100, span=0.3)
with(out, plot(AveLogCPM, sqrt(dispersion)))
