edge (version 2.4.2)

estimateDisp: Estimate Common, Trended and Tagwise Negative Binomial dispersions by weighted likelihood empirical Bayes

Description

Maximizes the negative binomial likelihood to estimate the common, trended and tagwise dispersions across all tags.

Usage

## S3 method for class 'DGEList':
estimateDisp(y, design=NULL, prior.df=NULL, trend.method="locfit", mixed.df=FALSE, 
            tagwise=TRUE, span=NULL, min.row.sum=5, grid.length=21, grid.range=c(-10,10), robust=FALSE, 
            winsor.tail.p=c(0.05,0.1), tol=1e-06, ...)
## S3 method for class 'default':
estimateDisp(y, design=NULL, group=NULL, lib.size=NULL, offset=NULL, prior.df=NULL,
            trend.method="locfit", mixed.df=FALSE, tagwise=TRUE, span=NULL, min.row.sum=5, grid.length=21, 
            grid.range=c(-10,10), robust=FALSE, winsor.tail.p=c(0.05,0.1), tol=1e-06, weights=NULL, ...)

Arguments

y
matrix of counts or a DGEList object.
design
numeric design matrix.
prior.df
prior degrees of freedom. It is used in calculating prior.n.
trend.method
method for estimating dispersion trend. Possible values are "none", "movingave", "loess" and "locfit" (default).
mixed.df
logical, only used when trend.method="locfit". If FALSE, locfit uses a polynomial of degree 0. If TRUE, locfit uses a polynomial of degree 1 for lowly expressed genes. Care is taken to smooth the curve.
tagwise
logical, should the tagwise dispersions be estimated?
span
width of the smoothing window, as a proportion of the data set.
min.row.sum
numeric scalar giving a value for the filtering out of low abundance tags. Only tags with total sum of counts above this value are used. Low abundance tags can adversely affect the dispersion estimation, so this argument allows the user to select an appropriate filter threshold for the tag abundance.
grid.length
the number of points on which the interpolation is applied for each tag.
grid.range
the range of the grid points around the trend on a log2 scale.
robust
logical, should the estimation of prior.df be robustified against outliers?
winsor.tail.p
numeric vector of length 1 or 2, giving left and right tail proportions of the deviances to Winsorize when estimating prior.df.
tol
the desired accuracy, passed to optimize.
group
vector or factor giving the experimental group/condition for each library.
lib.size
numeric vector giving the total count (sequence depth) for each library.
offset
offset matrix for the log-linear model, as for glmFit. Defaults to the log-effective library sizes.
weights
optional numeric matrix giving observation weights.
...
other arguments that are not currently used.
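
As a rough illustration of how grid.length and grid.range might combine, the sketch below builds a dispersion grid in base R. It assumes the grid points are evenly spaced on the log2 scale and act as multiplicative offsets around a trend value. The variable trend.disp is a hypothetical number chosen for illustration, and the internal construction used by estimateDisp may differ.

```r
# Defaults taken from the Usage section
grid.length <- 21
grid.range  <- c(-10, 10)

# Evenly spaced offsets on the log2 scale (assumed spacing)
grid.log2 <- seq(from = grid.range[1], to = grid.range[2],
                 length.out = grid.length)

# Hypothetical trend dispersion for a single tag (illustration only)
trend.disp <- 0.2

# Candidate dispersions: the trend scaled by 2^offset
grid.disp <- trend.disp * 2^grid.log2

# Smallest and largest candidate dispersion on the grid
range(grid.disp)
```

With the defaults, the middle grid point has offset 0, so it coincides with the trend value itself, and the remaining points spread symmetrically around it on the log2 scale.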

Value

estimateDisp.DGEList adds the following components to the input DGEList object:

  • common.dispersion: estimate of the common dispersion.
  • trended.dispersion: estimates of the trended dispersions.
  • tagwise.dispersion: tagwise estimates of the dispersion parameter if tagwise=TRUE.
  • AveLogCPM: numeric vector giving log2(AveCPM) for each row of y.
  • trend.method: method for estimating dispersion trend as given in the input.
  • prior.df: prior degrees of freedom. It is a vector when robust method is used.
  • prior.n: estimate of the prior weight, i.e. the smoothing parameter that indicates the weight to put on the common likelihood compared to the individual tag's likelihood.
  • span: width of the smoothing window used in estimating dispersions.

estimateDisp.default returns a list containing common.dispersion, trended.dispersion, tagwise.dispersion (if tagwise=TRUE), span, prior.df and prior.n.
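
The components listed above can be read directly from the returned object. The following is a minimal sketch on simulated data, assuming the edgeR package is installed; the seed, the simulated counts and the object names are arbitrary choices made for illustration.

```r
library(edgeR)

set.seed(1)  # arbitrary seed so the simulation is repeatable

# Simulated counts: 250 tags by 4 libraries (illustrative data only)
counts <- matrix(rnbinom(1000, mu = 10, size = 5), ncol = 4)
group  <- factor(c(1, 1, 2, 2))
design <- model.matrix(~group)

d <- DGEList(counts = counts, group = group)
d <- estimateDisp(d, design)

# Components added to the DGEList by estimateDisp
d$common.dispersion            # single value
summary(d$trended.dispersion)  # one value per tag
summary(d$tagwise.dispersion)  # one value per tag
d$prior.df
d$prior.n
d$span
```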

Details

This function calculates a matrix of likelihoods for each tag at a set of dispersion grid points, and then applies the weighted likelihood empirical Bayes method to obtain posterior dispersion estimates. If there is no design matrix, it calculates the quantile conditional likelihood for each tag and then maximizes it. In this case, it is similar to the functions estimateCommonDisp and estimateTagwiseDisp. If a design matrix is given, it calculates the adjusted profile log-likelihood for each tag and then maximizes it. In this case, it is similar to the functions estimateGLMCommonDisp, estimateGLMTrendedDisp and estimateGLMTagwiseDisp.
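
To make the no-design case concrete, the sketch below runs estimateDisp without a design matrix next to the two classic functions it is said to resemble. It assumes the edgeR package is installed and uses simulated counts with an arbitrary seed. The two routes are described only as similar, so the printed estimates should not be expected to match exactly.

```r
library(edgeR)

set.seed(1)  # arbitrary seed so the simulation is repeatable

# Simulated counts: 250 tags by 4 libraries (illustrative data only)
counts <- matrix(rnbinom(1000, mu = 10, size = 5), ncol = 4)
group  <- factor(c(1, 1, 2, 2))
d <- DGEList(counts = counts, group = group)

# Single call, no design matrix
d.new <- estimateDisp(d)

# Classic two-step route: common dispersion first, then tagwise
d.old <- estimateCommonDisp(d)
d.old <- estimateTagwiseDisp(d.old)

# Side-by-side view of the common dispersion from each route
c(estimateDisp = d.new$common.dispersion,
  classic      = d.old$common.dispersion)
```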

Note that the terms 'tag' and 'gene' are synonymous here.

References

Chen, Y, Lun, ATL, and Smyth, GK (2014). Differential expression analysis of complex RNA-seq experiments using edgeR. In: Statistical Analysis of Next Generation Sequence Data, Somnath Datta and Daniel S Nettleton (eds), Springer, New York. http://www.statsci.org/smyth/pubs/edgeRChapterPreprint.pdf

Phipson, B, Lee, S, Majewski, IJ, Alexander, WS, and Smyth, GK (2016). Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Annals of Applied Statistics 10. http://arxiv.org/abs/1602.08678

See Also

estimateCommonDisp, estimateTagwiseDisp, estimateGLMCommonDisp, estimateGLMTrendedDisp, estimateGLMTagwiseDisp

Examples

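library(edgeR)

# Simulate negative binomial counts: 250 tags by 4 libraries.
# True dispersion is 1/size = 1/5 = 0.2
y <- matrix(rnbinom(1000, mu=10, size=5), ncol=4)
group <- c(1,1,2,2)
design <- model.matrix(~group)
d <- DGEList(counts=y, group=group)
# No design matrix: quantile conditional likelihood
d1 <- estimateDisp(d)
# With a design matrix: adjusted profile log-likelihood
d2 <- estimateDisp(d, design)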
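
The robust argument can be tried in the same way. This is a minimal sketch, assuming the edgeR package is installed; the seed and object names are arbitrary, and the simulated data contain no deliberately added outliers, so the effect of robust=TRUE here may be small.

```r
library(edgeR)

set.seed(1)  # arbitrary seed so the simulation is repeatable

# Simulated counts: 250 tags by 4 libraries (illustrative data only)
y <- matrix(rnbinom(1000, mu = 10, size = 5), ncol = 4)
group <- c(1, 1, 2, 2)
design <- model.matrix(~group)
d <- DGEList(counts = y, group = group)

# Robustified estimation of prior.df, with the default Winsorizing tails
d.rob <- estimateDisp(d, design, robust = TRUE,
                      winsor.tail.p = c(0.05, 0.1))

# The Value section states that prior.df is a vector in the robust case
length(d.rob$prior.df)
summary(d.rob$prior.df)
```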