dglmStdResid: Visualize the mean-variance relationship in DGE data using standardized residuals

Description

Appropriate modelling of the mean-variance relationship in DGE data is important for making inferences about differential expression. However, the standard approach to visualizing the mean-variance relationship is not appropriate for general, complicated experimental designs that require generalized linear models (GLMs) for analysis. Here are functions to compute standardized residuals from a Poisson GLM and plot them for bins based on overall expression level of genes as a way to visualize the mean-variance relationship. A rough estimate of the dispersion parameter can also be obtained from the standardized residuals.

Usage

dglmStdResid(y, design, dispersion=0, offset=0, nbins=100, make.plot=TRUE,
          xlab="Mean", ylab="Ave. binned standardized residual", ...)
getDispersions(binned.object)

Arguments

y

numeric matrix of counts. Each row represents one gene and each column represents one DGE library.

design

numeric matrix giving the design matrix of the GLM. Assumed to be full column rank.

dispersion

numeric scalar or vector giving the dispersion parameter for each GLM. Can be a scalar giving one value for all genes, or a vector of length equal to the number of genes giving genewise dispersions.

offset

numeric vector or matrix giving the offset that is to be included in the log-linear model predictor. Can be a vector of length equal to the number of libraries, or a matrix of the same size as y.

nbins

scalar giving the number of bins (formed using the quantiles of the genewise mean expression levels) within which average means and variances are computed to explore the mean-variance relationship. Default is 100 bins.

make.plot

logical, whether or not to plot the mean standardized residual for binned data (binned on expression level). Provides a visualization of the mean-variance relationship. Default is TRUE.

xlab

character string giving the label for the x-axis. Standard graphical parameter. If left as the default, then the x-axis label will be set to "Mean".

ylab

character string giving the label for the y-axis. Standard graphical parameter. If left as the default, then the y-axis label will be set to "Ave. binned standardized residual".

...

further arguments passed on to plot.

binned.object

list object, which is the output of dglmStdResid.
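The nbins argument groups observations by the quantiles of their mean expression level. The R sketch below illustrates quantile binning under the assumption that bins are built with quantile() and cut(); bin_by_quantile is a hypothetical helper, not part of edgeR, and the actual edgeR binning may differ in detail (for example, in how tied means are handled).

```r
# Hypothetical helper (not an edgeR function): assign each value in `means`
# to one of `nbins` bins of roughly equal size, using its quantiles.
bin_by_quantile <- function(means, nbins = 100) {
  probs <- seq(0, 1, length.out = nbins + 1)
  # unique() drops repeated break points, which occur when many means are tied
  breaks <- unique(quantile(means, probs = probs, names = FALSE))
  # include.lowest = TRUE places the smallest mean in the first bin
  cut(means, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

set.seed(1)
means <- rgamma(1000, shape = 2, rate = 0.1)  # simulated mean expression levels
bins <- bin_by_quantile(means, nbins = 10)
table(bins)                      # about 100 values per bin
ave.means <- tapply(means, bins, mean)
ave.means                        # increases from bin 1 to bin 10
```

Binning on quantiles rather than on equally spaced cut points keeps the number of observations per bin roughly constant, so each bin average is based on a similar amount of data.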

Value

dglmStdResid produces a mean-variance plot based on standardized residuals from a Poisson model fitted to each gene in the DGE data. dglmStdResid returns a list with the following elements:

ave.means

vector of the average expression level within each bin of observations.

ave.std.resid

vector of the average standardized Poisson residual within each bin of observations.

bin.means

list containing the average (mean) expression level (given by the fitted value from the given Poisson model) for observations divided into bins based on expression level.

bin.std.resid

list containing the standardized residuals from the given Poisson model for observations divided into bins based on expression level.

means

vector giving the fitted value for each observed count.

standardized.residuals

vector giving the approximate standardized residual for each observed count.

bins

list containing the indices of the observations, assigning them to bins.

nbins

scalar giving the number of bins used to split up the observed counts.

ngenes

scalar giving the number of genes in the dataset.

nlibs

scalar giving the number of libraries in the dataset.
getDispersions computes the dispersion from the standardized residuals and returns a list with the following components:

bin.dispersion

vector giving the estimated dispersion value for each bin of observed counts, computed using the average standardized residual for the bin.

bin.dispersion.used

vector giving the estimated dispersion value actually used for each bin. Some dispersions computed by this method can be negative, which is not allowed, so any negative dispersion is replaced by the dispersion value from the nearest bin of higher expression level that has a positive dispersion value.

dispersion

vector giving the estimated dispersion for each observation, using the binned dispersion estimates above, so that all of the observations in a given bin get the same dispersion value.
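A rough moment argument links the binned residuals to the dispersion: if counts follow a negative binomial distribution with Var(y) = mu + phi * mu^2, the squared Poisson-standardized residual has expectation of roughly Var(y)/mu = 1 + phi * mu, so phi is approximately (average squared residual - 1) / mean. The R sketch below applies that estimate per bin and then the replacement rule for negative values described above. It is a hypothetical illustration, not the getDispersions source: dispersions_from_bins is an invented name, it assumes the bin averages are of squared residuals and that bins are ordered by increasing expression, and it leaves NA where no higher bin has a positive estimate, a case this help page does not describe.

```r
# Hypothetical sketch, not edgeR's getDispersions() source.
# Assumptions: bins are ordered by increasing expression, and ave.sq.resid
# holds the average squared Poisson-standardized residual in each bin, so
# that E[r^2] is approximately 1 + phi * mu for negative binomial counts.
dispersions_from_bins <- function(ave.means, ave.sq.resid) {
  # Moment estimate of phi in each bin: (E[r^2] - 1) / mu
  bin.dispersion <- (ave.sq.resid - 1) / ave.means
  used <- bin.dispersion
  for (i in seq_along(used)) {
    if (used[i] < 0) {
      # First bin of higher expression whose original estimate is positive
      higher <- which(bin.dispersion > 0 & seq_along(bin.dispersion) > i)
      # NA if no such bin exists (a case the help page does not cover)
      used[i] <- if (length(higher) > 0) bin.dispersion[higher[1]] else NA_real_
    }
  }
  list(bin.dispersion = bin.dispersion, bin.dispersion.used = used)
}

ave.means    <- c(2, 5, 10, 20)
ave.sq.resid <- c(0.9, 2.0, 0.8, 3.0)
d <- dispersions_from_bins(ave.means, ave.sq.resid)
d$bin.dispersion        # -0.05, 0.20, -0.02, 0.10
d$bin.dispersion.used   #  0.20, 0.20,  0.10, 0.10

# Per-observation dispersions: every observation in a bin gets that bin's value
bins <- c(1, 1, 2, 3, 4, 4)      # hypothetical bin index for six observations
d$bin.dispersion.used[bins]      # 0.20, 0.20, 0.20, 0.10, 0.10, 0.10
```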

Details

This function is useful for exploring the mean-variance relationship in the data. Raw or pooled variances cannot be used for complex experimental designs, so instead we can fit a Poisson model using the appropriate design matrix to each gene and use the standardized residuals in place of the pooled variance (as in plotMeanVar) to visualize the mean-variance relationship in the data. The function plots the average standardized residual for observations split into nbins bins by overall expression level. This provides a useful summary of how the variance of the counts changes with respect to average expression level (abundance). A line showing the Poisson mean-variance relationship (mean equals variance) is always shown to illustrate how the genewise variances may differ from a Poisson mean-variance relationship. A log-log scale is used for the plot.

The function mglmLS is used to fit the Poisson models to the data. This code is fast for fitting models, but it does not compute the leverages, which are technically required to compute the standardized residuals. Here, we approximate the standardized residuals by replacing the usual denominator factor (1 - leverage) by (1 - p/n), where n is the number of observations per gene (i.e. the number of libraries) and p is the number of parameters in the model (i.e. the number of columns in the full-rank design matrix).
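For a log-linear GLM, the standardized Pearson residual of observation i is (y_i - mu_i) / sqrt(V(mu_i) (1 - h_i)), where V(mu) = mu for the Poisson model and h_i is the leverage. Because the leverages of a full-rank fit sum to p, the factor 1 - p/n is exactly the average of 1 - h_i over the n libraries, which is why it serves as a reasonable stand-in. The R sketch below compares the two for a single gene. It uses base R's glm(), hatvalues() and rstandard() rather than mglmLS, purely for illustration, and assumes dispersion = 0 (a plain Poisson fit).

```r
# Illustration with base R's glm(), not edgeR's mglmLS().
y <- c(5, 8, 12, 20)                             # counts for one gene in 4 libraries
design <- model.matrix(~c(0,0,1,1)+c(0,1,0,1))   # same design as in the Examples
fit <- glm(y ~ design - 1, family = poisson())   # Poisson log-linear model
mu <- fitted(fit)
h  <- hatvalues(fit)                             # leverages; they sum to p
n  <- length(y)                                  # number of libraries
p  <- ncol(design)                               # number of parameters

exact  <- (y - mu) / sqrt(mu * (1 - h))          # uses each observation's leverage
approx <- (y - mu) / sqrt(mu * (1 - p / n))      # replaces h by its average p/n
round(cbind(leverage = h, exact = exact, approx = approx), 3)
```

The approximation is closest when the leverages are nearly equal, as in balanced designs; observations with unusually high leverage get their residuals understated by it.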

Examples


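library(edgeR)
set.seed(1)  # make the simulated counts reproducible
# 250 genes x 4 libraries of negative binomial counts (size=2, i.e. dispersion 0.5)
y <- matrix(rnbinom(1000,mu=10,size=2),ncol=4)
# Additive two-factor design for the four libraries
design <- model.matrix(~c(0,0,1,1)+c(0,1,0,1))
binned <- dglmStdResid(y, design, dispersion=0.5)

getDispersions(binned)$bin.dispersion.used # Look at the estimated dispersions for the bins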
