edgeR (version 2.4.2)

dglmStdResid: Visualize the mean-variance relationship in DGE data using standardized residuals

Description

Appropriate modelling of the mean-variance relationship in DGE data is important for making inferences about differential expression. However, the standard approach to visualizing the mean-variance relationship is not appropriate for general, complicated experimental designs that require generalized linear models (GLMs) for analysis. Here are functions to compute standardized residuals from a Poisson GLM and plot them for bins based on overall expression level of genes as a way to visualize the mean-variance relationship. A rough estimate of the dispersion parameter can also be obtained from the standardized residuals.

Usage

dglmStdResid(y, design, dispersion=0, offset=0, nbins=100, make.plot=TRUE,
          xlab="Mean", ylab="Ave. binned standardized residual", ...)
getDispersions(binned.object)

Arguments

y
numeric matrix of counts; each row represents one gene and each column represents one DGE library.
design
numeric matrix giving the design matrix of the GLM. Assumed to be full column rank.
dispersion
numeric scalar or vector giving the dispersion parameter for each GLM. Can be a scalar giving one value for all genes, or a vector of length equal to the number of genes giving genewise dispersions.
offset
numeric vector or matrix giving the offset that is to be included in the log-linear model predictor. Can be a vector of length equal to the number of libraries, or a matrix of the same size as y.
nbins
scalar giving the number of bins (formed by using the quantiles of the genewise mean expression levels) for which to compute average means and variances for exploring the mean-variance relationship. Default is 100 bins.
make.plot
logical, whether or not to plot the mean standardized residual for binned data (binned on expression level). Provides a visualization of the mean-variance relationship. Default is TRUE.
xlab
character string giving the label for the x-axis. Standard graphical parameter. If left as the default, then the x-axis label will be set to "Mean".
ylab
character string giving the label for the y-axis. Standard graphical parameter. If left as the default, then the y-axis label will be set to "Ave. binned standardized residual".
...
further arguments passed on to plot
binned.object
list object, which is the output of dglmStdResid.

Value

dglmStdResid produces a mean-variance plot based on standardized residuals from a Poisson model fit for each gene for the DGE data. dglmStdResid returns a list with the following elements:

  • ave.means: vector of the average expression level within each bin of observations
  • ave.std.resid: vector of the average standardized Poisson residual within each bin of genes
  • bin.means: list containing the average (mean) expression level (given by the fitted value from the given Poisson model) for observations divided into bins based on amount of expression
  • bin.std.resid: list containing the standardized residual from the given Poisson model for observations divided into bins based on amount of expression
  • means: vector giving the fitted value for each observed count
  • standardized.residuals: vector giving approximate standardized residual for each observed count
  • bins: list containing the indices for the observations, assigning them to bins
  • nbins: scalar giving the number of bins used to split up the observed counts
  • ngenes: scalar giving the number of genes in the dataset
  • nlibs: scalar giving the number of libraries in the dataset

getDispersions computes the dispersion from the standardized residuals and returns a list with the following components:

  • bin.dispersion: vector giving the estimated dispersion value for each bin of observed counts, computed using the average standardized residual for the bin
  • bin.dispersion.used: vector giving the actual estimated dispersion value to be used. Some dispersions computed by this method can be negative, which is not allowed. Any negative dispersion is replaced by the dispersion value from the nearest bin of higher expression level that has a positive dispersion value.
  • dispersion: vector giving the estimated dispersion for each observation, using the binned dispersion estimates from above, so that all of the observations in a given bin get the same dispersion value.
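
The replacement rule described for bin.dispersion.used can be illustrated with a small base R sketch. The helper name estimate_bin_dispersion and its two input vectors are hypothetical, and the moment formula (variance - mean) / mean^2 is an assumption taken from the negative binomial mean-variance relationship (variance = mean + dispersion * mean^2); this page does not confirm that getDispersions uses exactly this formula.

```r
# Hypothetical helper, not part of the package: a sketch of the
# "borrow from the nearest higher-expression bin" rule described above.
# ave_means: average fitted mean per bin, ordered by increasing expression
# ave_var:   average variance-scale residual per bin (assumed input)
estimate_bin_dispersion <- function(ave_means, ave_var) {
  # Assumed moment estimate from var = mu + phi * mu^2
  raw <- (ave_var - ave_means) / ave_means^2
  used <- raw
  for (i in which(raw < 0)) {
    # Bins with higher expression (larger index) and a positive estimate
    higher <- which(raw > 0 & seq_along(raw) > i)
    # Take the nearest such bin; leave NA if none exists
    used[i] <- if (length(higher) > 0) raw[higher[1]] else NA_real_
  }
  list(bin.dispersion = raw, bin.dispersion.used = used)
}

ave_means <- c(2, 5, 10, 50)
ave_var   <- c(1.5, 10, 8, 1300)
res <- estimate_bin_dispersion(ave_means, ave_var)
res$bin.dispersion       # -0.125  0.200 -0.020  0.500
res$bin.dispersion.used  #  0.2    0.2    0.5    0.5
```

In this toy input the first and third bins have negative raw estimates, so each takes the value of the next bin up that has a positive estimate.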

Details

This function is useful for exploring the mean-variance relationship in the data. Raw or pooled variances cannot be used for complex experimental designs, so instead we can fit a Poisson model using the appropriate design matrix to each gene and use the standardized residuals in place of the pooled variance (as in plotMeanVar) to visualize the mean-variance relationship in the data. The function will plot the average standardized residual for observations split into nbins bins by overall expression level. This provides a useful summary of how the variance of the counts changes with respect to average expression level (abundance). A line showing the Poisson mean-variance relationship (mean equals variance) is always shown to illustrate how the genewise variances may differ from a Poisson mean-variance relationship. A log-log scale is used for the plot.
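
The binning step described above can be sketched in base R without the package. Everything in this block is illustrative: the simulated fitted means, the use of squared residuals as a variance-scale quantity, and the variable names are assumptions, not the internals of dglmStdResid.

```r
# Illustrative sketch only: bin observations by expression level using
# quantiles, then average within each bin.
set.seed(42)
nbins <- 10
mu <- rgamma(400, shape = 2, scale = 10)   # stand-in for fitted means
y  <- rnbinom(400, mu = mu, size = 2)      # overdispersed counts
sq_resid <- (y - mu)^2                     # variance-scale residual (assumed form)

# Quantile-based breaks give bins with roughly equal numbers of observations
breaks <- quantile(mu, probs = seq(0, 1, length.out = nbins + 1))
bin <- cut(mu, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Average mean and average residual within each bin
ave_means <- as.vector(tapply(mu, bin, mean))
ave_resid <- as.vector(tapply(sq_resid, bin, mean))
```

A log-log view with the Poisson reference line could then be drawn with plot(ave_means, ave_resid, log = "xy") followed by abline(0, 1); points lying above the line indicate more variability than a Poisson model allows.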

The function mglmLS is used to fit the Poisson models to the data. This code is fast for fitting models, but does not compute the leverage, which is technically required to compute the standardized residuals. Here, we approximate the standardized residuals by replacing the usual denominator of (1 - leverage) by (1 - p/n), where n is the number of observations per gene (i.e. the number of libraries) and p is the number of parameters in the model (i.e. the number of columns in the full-rank design matrix).
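
The reasoning behind the (1 - p/n) substitution is that the leverages of a fitted GLM sum to p, so p/n is the average leverage. The base R sketch below checks this for a single hypothetical gene using glm() from the stats package rather than mglmLS; the counts and the covariate names x1 and x2 are made up for illustration.

```r
# One hypothetical gene observed in four libraries, with the same
# two-factor additive design used in the Examples section.
counts <- c(8, 12, 15, 30)
x1 <- c(0, 0, 1, 1)
x2 <- c(0, 1, 0, 1)

fit <- glm(counts ~ x1 + x2, family = poisson())
n <- length(counts)       # number of libraries
p <- length(coef(fit))    # number of model parameters

h <- hatvalues(fit)                          # exact leverage per observation
pearson <- residuals(fit, type = "pearson")  # (y - mu) / sqrt(mu)

# Squared standardized residuals: exact denominator vs. the approximation
exact_sq  <- pearson^2 / (1 - h)
approx_sq <- pearson^2 / (1 - p / n)

sum(h)    # equals p up to numerical error
mean(h)   # equals p / n, the value substituted for each leverage
```

The approximation is exact on average across observations, but individual leverages can differ from p/n, so exact_sq and approx_sq generally disagree observation by observation.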

See Also

plotMeanVar, plotMDS.DGEList, plotSmear and maPlot provide more ways of visualizing DGE data.

Examples

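library(edgeR)

# Simulate negative binomial counts: 250 genes in 4 libraries
y <- matrix(rnbinom(1000, mu=10, size=2), ncol=4)
# Additive design with two two-level factors
design <- model.matrix(~c(0,0,1,1)+c(0,1,0,1))
binned <- dglmStdResid(y, design, dispersion=0.5)

# Look at the estimated dispersions for the bins
getDispersions(binned)$bin.dispersion.used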