varianceStabilizingTransformation
,
though rlog
is more robust in the
case when the size factors vary widely.
The transformation is useful when checking for outliers
or as input for machine learning techniques
such as clustering or linear discriminant analysis.
rlog
takes as input a DESeqDataSet
and returns a
RangedSummarizedExperiment
object.
rlog(object, blind = TRUE, intercept, betaPriorVar, fitType = "parametric")
rlogTransformation(object, blind = TRUE, intercept, betaPriorVar, fitType = "parametric")
mcols(object)$dispFit
.object
,
this parameter is passed on to estimateDispersions
(options described there).DESeqTransform
if a DESeqDataSet
was provided,
or a matrix if a count matrix was provided as input.
Note that for DESeqTransform
output, the matrix of
transformed values is stored in assay(rld)
.
To avoid returning matrices with NA values, in the case of a row
of all zeros, the rlog transformation returns zeros
(essentially adding a pseudocount of 1 only to these rows).
DESeq
, which always
occurs on the raw count data, through generalized linear modeling which
incorporates knowledge of the variance-mean dependence. The rlog transformation
and VST are offered as separate functionality which can be used for visualization,
clustering or other machine learning tasks. See the transformation section of the
vignette for more details.The transformation does not require that one has already estimated size factors and dispersions.
The regularization is on the log fold changes of the count for each sample
over an intercept, for each gene. As nearby count values for low counts genes
are almost as likely as the observed count, the rlog shrinkage is greater for low counts.
For high counts, the rlog shrinkage has a much weaker effect.
The fitted dispersions are used rather than the MAP dispersions
(so similar to the varianceStabilizingTransformation
).
The prior variance for the shrinkag of log fold changes is calculated as follows:
a matrix is constructed of the logarithm of the counts plus a pseudocount of 0.5,
the log of the row means is then subtracted, leaving an estimate of
the log fold changes per sample over the fitted value using only an intercept.
The prior variance is then calculated by matching the upper quantiles of the observed
log fold change estimates with an upper quantile of the normal distribution.
A GLM fit is then calculated using this prior. It is also possible to supply the variance of the prior.
See the vignette for an example of the use and a comparison with varianceStabilizingTransformation
.
The transformed values, rlog(K), are equal to
$rlog(K_ij) = log2(q_ij) = beta_i0 + beta_ij$,
with formula terms defined in DESeq
.
The parameters of the rlog transformation from a previous dataset
can be frozen and reapplied to new samples. See the 'Data quality assessment'
section of the vignette for strategies to see if new samples are
sufficiently similar to previous datasets.
The frozen rlog is accomplished by saving the dispersion function,
beta prior variance and the intercept from a previous dataset,
and running rlog
with 'blind' set to FALSE
(see example below).
Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 2014, 15:550. http://dx.doi.org/10.1186/s13059-014-0550-8
plotPCA
, varianceStabilizingTransformation
, normTransform
dds <- makeExampleDESeqDataSet(m=6,betaSD=1)
rld <- rlog(dds)
dists <- dist(t(assay(rld)))
plot(hclust(dists))
# run the rlog transformation on one dataset
design(dds) <- ~ 1
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
rld <- rlog(dds, blind=FALSE)
# apply the parameters to a new sample
ddsNew <- makeExampleDESeqDataSet(m=1)
mcols(ddsNew)$dispFit <- mcols(dds)$dispFit
betaPriorVar <- attr(rld,"betaPriorVar")
intercept <- mcols(rld)$rlogIntercept
rldNew <- rlog(ddsNew, blind=FALSE,
intercept=intercept,
betaPriorVar=betaPriorVar)
Run the code above in your browser using DataLab