scater (version 1.0.4)

normaliseExprs: Normalise expression levels for an SCESet object

Description

Compute normalised expression values from an SCESet object and return the object with the normalised expression values added.

Usage

normaliseExprs(object, method = "none", design = NULL, feature_set = NULL, exprs_values = "counts", return_norm_as_exprs = TRUE, ...)
normalizeExprs(...)

Arguments

object
an SCESet object.
method
character string giving method to be used to calculate normalisation factors. Passed to calcNormFactors.
design
design matrix defining the linear model to be fitted to the normalised expression values. If not NULL, then the residuals of this linear model fit are used as the normalised expression values.
feature_set
character, numeric or logical vector indicating a set of features to use for normalisation (the features that define the library size). If character, entries must all be in featureNames(object). If numeric, values are taken to be indices for features. If logical, the vector is used to index features and should have length equal to nrow(object).
exprs_values
character string indicating which slot of the assayData from the SCESet object should be used as expression values. Valid options are 'counts' (the count values), 'exprs' (the expression slot), 'tpm' (the transcripts-per-million slot) or 'fpkm' (the FPKM slot).
return_norm_as_exprs
logical, should the normalised expression values be returned to the exprs slot of the object? Default is TRUE. If FALSE, values in the exprs slot will be left untouched. Regardless, normalised expression values will be returned to the norm_exprs slot of the object.
...
arguments passed to normaliseExprs (in the case of normalizeExprs) or to calcNormFactors.
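
The three accepted forms of feature_set follow ordinary R indexing rules. The sketch below illustrates them on a plain matrix in base R; the matrix and the names by_name, by_index and by_logical are hypothetical and stand in for the expression values of an SCESet object, so this is not scater code.

```r
# Toy matrix standing in for expression values (hypothetical data).
mat <- matrix(1:12, nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

by_name    <- c("gene1", "gene3")          # character: must match feature names
by_index   <- c(1, 3)                      # numeric: taken as row indices
by_logical <- c(TRUE, FALSE, TRUE, FALSE)  # logical: length equal to nrow(mat)

# All three forms select the same two features.
sub_name    <- mat[by_name, , drop = FALSE]
sub_index   <- mat[by_index, , drop = FALSE]
sub_logical <- mat[by_logical, , drop = FALSE]
```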

Value

an SCESet object with the normalised expression values added.

Details

This function allows the user to compute normalised expression values from an SCESet object. The 'raw' values used can be the values in the 'counts' (default), 'exprs', 'tpm' or 'fpkm' slot of the SCESet. Normalised expression values are added to the 'norm_exprs' slot of the object. They are on the log2-scale, with an offset defined by the logExprsOffset slot of the SCESet object.

If the 'exprs_values' argument is one of 'counts', 'tpm' or 'fpkm', then a corresponding slot with normalised values is added: 'norm_counts', 'norm_tpm' or 'norm_fpkm', as appropriate. If the 'exprs_values' argument is 'counts', a 'norm_cpm' slot is also added, containing normalised counts-per-million values.

Normalisation is done relative to a defined feature set, if desired, which defines the 'library size' by which expression values are divided. If no feature set is defined, then all features are used. Optionally, a normalisation size factor can be computed; this is done internally with calcNormFactors, so any of the methods available for calcNormFactors can be used: "TMM", "RLE", "upperquartile" or "none". See that function for further details. Library sizes are multiplied by size factors to obtain a "normalised library size" before normalisation.
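
The arithmetic described above can be sketched in base R. This is one plausible reading of the description under stated assumptions, not the scater implementation: the count matrix is made up, norm_factors is fixed at 1 for every cell (which is what method = "none" amounts to), and log_offset is an assumed stand-in for the value held in the logExprsOffset slot.

```r
# Toy count matrix (features x cells); values are made up for illustration.
counts <- matrix(c(10, 0, 5, 85,
                   20, 4, 6, 170,
                    5, 1, 4, 40),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

feature_set  <- 1:3         # features that define the library size
norm_factors <- c(1, 1, 1)  # assumption: method = "none", so every factor is 1
log_offset   <- 1           # assumption: stand-in for the logExprsOffset slot

# Library size per cell, computed over the chosen feature set only.
lib_size <- colSums(counts[feature_set, , drop = FALSE])

# "Normalised library size": library size multiplied by the size factor.
norm_lib_size <- lib_size * norm_factors

# Counts-per-million relative to the normalised library size,
# then log2 transformation with the offset.
norm_cpm   <- t(t(counts) / norm_lib_size) * 1e6
norm_exprs <- log2(norm_cpm + log_offset)
```

Because gene4 is outside the feature set here, it does not contribute to the library size, yet it is still scaled by it. That is the point of normalising relative to a set of control features.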

If the user wishes to remove the effects of certain explanatory variables, then the 'design' argument can be defined. The design argument must be a valid design matrix containing the relevant variables, for example as produced by model.matrix. A linear model is then fitted using lmFit to the expression values after any size-factor and library-size normalisation as described above. The returned normalised expression values are then the residuals from the linear model fit.
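
The idea of returning residuals can be sketched without scater or limma. The example below swaps lmFit for base R's lm.fit; for unweighted data without missing values, this ordinary least-squares fit should give the same residuals. The batch variable and the expression matrix are hypothetical toy data.

```r
set.seed(1)  # reproducible toy data

n_cells <- 6
batch   <- factor(rep(c("a", "b"), each = 3))  # hypothetical explanatory variable
design  <- model.matrix(~ batch)               # cells x coefficients

# Toy log-scale expression matrix (features x cells) with a batch shift on gene1.
norm_exprs <- matrix(rnorm(4 * n_cells), nrow = 4,
                     dimnames = list(paste0("gene", 1:4),
                                     paste0("cell", 1:n_cells)))
norm_exprs["gene1", batch == "b"] <- norm_exprs["gene1", batch == "b"] + 5

# Fit one linear model per feature; lm.fit expects observations (cells) in rows.
fit <- lm.fit(x = design, y = t(norm_exprs))

# Residuals come back as cells x features; transpose to features x cells.
resid_exprs <- t(fit$residuals)
dimnames(resid_exprs) <- dimnames(norm_exprs)
```

After this step the per-batch mean of every feature in resid_exprs is zero, so the batch shift added to gene1 no longer appears in the values.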

After normalisation, normalised expression values can be accessed with the norm_exprs function (with corresponding accessor functions for counts, tpm, fpkm, cpm). These functions can also be used to assign normalised expression values produced with external tools to an SCESet object.

normalizeExprs is exactly the same as normaliseExprs, provided for those who prefer North American spelling.

Examples

data("sc_example_counts")
data("sc_example_cell_info")
pd <- new("AnnotatedDataFrame", data = sc_example_cell_info)
example_sceset <- newSCESet(countData = sc_example_counts, phenoData = pd)
keep_gene <- rowSums(counts(example_sceset)) > 0
example_sceset <- example_sceset[keep_gene,]

## Apply TMM normalisation taking into account all genes
example_sceset <- normaliseExprs(example_sceset, method = "TMM")
## Scale counts relative to a set of control features (here the first 100 features)
example_sceset <- normaliseExprs(example_sceset, method = "none",
                                 feature_set = 1:100)