getLibsizes: Estimates library scaling factors (library sizes) for count data.

Description

This function estimates the library scaling factors that should be used for either a 'countData', or a matrix of counts and replicate information.

Usage

getLibsizes(cD, data, replicates, subset = NULL, estimationType = c("quantile", "total", "edgeR"), quantile = 0.75, ...)

Arguments

A countData object.

data

A matrix of count values. Ignored if 'cD' is given.

replicates

A replicate structure for the data given in 'data'. Ignored if 'cD' is given.

subset

A numerical vector indicating the rows of the 'countData' object that should be used to estimate library scaling factors.

estimationType

One of 'quantile', 'total', or 'edgeR'. Partial matching is allowed. See Details.

quantile

A value between 0 and 1 indicating the level of trimming that should take place. See Details.

...

Additional parameters to be passed to the 'edgeR' calcNormFactors function.

Value

If a \link{countData} object is given, an identical object will be returned with updated library sizes. If only the data and replicate structure are given, a numerical vector of library sizes (scaling factors) for each library in the data will be returned.

Details

This function estimates the library scaling factors (surrogates for library size) in one of several ways, depending on the 'estimationType' argument. 'total' will give the library sizes by summing all counts in each sample. 'quantile' will give a library scaling factor by the method of Bullard et al (Bioinformatics 2010), summing all counts in each sample whose value below the qth quantile of non-zero counts for that sample. 'edgeR' uses the Trimmed Mean of M-vales (TMM) method of Robinson \& Oshlack (Genome Biology, 2010) via the 'edgeR' calcNormFactors function; other options are available through this function.

If a countData object 'cD' is given, the library sizes will be inferred from this. Alternatively, a matrix of count values (columns are libraries) and a replicate structure (a vector defining which samples belong to which replicate group) can be given.

Examples

Run this code

data(simData)
replicates <- c(1,1,1,1,1,2,2,2,2,2)
groups <- list(c(1,1,1,1,1,1,1,1,1,1), c(1,1,1,1,1,2,2,2,2,2))
CD <- new("countData", data = simData, replicates = replicates, groups = groups)

libsizes(CD) <- getLibsizes(CD)